Statistics Without Tears by Stan Brown

5/31/2021 Stats without Tears / SWT
STATS WITHOUT TEARS
Copyright © 2001–2020 by Stan Brown, Tompkins Cortland Community College
Updated 17 Nov 2020
https://brownmath.com/swt/pfswt.htm
Contents
Help » About This Book

1. Statistics!
1A. Statistics? What’s That?
1B. Good Samples, Bad Samples
1C. Data and Variables
1D. Statistical Errors
1E. Observation and Experiment
1F. Sharp Points
What Have You Learned?
Exercises for Chapter 1
2. How to Graph Your Data
2A. Non-Numeric Data
2B. Numeric Data
2C. Bad Graphs
2D. Really Good Graphs
3. Numbers about Numbers
3A. Measures of Center
3B. Summary Numbers on the TI-83 …
3C. Measures of Spread
3D. Measures of Position
3E. Five-Number Summary
4. Linked Variables
4A. Mathematical Models
4B. Scatterplot, Correlation, and Regression on TI-83/84
4C. How to Find ŷ from a Regression on TI-83/84
4D. Decision Points for Correlation Coefficient
4E. Optional: Scatterplot, Correlation, and Regression in Excel
5. Probability
5A. Probability Basics
5B. Combining Probabilities
5C. Sequences instead of Formulas
6. Discrete Probability Models
6A. Random Variables
6B. Discrete Probability Distributions
6C. Bernoulli Trials
6D. The Geometric Model
6E. The Binomial Model
7. Normal Distributions
7A. Continuous Random Variables
7B. The Normal Model
7C. The Standard Normal Distribution
7D. Checking for Normality
8. How Samples Vary
8A. Numeric Data / Means of Samples
8B. Binomial Data / Proportions of Samples
8C. Summary of Sampling Distributions
9. Estimating Population Parameters
9A. Estimating Population Proportion p
9B. Estimating Population Mean µ When You Know σ
9C. Estimating Population Mean µ When You Don’t Know σ
10. Hypothesis Tests
10A. Testing a Proportion (Binomial Data)
10B. Sharp Points
10C. Testing a Mean (Numeric Data)
10D. Confidence Interval and Hypothesis Test
10E. Testing a Non-Random Sample
11. Inference from Two Samples
11A. Numeric Data — Paired or Unpaired?
11B. Inference with Paired Numeric Data (Case 3)
11C. Inference with Unpaired Numeric Data (Case 4)
11D. Inference on Two Proportions (Case 5)
11E. Confidence Interval and Hypothesis Test (Two Populations)
11F. More Confidence Intervals for Two Populations
12. Tests on Counted Data
12A. Testing Goodness of Fit to a Model
12B. Testing for Independence or Homogeneity
12C. But Wait, There’s More!
Review
What’s Important?
Review Problems
Solutions to All Exercises
Solutions for Chapter 1
Solutions to Review Problems
Reference Material
Statistics Symbol Sheet
Roman Letters
Greek Letters
Inferential Statistics: Basic Cases
Seven Steps of Hypothesis Tests
Big Names in Statistics
Recommended Statistics Books
Statistics for Citizens
Textbooks
TI-83/84 Cheat Sheet
Sampling
Statistics of a Sample or Parameters of a Population
Correlation and Regression
Discrete Probability Distributions
Normal Distribution
Confidence Intervals, Hypothesis Tests, Sample Size
TI-83/84 Troubleshooting
Error Messages
List Troubles
Graphing Troubles
Other Troubles
Sources Used
Help » About This Book
This book is an alternative to the usual textbooks for a one-semester course in statistics. Whether you’re teaching in a classroom or learning on your
own, you’ve come to the right place.
DON’T PANIC! Douglas Adams’ The Hitchhiker’s Guide to the Galaxy bore a “large, friendly label” with those words, and that’s also my message to you.
I don’t see any reason for students to be afraid of statistics. It’s no more difficult than any other technical course, and it’s much
more practical than other math courses. The mathematical details are here for those who want them, but I lean heavily on technology
to relieve students of the “grunt work”.
Calculator: You need a TI-83 or TI-84 family calculator to get the most out of this book. For $100 or less, this calculator has amazing capabilities for
statistics, and it also supports other math courses up through calculus. I suggest you download my free MATH200A program [URL:
https://BrownMath.com/ti83/math200a.htm], which adds some capabilities to the calculator, but this is optional.
Some error conditions on your calculator can be scary when you see them the first time. Don’t panic! See TI-83/84 Troubleshooting
[URL: https://BrownMath.com/swt/pfswt.htm.htm#tsti_top].
View or Print: These pages change automatically for your screen or printer. Underlined text, printed URLs, and the table of contents become live links on screen;
and you can use your browser’s commands to change the size of the text or search for key words.
History of this book: This textbook grew out of handouts I made for my students at TC3 (Tompkins Cortland Community College in Dryden, New York).
The handouts filled gaps and corrected errors in our standard textbook.
As time went on, I found myself replacing whole chapters. Student evaluations showed that they preferred these replacements to
the textbook. In Spring 2013 I reached the tipping point: I had replaced more than half of the twelve textbook chapters. In good
conscience I didn’t feel I could ask students to buy an expensive textbook that they would use less than half of, so I burned my bridges
and announced the required textbook beginning in Summer 2013 as “none”.
In Fall 2013, a second instructor at TC3 adopted this textbook for his class. Benjamin Kirk provided a lot of valuable suggestions
and corrections, and I’m very grateful. They have improved the book considerably.
Feedback welcome! Contact information is at BrownMath.com/about/#Contact.
Please share your reactions, whether positive or negative! If I could explain something better, I’d like to know. If some section
works particularly well for you, please tell me. If you find an error, I especially want to know about it. (My own students get extra
credit for pointing out errors.)
Being on the Web, this book will get updated frequently, based on your feedback. You can see the revision dates in the chapter list at
https://BrownMath.com/swt/pfswt.htm .
Students: This eTextbook is a free resource for you. You can read it on line or print any or all
chapters. Links to all the chapters are at <https://BrownMath.com/swt/pfswt.htm>. If
you print any chapters, you can keep your costs down by choosing black-and-white
printing in duplex (two-sided) mode.
Because this textbook helps you, please donate at BrownMath.com/donate.
Just a word of advice. I’ve tried to make statistics approachable to anyone with
high-school math, but it’s still a technical subject. You can’t just read a chapter in one
pass from start to end, the way you would a novel or a book of history. Please see How to Read a Math Book [URL:
https://BrownMath.com/stfa/read.htm] for some tips on getting the most out of your time with this book, or any math book.
Some material is marked BTW. This is stuff I find interesting, including mathematical details that some students have asked for,
but you can get through the course without it.
Instructors: Although this is a free resource, it is copyrighted and I would appreciate your asking permission to copy and distribute any of it. My
contact information is at BrownMath.com/about/#Contact.
Though you don’t need to ask permission simply to link to this material, I would appreciate knowing about it.
1. Statistics!
Updated 21 Feb 2016
Contents: 1A. Statistics? What’s That?

1A1. What Should You Expect?
1A2. What Do You Get From the Course?
1A3. Sample and Population
1A4. Descriptive and Inferential Statistics
1A5. Statistic and Parameter
1B. Good Samples, Bad Samples
1B1. The Gold Standard: Random Samples
· Seeding the Random-Number Generator
· Selecting Members of the Sample
1B2. Almost as Good: Systematic Samples
· Taking a Systematic Sample
1B3. Good but Hard: Cluster Samples
1B4. Stratified Samples
1B5. Census
1B6. Bogus Samples
1C. Data and Variables
1C1. What Are Data? What Are Variables?
1C2. Quantitative or Qualitative?
1C3. Summary Statements
1D. Statistical Errors
1D1. Sampling Error
1D2. Nonsampling Errors
· Self-Selected Samples
· Sampling Bias
· Selection Bias
· Non-Response Errors
· Response Errors
· Data Errors
· Inappropriate Analysis
1E. Observation and Experiment
1E1. Observational Study Versus Designed Experiment
· Confounding and Lurking Variables
· Extended Examples
1E2. Experimental Techniques
· Completely Randomized Design
· Randomized Block Design
· Matched Pairs
· Control Group and Placebo
· Double Blind
1F. Sharp Points
1F1. Rounding and Significant Digits
· How Many Digits?
· How to Round Numbers
· When to Round Numbers
1F2. Powers of 10 from Your Calculator
1F3. Show Your Work!
1F4. Optional: ∑ Means Add ’em Up
1A. Statistics? What’s That?
Summary: We live in an uncertain world. You never have complete information when you make a decision, but you have to make decisions
anyway. Statistics helps you make sense of confusing and incomplete information, to decide whether a pattern is meaningful or just
coincidence. It’s a way of taming uncertainty, a tool to help you assess the risks and make the best possible decisions.
1A1. What Should You Expect?
Statistics is different from any other math course.

Yes, you’ll solve problems. But most will be real-world practical problems: Does aspirin make heart attacks less likely? Was there racial bias in
the selection of that jury? What’s your best strategy in a casino? (Most examples will be from business, public policy, and medicine, but we’ll hit other
fields too.)
There will be very little use of formulas. Real statisticians don’t do things by hand. They use calculators or software, and so will you. Your TI-83
or TI-84 may seem intimidating at first, but you’ll quickly get to know it and be amazed at how it relieves you of drudgery.
With little grunt work to do, you will focus on what your numbers mean. You’re not just a button-pushing calculator monkey; you have to think
about what you’re doing and understand it well enough to explain it. Most of the time your answers will be non-technical English, not numbers or
statistical jargon. That may seem scary and unfamiliar at first, but if you stick with it you’ll love stretching your brain instead of just following a
book’s examples by rote.
1A2. What Do You Get From the Course?
It may be a required course, so you get that much closer to graduation. ☺ But you can get more than that.
If you do it right, statistics teaches you to think. You become skeptical, unwilling to take everything at face value. Instead, when somebody makes a
statement you question how they know that and what they’re not telling you. You can’t be fooled so easily. You become a more thoughtful citizen,
a more savvy consumer.
Who knows? You might even have some fun along the way. So— Let’s get started!
1A3. Sample and Population
Suppose you want to know about the health of athletes who use steroids versus those who don’t. Or you want to know whether people are likely to
buy your new type of chips. Or you want to know whether a new type of glue makes boxes less likely to come apart in shipping. How do you answer
questions like that?
With most things you want to know, it’s impossible or impractical to examine every member of the group you want to know about, so you
examine part of that group and then generalize to the whole group.
Definitions: A sample is the group you actually take data from. The population is the group you want to know something about.
In Good Samples, Bad Samples, later in this chapter, you’ll see how samples are actually taken.
The sample is usually a subgroup of the population, but in a census the whole population is the sample.
Example 1: You want to know what proportion of likely voters will vote for your candidate, so you poll 850 people. The people you actually ask are
your sample, and the likely voters are the population.
Caution!: Your sample is the 850 people you took data from, not just the subgroup that said they would vote for your candidate. The population
is all likely voters, regardless of which candidate they prefer. Yes, you want to know who will vote for your candidate, but everybody’s vote counts,
so the group you want to know something about — the population — is all likely voters.
Definitions: The number of members of your sample is called the sample size or size of the sample (symbol n), and the number of members of the
population is called the population size or size of the population (symbol N).
“Sometimes it is not possible to count the units contained in the population. Such a population is called infinite or uncountable.”
(Finite and Infinite Population 2014 [see “Sources Used” at end of book]) “Smokers” is an example. There is a definite number of smokers
in the world at any moment, but if you try to count them the number changes while you’re counting.
The sample size is always a definite number, since you always know how many individuals you took data from.
Example 2: You’re monitoring quality in a factory that turns out 2400 units an hour, so you test 30 units from each hour’s production.
The units you tested are your sample, and your sample size is 30. All production in that hour is the population, and the population size is 2400.
Isn’t the population the factory’s total production, since you want to know about the overall quality? No! Your sample was all drawn from one
hour’s production. A sample from one production run can tell you about that production run, not about overall operations. This is why quality
testing never ends.
Example 3: You’re testing a new herpes vaccine. 800 people agree to participate in your study. You divide them randomly into two groups and
administer the vaccine to one group and a placebo (something that looks and feels like a vaccine but is medically inactive) to another group. Over the
course of the study, a few people drop out, and at the end you have 397 vaccinated individuals and 396 who received the placebo.
You have two samples, individuals who were vaccinated (n₁ = 397) and the control group (n₂ = 396). The corresponding populations are all
people who will take this vaccine in the future, and all people who won’t. Both of those populations are uncountable or infinite because more people
are being born all the time.
1A4. Descriptive and Inferential Statistics
Sometimes you want to summarize the data from your sample, and other times you want to use the sample to tell you something about the larger
population. Those two situations are the two grand branches of statistics.
Definition: Descriptive statistics is summarizing and presenting data that were actually measured. Inferential statistics is making statements
about a population based on measurements of a smaller sample.
Example 4: “52.9% of 1000 voters surveyed said they will vote for Candidate A.” That is descriptive statistics because someone actually measured
(took responses from) those 1000 people.
Compare: “I’m 95% confident that 49.8% to 56.0% of voters plan to vote for Candidate A.” That is inferential statistics because no one has asked
all voters. Instead, a sample of voters was asked, and from that an estimate was made of the feelings of all voters.
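The step from the descriptive statement to the inferential one in Example 4 can be sketched in a few lines. This is only an illustration in Python, not the calculator procedure this book uses: it assumes a random sample, the usual normal approximation for a sample proportion, and a critical value of about 1.96 for 95% confidence. The function name is made up for this sketch.

```python
import math

def proportion_interval(p_hat, n, z=1.96):
    """Approximate confidence interval for a population proportion.

    Assumes a random sample and the normal approximation;
    z = 1.96 corresponds to roughly 95% confidence.
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

# Descriptive statistic from Example 4: 52.9% of 1000 voters surveyed
low, high = proportion_interval(0.529, 1000)
print(f"{low:.1%} to {high:.1%}")  # 49.8% to 56.0%
```

The input (52.9% of 1000) describes the sample; the output is a statement about all voters, which is what makes it inferential.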
1A5. Statistic and Parameter
Definitions: A statistic is a numerical summary of a sample. A parameter is a numerical summary of a population.

Mnemonic: sample and statistic begin with s; population and parameter both begin with p.
Continuing with Example 4: “52.9% of 1000 voters surveyed plan to vote for Candidate A.” — 52.9% is a statistic because it summarizes the sample.
“I’m 95% confident that 49.8% to 56.0% of voters plan to vote for Candidate A.” — 49.8% to 56.0% is an estimate of a parameter. (The actual
parameter is the exact proportion presently planning to vote for A, which you don’t know exactly.)
A statistic is always a statement of descriptive statistics and is always known exactly, because a statistic is a number that summarizes a sample of
actual measured data.
A parameter is usually estimated, not known exactly, and therefore is usually a matter of inferential statistics. The exception is a census, in
which data are taken from the whole population. In that case, the parameter is known exactly because you have complete data for the population, so
the parameter is then descriptive statistics.
Describing … | The number is … | And the process is …
Any sample | A statistic | Descriptive statistics
A population (usually) | A parameter | Inferential statistics
A census (pop. w/ every member surveyed) | Both statistic and parameter | Descriptive statistics
1B. Good Samples, Bad Samples
Summary: A good sample is a smaller group that is representative of the population. A bad sample does a bad job of representing the population.
You already know that a random sample is a good thing, but did you know that a random sample is actually carefully planned?
What if you can’t take a true random sample? What are good and bad ways to gather samples?
All valid samples share one characteristic: they are chosen through probability means, not selected by any decisions made by the
person taking the sample. Every valid sample is gathered according to some rule that lets the impersonal operations of probability do
the actual selection.
Definition: A probability sample is a sample where the members are chosen by a predetermined process that uses the workings of chance and
removes discretion from the investigators. Some of the types of probability samples are discussed below.
See also: For lots of examples of good sampling and (usually) clear presentation of data about the American people, you might want to visit the
Pew Research Center [URL http://www.pewresearch.org/ accessed 2014-09-15] and its tech-oriented spinoff, Pew Internet [URL
http://pewinternet.org/ accessed 2014-09-15]. The venerable Gallup Poll [URL http://www.gallup.com/home.aspx accessed 2014-09-15]
also makes available its snapshots of the American public.
1B1. The Gold Standard: Random Samples
Definition: A random sample (also called a simple random sample) is a sample constructed through a process that gives every member of the
population an equal chance of being chosen for the sample.
You always want a random sample, if you can get one. But to create a random sample you need a frame, and in many situations it’s impossible or
unreasonably difficult to list all members of the population. The sections below explain alternative types of samples that can lead to statistically valid
results.
“Random” doesn’t mean haphazard. Humans think we’re good at constructing random sequences of letters and digits, but actually we’re very bad at
it. Try typing 1300 “random” letters on your keyboard. If you do it really randomly, you should get about 1300÷26 = 50 of each letter. (Note: about 50
of each, not exactly 50. To determine whether a particular sample of text is unreasonably far from random letters, see Testing Goodness of Fit to a
Model [URL: https://BrownMath.com/swt/pfswt.htm.htm#c12_gof_ht_root].) But if you’re like most people, the distribution will be very different
from that: some letters will occur many more than 50 times, and others many fewer.
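To see what “about 50 of each” looks like when the letters really are random, you can let software pick them. The following is a minimal sketch in Python (not one of this book’s tools); the function name and the seed value are arbitrary choices for illustration.

```python
import random
import string
from collections import Counter

def random_letter_counts(n_letters, seed=None):
    """Count how often each letter appears among n_letters random letters."""
    rng = random.Random(seed)  # seed is optional; pass one to repeat a run
    letters = rng.choices(string.ascii_lowercase, k=n_letters)
    return Counter(letters)

counts = random_letter_counts(1300, seed=2014)  # seed value is arbitrary
expected = 1300 / 26  # 50 of each letter on average
for letter in string.ascii_lowercase:
    print(letter, counts[letter])
print("expected about", expected, "of each")
```

The counts scatter around 50 rather than hitting it exactly. Comparing them with counts from your own typed “random” letters shows how much less even human choices are.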
So how do you construct a random sample? You need a frame, plus a random-number generator or random-number table.
Definition: A sampling frame, or simply a frame, is a list of all members of the population in a way that lets you assign a unique number to each one.
The frame need not be a physical list; it can be a computer file — these days it usually is. But it has to be a complete list.
If you have a table of random numbers, the table will come with instructions for use. I’ll show you how to do it with the TI-83/84, but you could also do
it with Excel’s RANDBETWEEN( ) function, or with any other software that generates pseudo-random numbers. (The Web site random.org [URL
http://www.random.org accessed 2014-09-15] provides true random numbers based on atmospheric noise.)
Seeding the Random-Number Generator
Random numbers from software or a calculator aren’t really random, but what we call pseudo-random numbers. That means that they are generated
by deterministic calculations designed to mimic randomness pretty well but not perfectly. To help them do a better job, you need to “seed” the
random number sequence, meaning that you give it a unique starting point so that your sequence of random numbers is different from other
people’s.
You seed the random numbers only once. To do this:
1. Turn on the calculator and press [CLEAR].
2. Come up with a number through some means other than choosing it. For instance, select the first number you see in the newspaper or in a book
that you let fall open where it will. Type this number into the calculator. (Eyes closed, I tapped the financial page with a pen and used the
number that the pen touched.)
3. Press [STO→], which shows on your screen as →.
4. Press [MATH] [◄] [1] to paste rand to the screen. Press [ENTER].
Again, you need to seed random numbers only once on your calculator.
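The effect of a seed is easy to demonstrate in software. This sketch uses Python’s standard random module rather than the calculator, and the seed value is hypothetical. (Python seeds its generator automatically when you give no seed, so explicit seeding there is mainly useful for repeating a run.)

```python
import random

seed_value = 271828  # hypothetical seed; pick yours by some chance process

random.seed(seed_value)
first_run = [random.random() for _ in range(3)]

random.seed(seed_value)  # reseeding with the same value restarts the sequence
second_run = [random.random() for _ in range(3)]

print(first_run == second_run)  # True: same seed, same "random" numbers

random.seed(seed_value + 1)  # a different seed gives a different sequence
third_run = [random.random() for _ in range(3)]
print(first_run == third_run)
```

This is why two calculators that were never seeded can produce the same sequence, and why a starting point of your own makes your sequence different from other people’s.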
Selecting Members of the Sample
For this you need to know the size of the population, which is the number of individuals in your frame. You will generate a series of random
numbers between 1 and the population size, as follows:
1. Press [MATH] [◄] [5] to paste randInt( to your screen.
2. Press [1] [,], enter the population size, and press [)] [ENTER] to generate the first random number. In my case the
population size was 20,147 and my first random number was 4413, so the first member of my sample will be the
4413th individual, in order, from the sampling frame.
3. Press [ENTER] to generate the next random number. (The randInt function may or may not be displayed again, depending on your calculator
model and settings.) In my case, the next random number is 4949, so the 4949th individual in my frame becomes the second member of my
sample.
4. Continue pressing [ENTER] until you have your desired sample size. If you get a duplicate random number, simply ignore it and take the next
one. (If your calculator has [8] randIntNoRep, use it instead of plain randInt to prevent duplicates from appearing in the first place.)
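The same selection can be sketched in software. This Python version is an illustration only, with a made-up function name and a hypothetical sample size of 10; the population size is the one from the steps above. Python’s random.sample never repeats a value, so it plays the role of randIntNoRep.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Return sorted frame positions (1-based) chosen without duplicates."""
    rng = random.Random(seed)
    # range(1, N + 1) plays the role of the numbered sampling frame
    positions = rng.sample(range(1, population_size + 1), sample_size)
    return sorted(positions)

# Population size from the steps above; the sample size of 10 is hypothetical
chosen = simple_random_sample(20147, 10)
print(chosen)
```

Each number printed is a position in the frame, so the individuals at those positions make up the sample.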
1B2. Almost as Good: Systematic Samples
Definition: A systematic sample with k = some number is one where you take every kth individual from a representative subset of the population
to make up your sample.
Example 5: Standing outside the grocery store all day, you survey every 40th person. That is a systematic sample with k=40.
If properly taken, a systematic sample can be treated like a random sample. Then why do I call it almost as good? Because you have to make one big
assumption: that the variable you’re surveying is independent of the order in which individuals appear. In the grocery-store example, you have to assume that
shoppers in the time period when you take your survey are representative of all shoppers. That may or may not be true. For example, a high
proportion of Wegmans shoppers at lunch time are buying prepared foods to eat there or take back to work. At other times, the mix of groceries
purchased is likely to be different.
Taking a Systematic Sample
1. Estimate the number of individuals you will be sampling from, and call this N. (Here your sampling frame is smaller than the population.) In
the grocery-store example, estimate how many shoppers will pass the point where you will stand during the time you’re standing there. If you
estimate 1200 shoppers during the six hours when you’ll take your survey, then N=1200.
If you’re pretty unsure of N, you may need to observe that spot without taking the survey, just to get a preliminary count.
2. Decide how large a sample you want. Divide N by your desired sample size, rounding down, and call the result k. If you want 95 grocery
shoppers in your sample, then k = N/95 = 1200/95 = 12.63 → k=12.
If your estimate of N is uncertain, you’ll want to reduce k a bit. This will increase your sample size, but a sample that’s too large (within
reason) is better than one that’s too small.
3. If you have never seeded the random-number generator, do it now. See Seeding the Random-Number Generator, above.
4. Take a random number from 1 to k to determine which person will be first in your sample. To do this, press [MATH]
[◄] [5] to paste randInt(, then [1] [,]. Enter the value of k and press [)] [ENTER].
Caution: It’s 1 to k, not 1 to N. If you need to survey every 12th person, then you use randInt(1,12). For
determining where to start in the first 12 people, randInt(1,95) and randInt(1,1200) are both wrong.
At right you see an illustration with k=12. The calculator has determined that I will start with the 2nd person and take every 12th person
after that: 2, 14, 26, 38, 50, and so on.
5. If you reach your desired sample size sooner than expected, keep going for the originally planned time. Why? Because you don’t know
whether the individuals that appear early are different from those that appear late. The good news is that the larger sample will give you more
accurate results, always a good thing.
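Steps 1, 2, and 4 above amount to a small calculation, sketched here in Python as an illustration (the function name is made up, and the numbers are those of the grocery-store example).

```python
import random

def systematic_positions(estimated_n, desired_sample_size, seed=None):
    """Return (k, start, positions) for a systematic sample."""
    rng = random.Random(seed)
    k = estimated_n // desired_sample_size  # divide and round down
    start = rng.randint(1, k)  # random start from 1 to k, not 1 to N
    positions = list(range(start, estimated_n + 1, k))
    return k, start, positions

k, start, positions = systematic_positions(1200, 95)
print(k)               # 12, since 1200/95 = 12.63 rounds down
print(start)           # somewhere from 1 to 12
print(positions[:5])   # for example 2, 14, 26, 38, 50 when start is 2
print(len(positions))  # 100
```

Because k is rounded down, the sample comes out at 100 rather than 95, which matches the advice that a slightly larger sample is better than one that is too small.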
1B3. Good but Hard: Cluster Samples
Sometimes a true random sample is possible but unreasonably difficult. For example, you could use census records to take a random sample of 1000
adults in the US, but that would mean doing a lot of travel. So instead you take a cluster sample.
Definition: In a cluster sample, you first subdivide the population into a large number of subunits, called clusters, and then you construct a random
sample from the clusters.
“In single-stage cluster sampling, all members of the selected clusters are interviewed. In multi-stage cluster sampling, further
subdivisions take place.” (Upton and Cook 2008, 76 [see “Sources Used” at end of book])
Example 6: You want to have 600 representative Americans try your new neck pillow to gauge your potential market. Travel to 600 separate locations
across the country would be ridiculously expensive, so you randomly select 30 census tracts and then randomly select 20 individuals within each
selected census tract.
A cluster sample makes one big assumption: that the individuals in each cluster are representative of the whole population. You can get away with a slightly
weaker assumption, that the individuals in all the selected clusters are representative of the whole population. But it’s still an assumption. For this
and other technical reasons, a cluster sample cannot be analyzed in all the same ways as a random sample or systematic sample. Analysis of cluster
samples is outside the scope of this course.
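The selection in Example 6 (30 tracts, then 20 people in each) can be sketched as follows. This Python illustration uses a made-up frame of 200 tracts with 50 residents each; the function name and all the data are hypothetical, and it covers only drawing the sample, not analyzing it.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly pick clusters, then randomly pick members within each one.

    clusters maps a cluster name to the list of its members.
    """
    rng = random.Random(seed)
    chosen_clusters = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen_clusters:
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return chosen_clusters, sample

# Made-up frame: 200 tracts with 50 residents each (purely illustrative)
tracts = {
    f"tract{t:03d}": [f"tract{t:03d}-person{p:02d}" for p in range(50)]
    for t in range(200)
}
chosen_tracts, people = two_stage_cluster_sample(tracts, 30, 20)
print(len(chosen_tracts), len(people))  # 30 600
```

Only the 30 chosen tracts need to be visited, which is the whole point of clustering.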
1B4. Stratified Samples
Sometimes you can identify subgroups of your population and you expect individuals within a subgroup to be more alike than individuals of
different subgroups. In such a case, you want to take a stratified sample.
Definition: If you can identify subgroups, called strata (singular: stratum), that have something in common relative to the trait that you’re
studying, you want to ensure that your sample has the same mix of those groups as the population. Such a sample is called a stratified
sample.
Example 7: You’re studying people’s attitudes toward a proposed change in the immigration laws for a Presidential candidate. You believe that some
races are more likely to favor loosening the law and others are more likely to oppose it. If the population is 66% non-Hispanic white, 14% Hispanic,
12% black, 4% Asian, and so on, your sample should have that same composition.
A stratified sample is really a set of mini-samples grouped together.

Example 8: You want to survey attitudes towards sports at a college that is 45% male and 55% female, and you want 400 in your sample. You
would take a sample of 45%×400 = 180 male students and 55%×400 = 220 female students to make up your sample of 400. Each mini-sample would be
taken by a valid technique like a random sample or systematic sample.
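The arithmetic of Example 8 generalizes to any number of strata. Here is a minimal Python sketch, assuming a made-up college of 2000 students and taking each mini-sample as a simple random sample; the function name and the student lists are hypothetical.

```python
import random

def stratified_sample(strata, total_sample_size, seed=None):
    """Proportional stratified sample: one random mini-sample per stratum.

    strata maps a stratum name to the list of its members.
    """
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Each stratum's share of the sample matches its share of the population
        share = round(total_sample_size * len(members) / population_size)
        sample[name] = rng.sample(members, share)
    return sample

# Made-up college of 2000 students, 45% male and 55% female
college = {
    "male": [f"M{i}" for i in range(900)],
    "female": [f"F{i}" for i in range(1100)],
}
picked = stratified_sample(college, 400)
print({name: len(members) for name, members in picked.items()})
```

With these numbers the shares are 180 and 220, as in the example. With less convenient percentages the rounded shares may not add up to exactly the planned total, so you would adjust one stratum by hand.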
1B5. Census
Definition: A census is a sample that contains every member of the population.
In many situations, it’s impossible or highly inconvenient to take a census. But with the near-universal computerization of records, a census is
practical in many situations where it never used to be.
Example 9: At the push of a button, a librarian can get the exact average number of times that all library books have been checked out, with no need
for sampling and estimation. An apartment manager can tell the exact average number of complaints per tenant. And so forth.
A census is the only sample that perfectly represents the population, because it is the whole population. If you can take a census, you’ve reduced a
problem of inferential statistics to one of descriptive statistics. But even today, only a minority of situations are subject to a census. For instance,
there’s no way to test a drug on every person with the condition that the drug is meant to treat. It’s totally impractical to interview every potential
voter and determine his or her preferences. And so forth.
1B6. Bogus Samples
Any sample where people select the individual members is a bogus sample. That means every
sample where people select themselves, and every sample where the interviewer decides whether to
include or exclude individual members.
Why is that bad? Remember, a proper sample is a smaller group that is representative of the
population. No sample will represent the population perfectly, but you do the best you possibly can.
The good samples listed above can go bad if you make various kinds of mistakes (“Statistical
Errors”, later in this chapter), but a sample that doesn’t depend on the workings of chance is always wrong and cannot be made right. The textbooks will
give you names for the types of bad samples — convenience sample, opportunity sample, snowball sample, and so on — but why learn the names
when they’re all bogus anyway?
Good Samples | Bad Samples
Chosen through probability methods | Chosen by individual decisions about which persons or things to include
Represent the population as well as possible | Do not accurately represent the population
Uncertainty can be estimated, and can be reduced by increasing sample size | Uncertainty cannot be estimated, and bigger samples don’t help
So goodbye to Internet polls and petitions, letter-writing campaigns, “the first 500 volunteers”, and every other form of self-selected sample. If
people select themselves for a sample, then by definition they are not representative because they feel more strongly than the people who didn’t
select themselves. You can make statements about the people who selected themselves, but that tells you nothing about the much larger number who
didn’t select themselves. (More about this in Simon 2001 [see “Sources Used” at end of book], Web Polls.)
Goodbye also to any kind of poll where the pollster selects the individual people. If you set up a rule that depends on the workings of chance
and then follow it, that’s okay. But if you decide on the spur of the moment who gets included, that’s bogus.
Why is it bad to just approach people as you see them? Because studies show that you are more likely to approach people that you perceive to be
like you, even if you’re not aware of that. Ask yourself if you are truly equally likely to select someone of a different race or sex from yourself,
someone who is dressed much richer or poorer than you, someone who seems much more or much less attractive, and so forth. Unless you’re
Gandhi, the honest answer is “not equally likely”. It doesn’t make you a bad person, just a bad pollster like everyone else. If you tend to pick people
who are more like you, your sample is not representative of the population.
The same principle applies to studies of non-humans. Here the investigator’s intrinsic biases may be less clear, but unless you choose your
sample based on chance you can never be sure that those biases didn’t “skew it up”.
1C. Data and Variables
Summary: Statistics is all about data and variables, but what exactly do those terms mean? What are the types of data and variables?
This will be an important topic throughout the course, because different variable types are presented differently in descriptive
statistics, and again are analyzed differently in inferential statistics. So before you do anything, you need to think what type of data
you’re dealing with.
1C1. What Are Data? What Are Variables?
Definitions: Variables are the characteristics you’re studying. Data are the values of those characteristics that you record, and the value recorded from any
given member of the sample is called a data point or datum.
You can think of the variable as kind of like a question, and the data points as the answers to that question.
If you record one piece of information from each member of the sample, you have univariate data; if you record two pieces of
information from each member, you have bivariate data.
Example 10: You record the birth weights of babies born in a certain hospital during the year. The variable is “birth weight”.
Example 11: In April, you ask all the members of your sample whether they had the flu vaccine that year and how many days of work or school they
lost because of colds or flu. (Can you see at least two problems with that second question? If not, you will after you read about Nonsampling Errors,
later in this chapter.) This is bivariate data. One variable is “flu shot?” and the data points are all yes or no; the other variable is “days lost to colds and
flu” and the data points are whole numbers.
1C2. Quantitative or Qualitative?
Definitions: Quantitative data are data that are numbers. Quantitative data are also called numeric data.
Numeric data are subdivided into discrete and continuous data. Discrete data are whole numbers and typically answer the
question “how many?” Continuous data can take on any value (or any value within a certain range) and typically answer the question
“how much?”
Qualitative data are data that are not numbers. Qualitative data are also called non-numeric data, attribute data or categorical data.
Common mistakes: Just seeing numbers in a problem does not mean you have numeric data. Consider this statement: “45% of viewers polled said they
thought Candidate X performed well in the debate.” There’s a number there, all right, but you have non-numeric data because each
person answered “yes” or “no”, which means the individual data points are non-numeric.
Some data look like numbers but aren’t: ZIP codes, for instance. When in doubt, ask yourself, “Would it make sense to average the
data?” If the answer is no, you have non-numeric data.
Sometimes we talk about data types, and sometimes about variable types. They’re the same thing. For instance, “weight of a machine part” is a
continuous variable, and 61.1 g, 61.4 g, 60.4 g, 61.0 g, and 60.7 g are continuous data.
Quantitative (numeric):
    You get a number from each member of the sample.
    The data have units (inches, pounds, dollars, IQ points, whatever) and can be sorted from low to high.
    It makes sense to average the data.
    Examples (discrete): number of children in a family, number of cigarettes smoked per day, age at last birthday
    Examples (continuous): height, salary, exact age
Qualitative (categorical or non-numeric):
    You get a yes/no or a category from each member of the sample.
    The data may or may not have units and do not have a definite sort order.
    Your summary is counts or percentages in each category.
    Examples: hair color, marital status, gender, country of birth, and opinion for or against a particular issue
Continuous or discrete data? Sometimes when you have numeric data it’s hard to say whether you have discrete or continuous data. But since you’ll
graph them differently, it’s important to be clear on the distinction. Here are two examples of doubtful cases: salary and age.
It’s true that your salary can be only a whole number of pennies. But there are a great many possible values, and the distance between the
possible values is quite small, so you call salary a continuous variable. Besides, you don’t ask “how many pennies do you make?” but rather “how
much do you make?”
What about age? Well, age at last birthday is clearly discrete since it can be only a whole number: “how many years old were you at your last
birthday?” But age now, including years and months and days and fractions of days, would be continuous, again because you can subdivide it as
finely as desired.
1C3. Summary Statements

When you see a summary statement, you have to do a little mental detective work to figure out the data type. Always ask yourself, what was the
original measurement taken or question asked?
Example 12: “The average salary at our corporation is $22,471.” The original measurement was the salary of each individual, so this is continuous
data.
Example 13: “The average American family has 1.7 children.” Don’t let “1.7” fool you into identifying this as a continuous variable! What was the
original question or measurement? “How many children are there in your family?” That’s discrete data.
Example 14: “Four out of five dentists surveyed recommend Trident sugarless gum for their patients who chew gum.” Yes, there are numbers in the
summary statement, but the original question asked of each dentist was “Do you recommend Trident?” That is a yes/no question, so the data type is
categorical.
1D. Statistical Errors
Summary: In statistics, an error is not necessarily a mistake. This section explores the types of statistical errors and where they come from.
Definition: An error is a discrepancy between your findings and reality. Some errors arise from mistakes, but some are an inevitable part of the
sampling process.
1D1. Sampling Error
Definition: Even if you make no mistakes, inevitably samples will vary from each other and any given sample is almost sure to vary from the
population. This variability is called sampling error. (It would probably be more helpful to call it sample variability, but we’re stuck
with “sampling error”.)
Sampling error “refers to the difference between an estimate for a population based on data from a sample and the ‘true’ value for
that population which would result if a census were taken.” (Australian Bureau of Statistics 2013) [see “Sources Used” at end of book]
Except for a census, no sample is a perfect representation of the population. So the sample mean (average), for example, will usually be a bit different
from the population mean. Sampling errors are unavoidable, even if you do everything right when you take a random sample. They’re not
mistakes, they’re just part of the nature of things.
Although sampling error cannot be eliminated, the size of the error can be estimated, and it can be reduced. For a given population, a larger sample
size gives a smaller sampling error. You’ll learn more about that when you study sampling distributions [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c08_top].
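A small simulation can make the claim about sample size concrete. This is only a sketch: the population (100,000 values with mean near 100 and standard deviation near 15) is invented, and “typical sampling error” here just means the average gap between a sample mean and the population mean over 500 repeated random samples.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 100,000 values, mean near 100, SD near 15.
population = [rng.gauss(100, 15) for _ in range(100_000)]
true_mean = statistics.fmean(population)

def typical_error(n, trials=500):
    """Average absolute gap between a sample mean and the population mean."""
    gaps = []
    for _ in range(trials):
        sample = rng.sample(population, n)   # a fresh simple random sample
        gaps.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(gaps)

errors = {n: typical_error(n) for n in (25, 100, 400)}
for n, err in errors.items():
    print(f"n = {n:3d}: typical sampling error about {err:.2f}")
```

Every sample here is taken correctly, yet none matches the population exactly. The gap shrinks as the sample grows, but it never reaches zero.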
1D2. Nonsampling Errors
Definition: Nonsampling errors are discrepancies between your sample and reality that are caused by mistakes in planning, in collecting data, or
in analyzing data.
Nonsampling errors make your sample unrepresentative of the population and your results questionable if not useless. Unlike sampling errors,
nonsampling errors cannot be reduced by taking larger samples, and you can’t even estimate the size of most nonsampling errors. Instead, the
mistakes must be corrected, and probably a new sample must be taken.
There are many types of nonsampling errors. Different authors give them different names, but it’s much more important for you to recognize the bad
practice than to worry about what to name it. In taking your own samples, and in evaluating what other people are telling you about their samples,
always ask yourself: what could go wrong here? has anything been done that can make this sample unrepresentative of the population? Here are
some of the more common types of nonsampling errors. After you read through them, see how many others you can think of.
Self-Selected Samples
This is almost always bogus. People who select themselves are by definition different from people who don’t, which means they are not
representative. It can be very hard to know whether that difference matters in the context of a particular study. Since you can never be sure, it is safest
to avoid the problem and not let people select themselves.
But medical studies all use volunteers. (They have to, ethically.) Why doesn’t that make the sample bogus? They’re volunteers, but usually they’re not
self-selected volunteers. For example, researchers may ask doctors and hospitals to suggest patients who meet a particular profile; they use
probability techniques to select a sample from that pool.
But things are not always simple. For example, some companies or researchers may advertise and pay volunteers to undergo testing. In this case
you have to ask very serious questions about whether the volunteers are representative of the general population. Statistical thinking isn’t a matter of
black and white, but some pretty sophisticated judgment can be involved. Your take-away is: don’t accept anything at face value, but always ask:
What important facts are being left out? What does that do to the credibility of the results?
Sampling Bias
Definition: Sampling bias results from taking a sample in a way that tends to over- or under-represent some subgroup of the population.
Example 15: If you’re doing a survey on student attitudes toward the cafeteria, and you conduct the survey in the cafeteria, you are systematically
under-representing students who don’t use the cafeteria. It seems logical that attitudes are more negative among students who don’t use the cafeteria
than among students generally, so by excluding them you will report overall attitude as more favorable than it really is.
“Bias” is a good example of the words in statistics that don’t have their ordinary English meaning. You’re not prejudiced against students who dislike
the cafeteria. “Bias” in statistics just means that something tends to distort your results in a particular direction.
Example 16: The classic example of sampling bias is the Literary Digest fiasco in predicting that Landon would beat Roosevelt in the 1936 election. The
magazine sent questionnaires to all its subscribers, it phoned randomly selected people in telephone books, and it left stacks of questionnaires at car
dealerships with instructions to give one to every person who test drove a car. The sample size was in the millions.
This procedure systematically over-represented people who were well off and systematically under-represented poorer people. In 1936 the Great
Depression still held sway, and most people did not have the disposable income to subscribe to a fancy magazine, let alone a home telephone; the
very thought of buying a car would have struck them as ridiculous or insulting. In that era, the Republicans appealed more to the rich and the
Democrats more to the working class. So the net effect of the Literary Digest’s procedure was that it made the country look a lot more Republican than
it actually was. Since Landon was a Republican and FDR a Democrat, FDR’s actual support was much greater than shown by the poll, and Landon’s
was much less.
Notice that a sample size of millions did not overcome sampling bias. A larger sample size is not an answer to nonsampling errors.
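To see why size doesn’t rescue a biased sample, here is a rough simulation. Every number in it is invented for illustration and is not taken from the 1936 poll: an electorate of 200,000 in which 25% are well off, with 60% of the well-off and 35% of everyone else favoring a candidate called L.

```python
import random

rng = random.Random(1936)

# Hypothetical electorate (all rates are assumptions for this sketch).
def make_voter():
    well_off = rng.random() < 0.25
    favors_L = rng.random() < (0.60 if well_off else 0.35)
    return well_off, favors_L

population = [make_voter() for _ in range(200_000)]

def share_for_L(voters):
    """Proportion of the given voters who favor candidate L."""
    return sum(favors for _, favors in voters) / len(voters)

true_share = share_for_L(population)

# Biased procedure: only well-off voters can be reached, but there are tens of thousands of them.
biased_sample = [v for v in population if v[0]]
# Unbiased procedure: a simple random sample of just 1,000 from everyone.
random_sample = rng.sample(population, 1_000)

print(f"whole electorate:            {true_share:.3f}")
print(f"biased sample (n = {len(biased_sample)}): {share_for_L(biased_sample):.3f}")
print(f"random sample (n = 1000):    {share_for_L(random_sample):.3f}")
```

The huge biased sample gives a very precise estimate of the wrong thing (support among the well-off), while the small random sample lands near the true figure.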
The Digest’s original article can be found in Landon in a Landslide: The Poll That Changed Polling (American Social History Project) [see “Sources
Used” at end of book].
While we’re on the subject of presidential elections, different nonsampling errors also led to wrong predictions of a Dewey victory over Truman in
1948. For analyses of both the 1936 and the 1948 statistical mistakes, see Classic Polling Surprises (2004) [see “Sources Used” at end of book] and
Introduction to Polling (n.d.) [see “Sources Used” at end of book].
Selection Bias
Beyond sampling bias, there are many other bad practices in selecting your sample that can bias the results. Wikipedia’s Selection Bias [see “Sources Used”
at end of book] has a good rundown of quite a few.
Non-Response Errors
If you’re taking a mail survey, a significant number of people (probably a majority) won’t respond. Are the responders representative of the non-
responders, or has a bias been introduced by the non-response? That’s a tough question, and the answer may not always be clear.
For this reason, mail surveys are often coded so that the investigators can tell who did respond, and follow up with those who didn’t. That
follow-up can be more mail, a phone call, or a visit.
Even with in-person polls, non-response is a problem: many people will simply refuse to participate in your survey. Depending on what you’re
surveying, that could be unimportant or it could be a fatal flaw.
Response Errors
Definition: Response errors occur when respondents give answers that don’t match objective truth or their own actual opinions.
Poorly worded survey questions are a major source of response errors, and lead to biased results or completely meaningless results. There may not
be a perfect survey question, but having several people review the questions against a list of possible problems will greatly reduce the level of
response errors.
But response errors can never be completely eliminated. For instance, people tend to shade their answers to make themselves look good in their own
eyes or in the interviewer’s eyes. Most people rate themselves as better-than-average drivers, for example, which obviously can’t be true. And
self-reporting of objective facts is always suspect because memory is unreliable.
Example 17:
“How often do you read to your child?” (People will tend to award themselves points for good intentions, and they don’t want to look like bad
parents.)
“Do you think immigrants should be allowed to take jobs from honest Americans?” (That’s a leading question. Compare with “Should immigrants
be allowed to apply for jobs and pay taxes in the US?” You can see that a given person might give different answers to those two questions.)
“How much do you spend on food, including groceries and restaurants, in a typical week?” (Most people don’t carry an accurate accounting
system in their heads.)
“Do you agree with the X Party platform on gun control, education, abortion, and taxes?” (Asking too much in one question. If you agree with
some but not all, how do you answer?)
“Do you favor prison reform?” (Too vague — nobody’s against prison reform in principle, but the specifics of a particular policy would make all
the difference.)
“The President has proposed a Federal 30-day waiting period for the purchase of any automatic or semiautomatic weapon. Do you favor this
proposal?” (Superficially this looks good: it’s specific, and it’s asking only one thing. But in fact it’s biased by the use of “favor” alone. You should
word questions neutrally: “Do you favor or oppose …?” Better yet, “What is your opinion of this proposal?” with options of strongly agree, agree,
neutral, disagree, and strongly disagree.)
“In the race for mayor, do you favor candidate A, B, or C?” (Some people are more likely to choose the first alternative because it’s first, and they
don’t like to say they have no strong opinion. It is better to vary the order of candidates randomly to avoid a response error in favor of A. Of
course, you should also offer “other” and “not sure” as alternatives.)
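The last suggestion, varying the order of candidates randomly, is easy to automate if the survey is delivered by computer. The following is a sketch under that assumption; the function name and the decision to keep “other” and “not sure” in fixed positions at the end are illustrative choices, not a polling standard.

```python
import random

CANDIDATES = ["A", "B", "C"]          # names as in the example question
FIXED_TAIL = ["other", "not sure"]    # always offered, always last

def ballot_order(rng):
    """Return the answer choices with the candidates in a fresh random order."""
    shuffled = CANDIDATES[:]          # copy, so the master list is left untouched
    rng.shuffle(shuffled)
    return shuffled + FIXED_TAIL

rng = random.Random(7)
for respondent in range(1, 4):
    print(respondent, ballot_order(rng))
```

Over many respondents each candidate appears first about equally often, so any advantage from being listed first is spread evenly instead of always going to A.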
Data Errors
These include mistakes by interviewers in recording respondents’ answers, mistakes by investigators in measuring and recording data, and mistakes
in entering the recorded data.
Inappropriate Analysis
In the second half of the course you’ll learn a number of inferential statistics procedures. Each one is appropriate in some circumstances and
inappropriate in others. If you use the wrong form of analysis in a given situation, or you apply it wrongly, your results will be about as good as the
results from using a hammer to drive a screw.
1E. Observation and Experiment
Summary: There are two main methods of gathering data, the observational study and the experiment. Learn the differences, and what each one
can tell you.
1E1. Observational Study Versus Designed Experiment
Many, many statistical investigations try to find out whether A causes B. To do this, you have two groups, one with A and one without A, or you
have multiple groups with different levels of A. You then ask whether the difference in B among the groups is significant. The two main ways to
investigate a possible connection are the observational study and the experiment.
The concepts aren’t hard, but there’s a boatload of vocabulary. Let’s get through the definitions first, and then have some concrete examples to
show how the terms are used. Please read the definitions first, then read the first example and refer back to the definitions; repeat for the other
examples.
Definition: In an observational study, the investigator simply records what happens (a prospective study) or what has already happened (a
retrospective study). In an experiment, the investigator takes a more active role, randomly assigning members of the sample to
different groups that get treated differently.
Which is better? Well, in an observational study, you always have the problem of wondering whether the groups are different in some way other than
what you are studying. This means that an observational study can never establish cause. The best you can do after an observational study is to say
that you found an association between two variables.
BTW: How do we establish cause, when for ethical or practical reasons we can’t do an experiment? The nine criteria are listed in Causation (Simon 2000b [see “Sources
Used” at end of book]) and were first laid down by Sir Austin Bradford Hill [URL: https://BrownMath.com/swt/pfswt.htm.htm#bign_BradfordHill] in 1965.
Definitions: In an observational study or an experiment, there are two or more variables. You want to show that changes in one or more of them,
called the explanatory variables, go with changes in one or more response variables.
Explanatory variables are the suspected causes, and response variables are the suspected effects or results.
Example 18: Over the course of a year, you have parents record the number of minutes they spend every day reading to their child, and at the end of
the year you record each child’s performance on standard tests. The explanatory variable is parental time spent reading to the child, and the response
variable(s) are performance on the standardized test(s).
Definitions: In an experiment, the experimenter manipulates the suspected cause(s), called explanatory variable(s) or factor(s). A specific level of
each factor is administered to each group. The level(s) of the explanatory variable(s) in a given group are known as its treatment.
Example 19: To test productivity of factory workers, you randomly assign them to three groups. One group gets an extra hour at lunch, one group
gets half-hour breaks in morning and afternoon, and one group gets six 10-minute breaks spaced throughout the day. The explanatory variable or
factor is structuring of break time, and the three levels or treatments are as described.
Definitions: In an experiment, each member of a sample is called a unit or an experimental unit. However, when the experiment is performed on
people they are called subjects or participants.
Definition: In any study or experiment, results will vary for individuals within each group, and results will also vary between the groups as a
whole. Some of that variation is due to chance: it is expected statistical variability or sampling error. If the differences between groups
are bigger than the variation within groups — and enough bigger, according to some calculations you’ll learn later — then the
investigator has a significant result. A significant result is a difference that is too big to be merely the result of normal statistical
variability.
I’ll have a lot more to say about significance when you study Hypothesis Tests [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c10_top].
Confounding and Lurking Variables
In Example 18, about reading to children, you find generally that the more time parents spend reading to first graders, the better the children tend to
do on standard tests of reading level.
Is the reading time responsible for the improved test scores? You can easily think of other possible explanations. Parents who spend more time
reading to their children probably spend more time with them in general. They tend to be better off financially — if you’re working two jobs to make
ends meet, you probably have little time available for reading to your children. Economic status and time spent with children in general are examples
of lurking variables in this study.
Definition: A hidden variable that isn’t measured and isn’t part of your design but affects the outcome is called a lurking variable.
Example 20: In a large elementary school, you schedule half the second grade to do art for an hour, two mornings a week, with the district’s art
teacher. The other half does art for an hour, two afternoons a week, with the same teacher, but they are told at the beginning that all their projects will
be displayed and prizes given for the best ones.
Can you learn anything from this about whether the chance to win prizes prompts children to do a better job on art projects? The problem is that
there’s not just one difference in treatment here, the promised prizes. There’s also the fact that everyone’s project will be on display. And maybe
mornings are better (or worse) for doing art than afternoons. Maybe the teacher is a morning person and fades in the afternoon, or is not a morning
person and really shines in the afternoon. Even if there’s a difference in quality of the projects, you can’t make any kind of simple statement about the
cause, because of these confounding variables.
Definition: A confounding variable is “associated in a non-causal way with a factor and affects the response.” (DeVeaux, Velleman, Bock 2009,
346 [see “Sources Used” at end of book])
“Confounding occurs in an experiment when you [can’t] distinguish the effects of different factors.” (Triola 2011, 32 [see “Sources
Used” at end of book])
In the art example, you wanted to find out whether promising prizes makes children do better art work. But the promise of prizes wasn’t the only
difference between the two groups. Time of day and public display are confounding variables built into the design of this experiment. You know
what they are, but you can’t untangle their effect from the effect of what you actually wanted to study.
What’s the difference between lurking variables and confounding variables? Both confuse the issue of whether A causes B.
A lurking variable L is associated with or causes both A and B, so any relationship you see between A and B is just a side effect of the L/A and
L/B relationships. For example, counties with more library books tend to have more murders per year. Does reading make people homicidal? Of
course not! The lurking variable is population size. High-density urban counties have more books in the library and more murders than low-density
rural counties.
A confounding variable C is typically associated with A but doesn’t cause it, so when you look at B you don’t know whether any effect comes
from A, from C, or from both. For example, after a year with a lot of motorcycle deaths, a state passes a strict helmet law, and the next year there are
significantly fewer deaths. Was the helmet law responsible? Maybe, but time is a confounding variable here. Were motorcyclists shocked at the high
death toll, so that they started driving more carefully or switched to other modes of transit?
Don’t obsess over the difference between lurking and confounding variables. Some authors don’t even make a distinction. You should recognize
variables that make results questionable; that’s a lot more important than what you call them.
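The library-books example can be imitated with made-up numbers. In this sketch, 300 hypothetical counties have books and murders that each depend on population plus independent noise, and neither depends on the other. All rates are assumptions chosen only to show the pattern, and `pearson_r` is a plain implementation of the correlation coefficient you’ll meet in Chapter 4.

```python
import random
import statistics

rng = random.Random(3)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Lurking variable L: county population (hypothetical values).
populations = [rng.uniform(10_000, 1_000_000) for _ in range(300)]
# A (books) and B (murders) each follow population, with separate random noise.
books = [2.0 * p * rng.uniform(0.7, 1.3) for p in populations]
murders = [5e-5 * p * rng.uniform(0.5, 1.5) for p in populations]

r_raw = pearson_r(books, murders)

# Dividing by population removes the lurking variable's influence.
books_pc = [b / p for b, p in zip(books, populations)]
murders_pc = [m / p for m, p in zip(murders, populations)]
r_per_capita = pearson_r(books_pc, murders_pc)

print(f"books vs murders, raw counts: r = {r_raw:.2f}")
print(f"books vs murders, per capita: r = {r_per_capita:.2f}")
```

The raw counts are strongly correlated even though books have no effect on murders in this simulation. Once both are put on a per-capita basis, the association essentially disappears.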
BTW: That said, if you want to see two more takes on the difference, have a look at Confounding and Lurking Variables (Virmani 2012 [see “Sources Used” at end of book])
and Confounding Variables (Velleman 2005 [see “Sources Used” at end of book]).
Lurking and confounding variables are the boogeyman of any statistical work. Lurking variables are the reason that an observational study can show
only association, not causation. In experiments, you have the potential to exclude lurking variables, or at least to minimize them, but it takes planning
and extra work, and you need to be careful not to create a design with built-in confounding.
Whenever any experiment claims that A causes B, ask yourself what lurking variables there might be, and whether the design of the study
has ruled them out. You can’t take this for granted, because even professional researchers sometimes cut corners, knowingly or unknowingly.
Extended Examples
Example 21: Does smoking cause lung cancer?

Initial studies in the mid-20th century had three or four groups: non-smokers, light smokers, moderate smokers, and heavy smokers. They looked at
the number and severity of lung tumors in the groups to see whether there was a significant difference, and in fact they found one.
This was an observational study. Ethically it had to be: if you suspect smoking is harmful you can’t assign people to smoke.
Explanatory variable: smoking level (none, light, moderate, heavy). Levels or treatments don’t apply to an observational study.
Response variable: tumor production
Because this was an observational study, there was no control for lurking variables, and even with a significant result you can’t say from this study
that smoking causes lung cancer. What lurking variables could there be? Well, maybe some genetic factor both makes some people more likely to
smoke and makes them more susceptible to lung cancer. This is a problem with every observational study that finds an effect: you can’t rule out
lurking variables, and therefore you can’t infer causation, no matter how strong an association you find.
Since you can’t do an experiment on humans that involves possibly harming them, how do you know that smoking causes lung cancer? A good
explanation is in Causation (Simon 2000b [see “Sources Used” at end of book]).
Example 22: Does aspirin make heart attacks less likely?

Here you can do an experiment, because aspirin is generally recognized as safe. Investigators randomly assigned people to two groups, gave aspirin
to one group but not the other, and then monitored the proportion who had heart attacks. They found a significantly lower risk of heart attack in the
aspirin group.
This was a designed experiment.

Explanatory variable: aspirin. There were two levels or treatments: yes and no.
Response variable: heart attack (yes/no)
From this experiment, you can say that aspirin reduces the risk of heart attack. How can you be sure there were no lurking variables? By randomly
assigning people to the two groups, investigators made each group representative of the whole population. For example, overweight is a risk factor
for heart attacks. The random assignment ensures that overweight people form about the same proportion in each group as in the population. And
the same is true for any other potential lurking variable. (It helps to have larger samples, and in this study each sample was about 10,000 people.)
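You can watch random assignment balance a risk factor in a quick simulation. This is a sketch with invented numbers: a pool of 20,000 subjects, 30% of whom carry some risk factor such as overweight, split at random into two groups of 10,000. None of these figures come from the actual aspirin study except the approximate group size.

```python
import random

rng = random.Random(22)

# Hypothetical pool: True marks a subject with the risk factor (30% rate is assumed).
subjects = [rng.random() < 0.30 for _ in range(20_000)]

rng.shuffle(subjects)                  # random assignment: shuffle, then split
aspirin_group = subjects[:10_000]
control_group = subjects[10_000:]

def rate(group):
    """Proportion of the group that has the risk factor."""
    return sum(group) / len(group)

print(f"whole pool:    {rate(subjects):.3f}")
print(f"aspirin group: {rate(aspirin_group):.3f}")
print(f"control group: {rate(control_group):.3f}")
```

The code never looks at the risk factor when it assigns people, and that is the point: the same balancing happens for every characteristic at once, including ones nobody thought to record.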
Example 23: Does prayer help surgical patients?

Here again, no one thinks prayer is harmful, so ethically the experimenters were in the clear to assign cardiac-bypass patients randomly to three
groups: people who knew they were prayed for, people who were prayed for and didn’t know it, and people who were not prayed for. Investigators
found no significant difference in frequency of complications between the patients who were prayed for and those who were not prayed for.
This was a designed experiment.

Explanatory variables: receipt of prayer (two levels, yes and no) and knowledge of being prayed for (also two levels, yes and no). There were three
treatments: (a) receipt=yes and knowledge=yes, (b) receipt=yes and knowledge=no, (c) receipt=no.
Response variable: occurrence of post-surgical complications (yes/no).
(You can read an abstract of the experiment and its results in Study of the Therapeutic Effects of Intercessory Prayer [Benson 2006 [see “Sources Used” at
end of book]]. The full report of the experiment is in Benson 2005 [see “Sources Used” at end of book].)
Because lurking variables can’t be ruled out in an observational study, investigators always prefer an experiment if possible. If ethical or other
considerations prevent doing an experiment, an observational study is the only choice. But then the best you can hope for is to show an association
between the two variables. Only with an experiment do you have a hope of showing causation.
1E2. Experimental Techniques
Okay, so you always have to do an experiment if you want to show that A causes B. Let’s look in more detail at how experiments are conducted, and
learn best practices for an experiment.
Caution: Design of Experiments is a specialized field in statistics, and you could take a whole course on just that. This chapter can only give you
enough to make you dangerous.☺ While you’re planning your first experiment in real life, it’s a good idea to get help from someone senior or a
professional statistician.
BTW: R. A. Fisher [URL: https://BrownMath.com/swt/pfswt.htm.htm#bign_Fisher] “virtually invented the subject of experimental design” (Upton and Cook 2008, 144 [see
“Sources Used” at end of book]), and pioneered many of the techniques that we use today. He was a great champion of planning: Upton & Cook quote him as saying “To call
in the statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to say what the experiment died of.”
Completely Randomized Design
Definitions: Experimenters randomly assign members to the various treatment groups. We say that they have randomized the groups, and this
process is called randomization.
Why randomize? Why not just put the first half of the sample in group A and the second half in group B? Because randomization is how you control
for lurking variables.
Think about the study with aspirin and heart attacks. You know that different individuals are more or less susceptible to heart attacks. Risk
factors include smoking, obesity, lack of exercise, and family history. You want your aspirin group and your non-aspirin group to have the same mix
of smokers and non-smokers as the general population, the same mix of obese and non-obese individuals, and so on. Actually it’s harder than that.
There aren’t just “smokers” and “non-smokers”; people smoke various amounts. There aren’t just “obese” and “fit” people, but people have all levels
of fitness.
It would be very laborious to do stratified samples and get the right proportions for a lot of variables. You’d have to have a huge number of
strata. And even if you did do those matchups, taking enormous trouble and expense, what about the variables you didn’t think of? You can never be
sure that the samples have the same composition as the population.
It really must be random assignments — you can’t just assign test subjects to groups alternately. Steve Simon (2000a) explains why, with examples, in
Alternating Treatments [see “Sources Used” at end of book].
Randomization is the indispensable way out. Instead of trying to match everything up yourself — and inevitably failing — you let impersonal
random chance do your work for you.
Are you guaranteed that the sample will perfectly represent the population? No, you’re not. Remember sampling error, earlier in this chapter.
Samples vary from the population; that’s just the nature of things. But when you randomize, in the long run most of your samples will be
representative enough, even though they’re not perfectly representative.
Randomized Block Design
Notice I said that randomization works in the long run. But in the short run it may not. Suppose you are testing a weight-loss drug on a group of 100
volunteers, 50 men and 50 women. If you completely randomize them, you might end up with 20 men and 30 women in one group, and 30 men and
20 women in the other. (There’s about a 20% chance of this, or a more extreme split.)
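How likely such a lopsided split is depends on how you model the assignment, and you can check it with a short calculation. The two models below are assumptions of this sketch, not something the text spells out. In the first, exactly these 100 volunteers are split into two fixed groups of 50, so the number of men in group 1 follows a hypergeometric model. In the second, the 50 members of a group are treated as independent draws from a half-male population, which is a binomial model.

```python
from math import comb

MEN, WOMEN, GROUP = 50, 50, 50   # 100 volunteers, two groups of 50

def hypergeom_pmf(k):
    """P(exactly k men in group 1) when 50 of these 100 people are drawn at random."""
    return comb(MEN, k) * comb(WOMEN, GROUP - k) / comb(MEN + WOMEN, GROUP)

def binom_pmf(k):
    """P(exactly k men among 50 independent draws, each a man with probability 1/2)."""
    return comb(GROUP, k) / 2 ** GROUP

# "20/30 or more extreme" means 20 or fewer men, or 30 or more men, in group 1.
extreme = [k for k in range(GROUP + 1) if k <= 20 or k >= 30]
p_fixed_split = sum(hypergeom_pmf(k) for k in extreme)
p_independent = sum(binom_pmf(k) for k in extreme)

print(f"fixed split of these 100 people: {p_fixed_split:.3f}")
print(f"independent draws (binomial):    {p_independent:.3f}")
```

The binomial model reproduces the “about 20%” quoted above. Under the fixed-split model the chance comes out smaller, roughly 7%. Either way, a badly unbalanced split is far from impossible, which is the reason to consider blocking.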
Why is this bad? Because you don’t know whether men and women respond differently to the drug. If you see a difference between your 20/30
placebo group and your 30/20 treatment group, you don’t know how much of that is the drug and how much is the difference between men and
women. Gender is a confounding variable.
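If you'd like to check that chance yourself, here is a minimal Python sketch. Python isn't used elsewhere in this book, and the function names are only illustrative. The answer depends on how you model the randomization, so the sketch tries two assumed models: the 100 volunteers are shuffled and cut into two groups of exactly 50, or each of the 50 places in a group is filled independently with a man or a woman at 50:50 odds.

```python
from math import comb

def chance_fixed_groups(men=50, women=50, group_size=50, cutoff=20):
    """Chance that group 1 has <= cutoff men or >= group_size - cutoff men
    when everyone is shuffled and the first group_size people form group 1.
    (Assumed model: hypergeometric.)"""
    total_ways = comb(men + women, group_size)
    chance = 0.0
    for k in range(group_size + 1):            # k = men in group 1
        extreme = k <= cutoff or k >= group_size - cutoff
        feasible = k <= men and group_size - k <= women
        if extreme and feasible:
            chance += comb(men, k) * comb(women, group_size - k) / total_ways
    return chance

def chance_independent_slots(group_size=50, cutoff=20, p_man=0.5):
    """Same event when each place in the group is filled independently.
    (Assumed model: binomial.)"""
    chance = 0.0
    for k in range(group_size + 1):
        if k <= cutoff or k >= group_size - cutoff:
            chance += comb(group_size, k) * p_man**k * (1 - p_man)**(group_size - k)
    return chance

fixed = chance_fixed_groups()
independent = chance_independent_slots()
print(f"two groups of exactly 50: {fixed:.3f}")
print(f"independent slots:        {independent:.3f}")
```

Under these assumptions the independent-slots model comes out near the 20% quoted above, while the split into two groups of exactly 50 comes out noticeably lower. Either way, an unlucky split is far from impossible.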
What to do? Create blocks, in this case a block of the 50 men and a block of the 50 women. Then within each block you randomly assign individuals
to receive medication or a placebo. Now you can find how the drug affects women and how it affects men. This is called a randomized block design.
When you can identify a potentially confounding variable before you perform your experiment, first divide your subjects into blocks according to
that variable, and then randomize within each block. Do this, and you have tamed that confounding variable.
BTW: R. A. Fisher [URL: https://BrownMath.com/swt/pfswt.htm.htm#bign_Fisher] coined the term “randomized block” in 1926.
In this example, gender would be called a blocking variable because you divide your subjects into blocks according to gender. Now there’s no
problem separating the effects of the drug from the effects of gender in your experimental results.
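The procedure just described (divide into blocks, then randomize within each block) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the ("M", number) way of identifying volunteers, and the "drug"/"placebo" labels are all hypothetical.

```python
import random
from collections import Counter

def assign_within_blocks(subjects, block_of, treatments=("drug", "placebo"), seed=None):
    """Shuffle each block separately, then deal the treatments out in turn,
    so every block is split as evenly as possible among the treatments."""
    rng = random.Random(seed)          # the seed only makes the demo repeatable
    blocks = {}
    for subject in subjects:
        blocks.setdefault(block_of(subject), []).append(subject)
    assignment = {}
    for members in blocks.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        for position, subject in enumerate(shuffled):
            assignment[subject] = treatments[position % len(treatments)]
    return assignment

# Hypothetical volunteers: 50 men and 50 women, identified by (gender, number).
volunteers = [("M", i) for i in range(50)] + [("F", i) for i in range(50)]
groups = assign_within_blocks(volunteers, block_of=lambda v: v[0], seed=1)

# Count how many of each gender ended up with each treatment.
tally = Counter((gender, treatment) for (gender, _), treatment in groups.items())
print(tally)
```

Because chance operates only inside each block, every run gives 25 men and 25 women to each treatment. Who lands in which group still varies from run to run.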
When I talked about complete randomization, I said it would be laborious to take strata of a lot of variables, and that complete randomization was
the answer. But here I’m suggesting exactly that for men and women in the weight-loss study. Right about now, you might be telling me, “Make up
your mind!”
This is where some judgment is needed in making tradeoffs. Men and women typically have different percentages of body fat, and they are
known to respond differently to some drugs. It makes sense that a weight-loss drug could have different response from men and women, and
therefore you block on gender. But no other factor stands out as both important and measurable. If you tried to block on motivation, for instance,
how would you measure it?
“Block what you can, randomize what you cannot” is a good rule, sometimes attributed to George Box. A variable is a candidate for blocking
when it seems like it could make a difference, and you can identify and measure it. For other variables, we depend on randomization, either complete
randomization or randomization within blocks.
Matched Pairs
There’s one circumstance where you can be sure that the subgroups are perfectly matched with respect to lurking variables: when you use a matched-
pairs design. This is kind of like a randomized block design where each block contains two identical subjects.
Example 24: You want to know whether one form of foreign-language instruction is more effective than another. So you take fifty pairs of identical
twins, and assign one twin from each pair to group A and the other twin to group B. Then you know that genetic factors are perfectly balanced
between the two groups. And if you restrict yourself to twins raised together, you’ve also controlled for environmental factors.
A special type of matched-pairs design matches each experimental unit to itself.
Example 25: You want to know the effect of caffeine on heart rate. You don’t assemble a sample, give coffee to half of them, and measure the
difference in heart rate between the groups. People’s heart rates vary quite a bit, so you would have large variation within each group, and that might
swamp the effect you’re looking for.
Instead, you measure each individual’s resting heart rate, then give him or her a cup of coffee to drink, and after a specified time measure the
heart rate again. By comparing each individual to himself or herself, you determine what effect caffeine has on each person’s heart rate, and people’s
different resting heart rates aren’t an issue.
See also: Experimental Design in Statistics shows how the same experiment would work out with a completely randomized design, randomized
blocks, and matched pairs.
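To see why comparing each person to himself or herself helps, here is a small Python sketch. The heart rates are invented for illustration; they are not data from any study.

```python
from statistics import mean, stdev

# Hypothetical heart rates (beats per minute) for five people,
# measured before and after a cup of coffee.
before = [62, 75, 88, 70, 95]
after = [66, 80, 91, 75, 99]

# Matched pairs: one difference per person, after minus before.
differences = [a - b for a, b in zip(after, before)]
mean_change = mean(differences)

# Compare the person-to-person spread with the spread of the changes.
spread_between_people = stdev(before)
spread_of_changes = stdev(differences)

print("differences:", differences)
print("mean change:", mean_change)
print("spread between people:", round(spread_between_people, 1))
print("spread of the changes:", round(spread_of_changes, 1))
```

In this made-up data set the resting rates differ a lot from person to person, but the changes are all close together. That is the sense in which the pairing removes people's different resting heart rates from the picture.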
Control Group and Placebo
Definition: In an experiment, usually one of the treatments will be no treatment at all. The group that gets no treatment is called the control
group.
But “no treatment at all” doesn’t mean just leaving the control group alone. They should be treated the same way as the other groups, except that
they get zero of whatever the other groups are getting. If the treatment groups are getting injections, the control group must get injections too.
Otherwise you’ve introduced a lurking variable: effects of just getting a needle stick, and in humans the knowledge that they’re not actually getting
medicine.
Definition: A placebo is a substance that has no medical activity but that the subjects of the experiment can’t tell from the real thing.
The placebo effect is well known. Sick people tend to get better if they feel like someone is looking after them. So if you gave your treatment group
an injection but your control group no injection, you’d be putting them in a different psychological state. Instead, you inject your control group with
salt water.
BTW: TheProfessorFunk has a fun three-minute YouTube video on the placebo effect (Keogh 2011 [see “Sources Used” at end of book]). Thanks to Benjamin Kirk for drawing
this to my attention.
You might think placebos would be unnecessary when experimenting on animals. But if you’ve ever had a pet, you know that some animals are
stressed by getting an injection. If the control group didn’t get an injection, you’d have those differing stress levels as a lurking variable. So you
administer a placebo.
Example 26: Sometimes, for practical or ethical reasons, you have to get a little bit creative with a control group. Here’s an excellent example from
Wheelan (2013, 238) [see “Sources Used” at end of book]:
Suppose a school district requires summer school for struggling students. The district would like to know whether the summer
program has any long-term academic value. As usual, a simple comparison between students who attend summer school and those
who do not would be worse than useless. The students who attend summer school are there because they are struggling. Even if the
summer school program is highly effective, the participating students will probably still do worse in the long run than the students
who were not required to take summer school. What we want to know is how the struggling students perform after taking summer
school compared with how they would have done if they had not taken summer school. Yes, we could do some kind of controlled experiment
in which struggling students are randomly selected to attend summer school or not, but that would involve denying the control
group access to a program that we think would be helpful.
Instead, the treatment and control groups are created by comparing those students who just barely fell below the threshold for
summer school with those who just barely escaped it. Think about it: the [group of all] students who fail a midterm are appreciably
different from [the group of all] students who do not fail a midterm. But students who get a 59 percent (a failing grade) are not
appreciably different from those students who get a 60 percent (a passing grade).
Double Blind
Some people do better if they think they’re getting medicine, even if they’re not. To avoid this placebo effect, the standard technique is the double
blind.
Definitions: In a double-blind experiment, neither the test subjects nor those who administer the treatments know what treatment each subject is
getting. In a single-blind experiment, the test subjects don’t know which treatment they’re getting, but the personnel who administer
the treatments do know.
Okay, given that people’s thoughts influence whether they improve, a single blind makes sense. If you let someone know they’re not getting
medicine in a trial, they’re less likely to improve. But why isn’t that enough? Why is a double blind necessary?
For one thing, there’s always the risk that a doctor or nurse might tell the subject, accidentally or on purpose. But beyond that, if you’re treating
someone who has a terrible disease, you might treat them differently if they’re getting a placebo than if they’re getting real medicine, even if you don’t
realize you’re doing it. Why take the risk of introducing another lurking variable? Better to use a double blind and just rule out the possibility.
You might wonder how it’s done in practice. In a drug trial, for instance, each test subject is assigned a code number, and the drug company then
packages pills or vaccines with a subject’s code number on each. The doctors and nurses who administer the treatments just match the code number
on the pill or vaccine to each subject’s code number. Of course all the pills or vaccines look alike, so the workers who have contact with the subjects
don’t know who’s getting medicine and who’s getting a placebo. And what they don’t know, they can’t reveal.
1F. Sharp Points
1F1. Rounding and Significant Digits
You’ll be dealing with numbers through most of this course. Handle them right, and you won’t get burned! There are three issues here: how many
digits to round to, how to round to that number of digits, and when to do your rounding.
How Many Digits?
There are a lot of rules for how many digits you should round to, but we’re not going to be that rigorous in this course. Instead, you’ll use common
sense supplemented by a few rules of thumb. What’s common sense? Avoid false precision, and avoid overly rough numbers.
BTW: The rules are important, but we have only so much time, and you’ve probably learned them in your science courses. If you want to, look up “significant figures” or
“significant digits” in the index of pretty much any science textbook, or look at Significant Figures/Digits [see “Sources Used” at end of book].
Example 27: When you fill your car’s gas tank, the pump shows the number of gallons to three decimal places. You can also describe that as the
nearest thousandth of a gallon. How much gas is that? Convert it to teaspoons (Brown 2009 [see “Sources Used” at end of book]): (0.001 gal) ×
(4 qt/gal) × (4 cup/qt) × (16 Tbsp/cup) × (3 tsp/Tbsp) ≈ 0.8 tsp. You can bet there’s several times that much in the hose when the pump shuts off. Three
decimal places at the gas pump is false precision a/k/a spurious accuracy. That third decimal place is just noise, statistical fluctuations without real
significance.
On the other hand, suppose the pump showed only whole gallons. This is too rough. You can go along pumping gas for no extra charge (bad for
the merchant), and then abruptly the cost jumps by several dollars (bad for you).
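The teaspoon conversion in Example 27 is easy to check. Here is a minimal Python sketch; the constant and function names are illustrative, and the conversion factors are the ones used in the example.

```python
from fractions import Fraction

# US kitchen conversion factors, as used in Example 27.
QT_PER_GAL = 4
CUP_PER_QT = 4
TBSP_PER_CUP = 16
TSP_PER_TBSP = 3

def gallons_to_teaspoons(gallons):
    """Chain the unit conversions: gal -> qt -> cup -> Tbsp -> tsp."""
    return gallons * QT_PER_GAL * CUP_PER_QT * TBSP_PER_CUP * TSP_PER_TBSP

# One thousandth of a gallon, kept as an exact fraction until the end.
teaspoons = gallons_to_teaspoons(Fraction(1, 1000))
print(float(teaspoons))            # 0.768
print(round(float(teaspoons), 1))  # 0.8
```

Keeping the value as a fraction until the last step follows the same advice given later in this section: round only at the end.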
Here are some rules of thumb to supplement your common sense. These are not matters of right and wrong, but conventions to save thinking time:
Round averages and other statistics to one more decimal place than the original data. If you’ve surveyed families for the number of children
in each household, you have a bunch of whole numbers, which have zero decimal places. Your average should have one decimal place.
Round probabilities to four decimal places unless you show them as exact fractions.
Round z-scores (Chapter 3 and later) and other test statistics to two decimal places.
How to Round Numbers
Round in one step. Say you have a number 1.24789, and you want to round it to one decimal place. Draw a line — mentally or with your pencil — at
the spot where you want to round: 1.2|4789. If the first digit to the right of that line is a 0, 1, 2, 3, or 4, throw away everything to the right of the line. It
doesn’t matter what digits come after that first digit. Here, the first digit to the right of the line is a 4, so you throw away everything to the right of the
line: 1.24789 rounded to one decimal place is 1.2.
Rounding in multiple steps, 1.24789 → 1.2479 → 1.248 → 1.25 → 1.3, is wrong. (Why? Because 1.24789 is 0.05211 units away from 1.3, but only
0.04789 units away from 1.2.) You must round in one step only.
As you know, if the first digit to the right of the line is a 5, 6, 7, 8, or 9, you raise the digit to the left of the line by one and throw away everything to
the right of the line. To one decimal place, 1.27489 is 1.2|7489 → 1.3.
You may need to “carry one”. What is 1.97842 to one decimal place? 1.9|7842 needs you to increase that 9 by one. That means it becomes a zero
and you have to increase the next digit over: 1.9+0.1 = 2.0. Therefore, 1.97842 rounded to one decimal place is 2.0.
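The rounding rules above can be tried out in Python. This is a sketch, not the only way to do it: the helper name round_half_up is hypothetical, and it uses the standard decimal module with ROUND_HALF_UP so that a 5 always rounds up, as described here.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, places):
    """Round to the given number of decimal places in one step;
    a first dropped digit of 5 or more raises the digit to its left."""
    quantum = Decimal(10) ** -places        # places=1 gives Decimal('0.1')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)

# Right: round in one step.
print(round_half_up("1.24789", 1))   # 1.2
print(round_half_up("1.27489", 1))   # 1.3
print(round_half_up("1.97842", 1))   # 2.0  (the "carry one" case)

# Wrong: round in several steps, one digit at a time.
stepwise = Decimal("1.24789")
for places in (4, 3, 2, 1):
    stepwise = round_half_up(stepwise, places)
print(stepwise)                      # 1.3
```

One design note: Python's built-in round() works on binary floating-point numbers and rounds exact ties to the nearest even digit, so it can disagree with the rule in this section. That is why the sketch uses decimal strings instead.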
When to Round Numbers
Here’s the Big No-No: Never do further calculations with rounded numbers. What’s the right way? Round only after the last step in calculation.
Example 28: True story: In Europe, average body temperature for healthy people was determined to be 36.8°C, as repeated in A Critical Appraisal of
98.6°F (Mackowiak, Wasserman, Levine 1992 [see “Sources Used” at end of book]). Rounding to the nearest degree, the average human body
temperature is 37°C. So far so good.
But in the US, thermometers for home use are marked in degrees Fahrenheit. Some nimrod converted 37°C using the good old formula 1.8C+32
and got 98.6°F, and that’s what’s marked on millions of US thermometers as “normal” temperature. If you’ve got one of those, ask for your money
back, because it’s wrong.
Why is it wrong? The person who did the conversion committed the Big No-No and did further calculations with a rounded number. For a
correct calculation, use the unrounded number, 36.8. (Okay, 36.8 was probably rounded from 36.77 or 36.82 or something. But the point is that it’s
the least rounded number available.) 1.8×36.8+32 = 98.24 → 98.2, and that is the average body temperature for healthy humans.
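Here is the same comparison as a short Python sketch. The function name is illustrative; the formula is the 1.8C+32 conversion from the example, and decimal arithmetic is used so the printed values aren't blurred by floating-point noise.

```python
from decimal import Decimal, ROUND_HALF_UP

def celsius_to_fahrenheit(celsius):
    """Convert with the usual formula F = 1.8*C + 32, using exact decimals."""
    return Decimal("1.8") * Decimal(celsius) + 32

# Right: convert the least rounded number, then round once at the end.
unrounded = celsius_to_fahrenheit("36.8")
right = unrounded.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
print(unrounded, "->", right)        # 98.24 -> 98.2

# Wrong: round to 37 first, then convert.
wrong = celsius_to_fahrenheit("37")
print(wrong)                         # 98.6
```

The two answers differ by 0.4°F, and the only cause is the order of rounding and calculating.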
1F2. Powers of 10 from Your Calculator
When a calculation results in a number lower than about 0.0005, your calculator will usually present it in the dreaded
scientific notation, for example 1.99E-4. Be alert for this! Your answer is not 1.99 (or however you want to round it). Your
answer is 1.99×10⁻⁴.
How do you convert this to a decimal for reading by ordinary humans? (And yes, you should usually do that — definitely, if your work will be read
by non-technical people.)
The exponent (the number after the E minus) tells you how many zeroes the decimal starts with, including the zero before the decimal point.
1.99×10⁻⁴ is 0.000 199 or 0.0002.
When a decimal starts with a bunch of zeroes, especially if the decimal is long, many people use spaces to separate groups of three digits. This
makes the decimal easier to read.
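If you ever need to do this conversion on a computer instead of by eye, here is a minimal Python sketch. The function names are hypothetical; float() and the fixed-point format code are standard Python.

```python
def calculator_to_decimal(text, places=6):
    """Turn calculator-style text such as '1.99E-4' into an ordinary decimal
    string with the requested number of decimal places."""
    value = float(text)                  # float() understands the E notation
    return f"{value:.{places}f}"

def group_digits(decimal_text):
    """Insert a space after every three digits to the right of the point."""
    whole, fraction = decimal_text.split(".")
    groups = [fraction[i:i + 3] for i in range(0, len(fraction), 3)]
    return whole + "." + " ".join(groups)

plain = calculator_to_decimal("1.99E-4")
print(plain)                                   # 0.000199
print(group_digits(plain))                     # 0.000 199
print(calculator_to_decimal("1.99E-4", 4))     # 0.0002
```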
1F3. Show Your Work!
Don’t just write down answers; show your work. This is in your own best interest:
It helps you organize your thoughts, so that you’re less likely to make a mistake.
If your work is substantially correct, you may get partial credit even if your final answer was wrong.
Your instructor probably won’t give full credit for a bare answer with nothing to back it up. (My own practice is to write WTCF for “where’d
this come from?”)
“But,” I hear you object, “in the real world, all that matters is getting the right answer.” True enough, but there’s a difference between
being in the real world and preparing for the real world. Part of your study is to develop thought and work habits that ensure you will get the
right answer when there’s nobody around to check you. You expose your process now, so that problems can be corrected.
How do you show your work?
The general idea is to show enough that someone familiar with the course content can follow what you did.
When evaluating a formula, write down the formula, then on a line below show it with the numbers replacing the letters. Your calculator can handle
very complicated formulas in one step, so your next line will be your last line, containing the final answer and any rounding you do. Example:
SEM = σ/√n
SEM = 160/√37
SEM = 26.30383797 → SEM = 26.3
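The same three-line pattern (formula, numbers substituted, result then rounding) carries over if you check the arithmetic in software. A minimal Python sketch, with variable names chosen for illustration:

```python
from math import sqrt

sigma = 160      # the value substituted for σ in the example
n = 37           # the sample size substituted for n

sem = sigma / sqrt(n)        # SEM = σ/√n
print(sem)                   # full precision first
print(round(sem, 1))         # rounded only at the end: 26.3
```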
You’ll be using a lot of the menus and commands on the TI-83 or TI-84. Here are some tips:
Show all command arguments. If you’re using randInt to get five random integers from 1 to 100, write down randInt(1,100,5). That’s
the only way your instructor will know that you know how to use that function. If you think the command is randInt(5,100), now is the
time to correct that misunderstanding.
Abbreviate repetitive information. When you put a column of numbers into list 1, you don’t have to write down all the numbers and say
“L1”. Instead, just write “x’s in L1” (use the actual column description if it’s not “x’s”).
Focus on commands, not keystrokes. If you’re doing 1-VarStats L1,L2, write that. For pity’s sake, don’t write all the keystrokes, [STAT]
[►] [1] [2nd] [L1] [,] [2nd] [L2] [ENTER]. I put them in this book because you’re just learning them. But someone familiar with the course
knows how to get the command, and I hate to think of all the time and paper you could waste.
Show inputs first, then outputs. Many students show their answer, then as an afterthought they write down the command. Write down the
command before you enter it in the calculator. When writing down the outputs, you can omit any that are the same as the inputs.
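For comparison only, here is roughly what randInt(1,100,5) does, written as a Python sketch. This is an analogy, not the calculator's own code: the function name rand_int is hypothetical, and it assumes that both endpoints are included, which is how Python's random.randint works.

```python
import random

def rand_int(low, high, count=1):
    """Return `count` random integers from low to high, endpoints included.
    The argument order mirrors randInt(low, high, count) on the TI-83/84."""
    return [random.randint(low, high) for _ in range(count)]

picks = rand_int(1, 100, 5)
print(picks)     # five integers from 1 to 100; different on every run
```

Writing the call out in full, with all three arguments, is the same habit as showing all command arguments on paper: anyone reading it can see the range and the count that were asked for.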
1F4. Optional: ∑ Means Add ’em Up
You’ll find that your calculator does the complicated stuff for you, but here and there I’ve scattered formulas in BTW paragraphs in case you want to
peek behind the curtain.
Stats formulas usually need to do the same thing to every member of a data set and then add up the results. The Greek letter ∑, a capital sigma,
indicates this. This summation notation makes formulas easier to write — easier to read, too, once you get used to it.
Some examples:
∑x = sum of all data points. (x means a data point. If you had to write this out the long way, it would be x₁ + x₂ + x₃ + … + xₙ, where n is the
size of your data set.)
∑x² = square each data point and add up the squares. (∑ is an addition operator, so powers and multiplication happen before the
summation.)
∑xf = multiply each unique data point by the number of times it occurs, and add up the results. (f means frequency or repetition count.)
∑x²f = square each unique data point and multiply by the number of times it occurs, then add up the results.
∑(x − x̅)² = take each data point and subtract the average of the whole sample, square the result, and add up all the squares. (x̅ is the average
of a sample. The parentheses tell you that you don’t square the average, you square the differences.)
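Each of the sums above translates almost word for word into code. Here is a minimal Python sketch on a tiny made-up data set; the numbers and variable names are purely illustrative.

```python
# Hypothetical data set of five points, listed the long way ...
x = [2, 3, 3, 5, 7]
# ... and the same data as unique values with their frequencies.
values = [2, 3, 5, 7]
freqs = [1, 2, 1, 1]

sum_x = sum(x)                                              # ∑x
sum_x2 = sum(v ** 2 for v in x)                             # ∑x²: square first, then add
sum_xf = sum(v * f for v, f in zip(values, freqs))          # ∑xf
sum_x2f = sum(v ** 2 * f for v, f in zip(values, freqs))    # ∑x²f

x_bar = sum_x / len(x)                                      # x̅, the sample average
sum_sq_dev = sum((v - x_bar) ** 2 for v in x)               # ∑(x − x̅)²

print(sum_x, sum_x2, sum_xf, sum_x2f, x_bar, sum_sq_dev)
# 20 96 20 96 4.0 16.0
```

Notice that ∑x and ∑xf agree, and so do ∑x² and ∑x²f. The frequency form is just a shorter way to write the same sum when values repeat.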
What Have You Learned?
Key ideas: (The online book has live links to all of these.)
Descriptive versus inferential statistics.
Sample versus population, and statistic versus parameter.
Variable type (same as data type) — numeric versus non-numeric, and the two types of numeric data.
Simple random sample. (“Random” doesn’t mean what you think.)
Systematic sample, k and randInt.
Mistakes in designing or taking samples. (Your sample is suspect if some factor other than chance determines the members of the
sample.)
Sampling error. (A more descriptive term is sample variability. It can’t be eliminated, but it can be managed.)
Sampling bias among other nonsampling errors. (“Bias” doesn’t mean what you think it does.)
Observation versus experiment; only an experiment lets you infer cause and effect.
Lurking variables — always be on the lookout for the possibility.
Randomization and matched pairs.
Mechanics: How and when to round numbers, reading scientific notation from your calculator, and showing your work.
Study aids: TI-83/84 Cheat Sheet
Statistics Symbol Sheet
How to Read a Math Book
How to Work a Math Problem
How to Take a Math Test or Quiz
Because this textbook helps you, please donate at BrownMath.com/donate.
Exercises for Chapter 1
Write out your solutions to these exercises, showing your work for all computations. Then check your solutions against the solutions page and get
help with anything you don’t understand.
Caution! If you don’t see how to start a problem, don’t peek at the solution — you won’t learn anything that way. Ask your instructor or a tutor
for a hint. Or just leave it and go on to a different problem for now. You may find when you return to that “impossible” problem that you see how to
do it after all.
1. Briefly distinguish sampling error from nonsampling error. Which one represents avoidable mistakes? The other type can’t be eliminated, but what can you do to reduce it?
2. A gynecologist wants to study pregnant women’s use of prenatal vitamins. One month, she randomly selects one of her first five patients. For the rest of that month, she records data on every fifth pregnant patient that she sees.
(a) What type of sample is this?
(b) Is it a good sample or a bad sample? Why?
(c) Is the gynecologist performing an observational study or an experiment?
3. To test Gro-Mor plant food, investigators randomly divide 150 bulbs into three groups. They are planted in a greenhouse under identical conditions, except that one group gets no plant food, one group gets Gro-Mor, and one group gets Magi-Grow, a competitor’s product. The height of each plant is measured at the end of each week for 13 weeks. Identify the following:
(a) Type of experimental design.
(b) Factor(s).
(c) Treatments or levels.
(d) Response variable(s).
(e) Experimental units.
(f) Explanatory variable(s).
(g) Which is the control group?
4. The National Census of Borgovia released the statement, “The average number of children in Borgovian families is 2.1.” (a) Identify the variable. (b) State the specific variable type. (c) Is the number 2.1 a statistic or a parameter?
5. You’re taste-testing your new formula for Whoopsie Cola against your old formula. You assemble a focus group of 80 people and give them each a small cup of each drink. (Half the group gets old cola, water, new cola; the other half gets new, water, old. Of course you don’t tell them what they’re getting.) 55 of the people in the focus group like the new formula better.
(a) Describe the sample.
(b) What is the sample size?
(c) Describe the population.
(d) What is the population size?
6. No sample can perfectly represent the population, so no two samples will be the same, even if your sampling technique is perfect.
(a) What is the name for this variation?
(b) What can be done to reduce this variation?
7. “Have you ever left an infant alone in the house while you went to the store?” Explain how response bias might operate with this question.
8. You want to survey attitudes of resident students toward the cafeteria food. (There are 2000 resident students, and about 1500 of them eat in the cafeteria on a given day. The dorms have two students per room.)
How would you construct a random sample of size 50? a systematic sample? a cluster sample? Which of these is the best balance between
statistical purity and practicality?
9. Two studies, Misinformation and the 2010 Election (Ramsay 2010 [see “Sources Used” at end of book]) at the University of Maryland and Some News Leaves People Knowing Less (Fairleigh Dickinson University 2011 [see “Sources Used” at end of book]), have shown that Fox News viewers know less about the world than people who watch no news at all. Can you conclude that this is because they watch Fox News? Why or why not?
10. You’re conducting a survey to determine Tompkins County voters’ willingness to pay for expanded bus routes. You randomly select twenty bus trips on each day one week, and on each selected bus you or your associate hand a questionnaire to each person who gets on the bus.
(a) What is the most serious problem with this survey technique?
(b) What is the technical term for this type of mistake?
11. Sandy said, “I took a random sample by walking around the halls at lunch time and just asking random people to take my survey.” What is wrong with this statement? What type of sample did Sandy actually take?
12. “42% of my sample said that they have at least one device in the house that can stream video.”
(a) What is the data type?
(b) Is this an example of descriptive or inferential statistics?
(c) Is the number 42% a statistic or a parameter?
13. You want to test the effectiveness of a new medication for a condition that was previously untreatable. You randomly select thirty doctors from state lists of licensed doctors, and all of them agree to help.
Each doctor will put up notices in the waiting rooms, and will select the first 30 adult volunteers, assigning the first 15 to the experimental group
and the second 15 to the control group. Patients will not be told which group they are in; you supply placebo pills that are identical in appearance to
the active medication. Doctors will administer the placebo and medication to the selected groups and report results back to you.
Identify three serious errors in this technique. Are these examples of sampling or nonsampling error?
14. Which is larger, 0.0004 or 2.145E-4? Explain.
15. You survey 87 randomly selected households and find a total of 163 children. Dividing, you announce that the average number of children is 163/87 = 1.87356. What’s wrong with that, and how do you fix it?
16. Identify the type of each variable as discrete, continuous, or non-numeric:
(a) Telephone area code.
(b) Volume of a soap bubble.
(c) Number of times a comment gets retweeted.
(d) Ownership of a dog.
(e) Level of pain, from “none” to “unbearable”.
(f) Level of pain, from 0 to 10.
17. Here are some statements summarizing data. (I made all of them up.) State the original question asked or measurement taken from each member of the sample, and identify each data type as discrete, continuous, or non-numeric. The first one is done for you as an example.
(a) The average weight loss in rats sent to space was 3.4 g.
Answer. Measurement: weight loss of each rat. Continuous.
Now you answer these:
(b) The average dinner check at my restaurant last Friday was $38.23.
(c) 45% of patients taking Effluvium complained of bloating and stomach pain.
(d) The average size of a party at my restaurant last Friday was 2.9 people.
2. How to Graph Your Data

Updated 29 Oct 2020
Summary: To make sense out of a mass of raw data, make a graph. Non-numeric data want a bar graph or pie chart; numeric data want a
histogram or stemplot. Histograms and bar graphs can show frequency or relative frequency.
Contents: 2A. Non-Numeric Data

2A1. Bar Graph
· Optional: Bar Graph in Excel
· Bar Graph with Relative Frequencies
· Optional: Relative Frequencies in Excel
· Side-by-Side Bar Graph
· Stacked Bar Graph
2A2. Making a Table from Scratch
2A3. Pie Chart
· Optional: Pie Chart in Excel
2B. Numeric Data
2B1. Histogram for Numeric Data
· Histogram Versus Bar Graph
· Relative-Frequency Histogram
· Optional: Histogram in Excel
2B2. Ungrouped Discrete Data
· Optional: Ungrouped Discrete Histogram in Excel
2B3. Shapes of Data Sets
2B4. Stem Plot
2C. Bad Graphs
Graph Paper for Free: Why buy an expensive pad of graph paper, especially if you only need a few sheets? You can print your own for free using
Incompetech’s Plain Graph Paper PDF Generator [URL http://incompetech.com/graphpaper/plain/ accessed 2017-01-19] or Math
Worksheets Land’s Graph Paper [URL http://www.mathworksheetsland.com/topics/graphing/paper.html accessed 2017-01-19]. Both
are sources not just for the ordinary square grid, but for various specialty graph papers.
2A. Non-Numeric Data
Any graph of non-numeric data needs to show two things: the categories and the size of each. Probably you’re already familiar with the two most
common types, which are the bar graph and pie chart.
The sizes of categories can be shown as raw counts, called frequencies, or percentages, called relative frequencies. (Relative frequencies can also be
shown as decimals, but I think most people respond better to “20%” than “.20”.)
How do you decide whether to show frequencies or relative frequencies? This is a stylistic choice, not a matter of right and wrong. Your choice
depends on what’s important, what point you’re trying to make. If your main concern is just with the individuals in your sample, go with
frequencies. But if you want to show the relationship of the parts to the whole, show relative frequencies.
2A1. Bar Graph
Example 1: In fall 2012, the Pew Research Center (2013a) [see “Sources Used” at end of book] surveyed American adults on their habits of reading to their children. The
survey included 434 adults who had at least one child under age 12, and the results are shown in the table.

How Often Parents Read to Children under Age 12 (n=434)
How Often              Number of Parents
Every day              217
A few times a week     113
About once a week      39
A few times a month    26
Less often             30
Never                  9

(Remember, you can’t call the data numeric just because you see numbers in a summary statement. You have to go back to the individual data points, which are
categorical: “every day”, “a few times a week”, and so on. If the Pew Center had asked “how many days a week do you read to your child?” and got answers 0, 1, 2, 3,
4, 5, 6, and 7, that would be a set of numeric data.)
Your bar chart or bar graph must follow these rules:
The bars have equal width and equal spacing; they do not touch. Each bar is labeled with its category below the axis.
Typically for non-numeric data, there’s no One True Order for the
categories. Try to find an order that feels natural. If you prefer, you can order the categories from the tallest to the shortest bar; that is called
a Pareto chart.
The frequency or relative-frequency axis (usually the vertical axis) starts at 0, and you need to show the 0 label. That axis is a number line, so
tick marks are equally spaced and represent consistent numbering. (The frequency 0 goes next to the horizontal axis. Don’t offset it
downward.)
The height or length of each bar is proportional to the number or percentage of individuals in that category. You can write frequencies or
percentages at the top of every bar, but this is optional because you’re labeling your frequency axis.
The frequency axis always needs a title. The category axis may or may not need a title, depending on whether the graph title and category
names make the chart easy enough to understand.
Usually the category axis is horizontal, so the frequency axis and the bars are vertical. But you can also make a horizontal bar chart, where the
category axis is vertical and the frequency axis and bars are horizontal.
You can make a bar graph by hand, or use software such as Microsoft Excel. If you make a bar graph by hand, use graph paper and draw the axes
and bars with a straightedge — wobbly bars make you look like you had a liquid lunch.
Here’s my bar graph for parents reading to children:
A couple of comments on best practices:

Notice that I made one square on the vertical axis equal 10 people, or five squares equal 50. That way when I have numbers like 113 or 39 I
know how high to draw my bars. If you pick three or four squares per 50 people, you have a much harder job to draw the bars at the correct
heights because you have to figure things like “if 50 is 3 squares, then 113 must be 113/(50/3) = about 6.8 squares.” Always pick “nice”
numbers for your numeric scales.
Notice also that I drew horizontal lines at the major milestones. These “gridlines” help the reader assess the heights of the bars more
accurately.
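The “nice numbers” advice is easy to see with a little arithmetic. Here is a minimal Python sketch that works out bar heights in graph-paper squares for the reading survey, once with 10 people per square and once with 3 squares per 50 people. The function name is illustrative.

```python
def bar_height_in_squares(count, people_per_square):
    """How many graph-paper squares tall a bar must be for a given count."""
    return count / people_per_square

counts = [217, 113, 39, 26, 30, 9]     # frequencies from the Pew table

# "Nice" scale: one square = 10 people, so you just move the decimal point.
nice = [bar_height_in_squares(c, 10) for c in counts]

# Awkward scale: three squares = 50 people, so one square = 50/3 people.
awkward = [bar_height_in_squares(c, 50 / 3) for c in counts]

for c, a, b in zip(counts, nice, awkward):
    print(f"{c:>3} people: {a:5.1f} squares (nice)   {b:5.2f} squares (awkward)")
```

With the nice scale you can read the heights straight off the counts. With the awkward scale every bar needs a division like the 113/(50/3) above.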
Optional: Bar Graph in Excel
Getting some kind of bar graph out of Excel is easy. But then there’s a lot of fiddling around to reverse some of Excel’s rather strange format choices.
Here are instructions for Excel 2010. If you have Excel 2007, 2013, or 2016, you’ll find that they’re pretty similar.
1. Get your categories into one column and your frequencies into the next column. The first
row of each column should be the column headings from the table. Don’t enter a total
row.
2. With your mouse, highlight all rows and columns of your chart. (It doesn’t matter
whether you include the column heads.) Click the Insert tab and then Charts » Column,
and select the first 2-D column chart.
3. Right-click the useless legend at the right, “Series1” or “Number of Parents”, and select Delete.
4. When you right-clicked the legend, three Chart Tools tabs appeared. On the Layout tab of the ribbon click Chart Title » Above Chart. Click into
the chart title and type a better one. (Maybe Excel already gave your chart a title, but “Number of Parents” is the proper title of the
frequency axis, not the whole chart.)
5. Click Axis Titles » Primary Vertical Axis » Rotated Title. Click on the words “Axis Title” that appear in the chart, and type the new title
“Number of Parents” for your frequency axis.
6. If your category axis needs a title, click Axis Titles » Primary Horizontal Axis » Title Below Axis and enter the axis title.
7. For some reason, the chart has tick marks between the categories. Right-click one of them, select Format Axis, and change Major tick mark type
to None. That gives the chart you see here.
8. You may have to tweak the formatting of the graph further; here are some suggestions. (If you try something and don’t like the result, press
Ctrl-Z to undo the change.)
If the category names are long, try shifting them to vertical alignment: right-click on any of them and select Format Axis » Alignment. In
Text direction, select Rotate 90°.
You may need to resize the whole graph to improve spacing or to make the bars’ heights show better contrast. Look carefully at the
frame and you’ll see handles in each corner and the middle of each side. To resize the graph, drag any of the handles.
To change fonts of the axis labels or titles, click the element, then click the Home tab in the ribbon. Change font or font size as desired.
If you prefer a horizontal bar chart, it’s easy to make the change. Click into the chart area, then on the Design tab on the ribbon click
Change Chart Type » Bar and select the first one.
Okay, well, nothing is that easy! Excel puts the categories in backwards order, so right-click the category axis and select Format Axis »
Axis Options » Categories in reverse order. Still on the Axis Options dialog, click Horizontal axis crosses at maximum category.
Bar Graph with Relative Frequencies
The frequency bar graph tells us about the 434 individuals in the Pew Research Center’s sample. But why collect that sample except for what it can
tell us about how often parents in general read to their children?
You know from Sampling Error in Chapter 1 that the proportions in the population are probably not the same as the sample, but probably not
very far off either. So you compute those proportions and then redraw your graph to show percentages instead of raw counts.
First, total all the frequencies to get the sample size n = 434. (In this case n is given already, but often it isn’t.) Then convert each frequency into a relative frequency. The formula, if you need one, is f/n. For example, 9 parents never read to their under-12 children. The relative frequency is f/n = 9/434 = 0.021 or 2%: 2% of parents never read to their children. Enter that and the other relative frequencies in the table, as shown below.
How Often Parents Read to Children under Age 12 (n=434)

How Often              Number of Parents   Rel. Freq.
Every day                     217             50%
A few times a week            113             26%
About once a week              39              9%
A few times a month            26              6%
Less often                     30              7%
Never                           9              2%
You may see some bar graphs with relative frequencies as decimals. There’s nothing wrong with that for technical audiences, but general audiences usually respond better to percentages.
Your relative frequencies may not add up to exactly 100% (or 1.0000), because of rounding. Don’t change any of the numbers to force a total.
Once you have your relative frequencies, you can make your bar graph. Choose round numbers for the tick marks on your relative frequency axis, for example every 5% or every 10%. I won’t inflict another of my sketches on you, but you can see a finished relative-frequency bar graph below.
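If you would rather let a computer do the arithmetic, here is a minimal Python sketch of the f/n computation for the parents-reading table. Python is not one of this book’s tools (the book uses the TI-83/84 and Excel), so treat this as an optional illustration only; the names `frequencies`, `n`, and `relative` are assumptions made for this example.

```python
# Frequencies from the table "How Often Parents Read to Children under Age 12".
frequencies = {
    "Every day": 217,
    "A few times a week": 113,
    "About once a week": 39,
    "A few times a month": 26,
    "Less often": 30,
    "Never": 9,
}

# The sample size n is the total of all the frequencies.
n = sum(frequencies.values())

# The relative frequency of each category is f/n.
relative = {category: f / n for category, f in frequencies.items()}

for category, rel in relative.items():
    # Round only for display; the stored values stay unrounded.
    print(f"{category:<20} {frequencies[category]:>4} {rel:>6.0%}")
print(f"{'Total':<20} {n:>4}")
```

The rounding happens only when the numbers are printed, which matches the advice above: you don’t adjust any value to force the percentages to total 100%.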
Optional: Relative Frequencies in Excel
To my surprise, I found that Excel doesn’t include relative-frequency bar graphs in its repertoire. You have to enter some formulas to compute the
relative frequencies, and then create the graph from them. (Of course you could compute the relative frequencies yourself and enter them in Excel as
numbers, but whenever possible I like to be lazy and make the computer do the work.)
1. Enter the categories in a column, leave a blank column, and enter the frequencies. If you
already have the categories and frequencies in adjacent columns, right-click on the letter at
the top of the frequency column and select Insert.
2. Click into the cell below the last frequency, and type “=sum(” (without the quotes). Then
with your mouse select the frequencies. Finally, type a closing parenthesis and hit the
Enter key.
3. In the address box just above the first column of the spreadsheet, type a unique name such
as TOTPARENTS and press the Enter key. This makes it easier to refer to this total cell.
4. Click into the empty relative-frequency cell for the first category. Type an = sign, then click
on the first frequency cell. (In the illustration, the relative-frequency cells are in column B
and the frequency cells are in column C.) Type /TOTPARENTS (including / mark for
division) and press the Enter key.
5. Grab the “handle” at the lower right of the cell you just typed into, and drag it down to fill
the Relative Frequency column.
6. Click the % sign in the ribbon to change the decimals to percentages. (The % sign is near the
middle of the ribbon, on the Home tab.)
Now highlight the category and relative-frequency columns, click the Insert tab and the
first 2-D column chart, and tweak the graph as you did before. Your result should be
something like the one you see here.
On this chart, neither axis really needs a label. The percent signs reinforce the
message in the chart title that the bars show relative frequencies. And the category
names together with the chart title tell the reader exactly what is being represented.
It’s a judgment call where to place tick marks on the relative-frequency axis, and
you really need to look at the data to make a decision. Four categories are under 10%, so
it makes sense to show the 5% line and help the reader get a sense of the relative sizes.
Of course, if you show 5% then you have to show every 5% increment up to the top of
the graph.
Side-by-Side Bar Graph
You may want to compare two populations: men and women, for instance, or one year versus another year. To do this, a side-by-side bar graph is
ideal. A side-by-side bar graph has two bars for each category, and a legend shows the meaning of the bars.
The two populations you’re comparing are almost never the same size. Therefore side-by-side graphs almost always show relative frequencies
rather than frequencies.
Example 2: In Educational Attainment, the Census Bureau (2014) [see “Sources Used” at end of book] showed the educational attainment of the
population in selected years 1940 to 2012. I chose the years 1992 and 2012 and prepared this graph to show the change over that 20-year period.
What do you see? Comparing 2012 to 1992, the proportion of the population with no college (the first four categories) declined, and the proportion
with some college or a college degree increased. You should be able to see why this has to be a relative-frequency chart: in a frequency chart, the
larger population in 2012 would make all the bars taller than the 1992 bars, and you’d be hard put to see any kind of trend.
Stacked Bar Graph
Example 3: Another way to compare two populations is the stacked bar graph. In the side-by-side bar graph, above, each group of bars was one
category, and each bar within a group was a population. With the stacked bar graph, you have one bar for each population, and one piece of that bar
for each category. (A stacked bar graph is kind of like an unrolled pie chart.)
Here’s a stacked bar graph for the same data set:
What do you see? Look first at the legend that lists the categories, then at the two bars. The top two segments represent some college. In 1992, about 56% of adults had no education beyond high school. But in 2012, only about 42% had a high-school diploma or less, meaning that 58% had at least some college. The proportions of college and no college were reversed in those 20 years.
You can also see that, though the group with four years of high school shrank, it didn’t shrink as
much as the group with college grew. In other words, it’s not just more high-school graduates going
on to college, it’s a higher proportion of the population entering high school. All the categories without a high-school diploma shrank. In 1992, 20% of
adults had less than a high-school diploma and 80% were high-school graduates; in 2012, only about 12% had less than a high-school diploma and
88% had graduated from high school.
What’s the best way to compare two populations? The answer depends on what you’re trying to show. The side-by-side graph seems to be better at
showing how each category changed, and the stacked graph is usually better at showing the mix, especially if you want to group the categories
mentally. In the side-by-side graph, you can easily see the decline in adults with a fourth-grade education or less, but the shift to a college-educated
population is much harder to see. It’s just the opposite with the stacked graph.
As always, get clear in your own mind what you’re trying to show, and then select the type of graph that shows that most clearly.
Did you notice that this stacked bar graph shows relative frequencies? (Maybe you didn’t notice, because it seems like the natural way to go.) A
stacked bar graph could show frequencies instead of relative frequencies, if you want to emphasize the different sizes of the populations, but then it
becomes harder to compare the mix in the populations.
BTW: When you make a stacked bar graph in Excel, there’s no need to pre-compute the percentages. Just select the third type of 2-D column chart, 100% Stacked Column.
2A2. Making a Table from Scratch

Example 4: In the first example, you were given a table of categories and counts. But more likely you’ll just have a mass of data points, like this:
Children’s Favorite Beach Toys

shovel      dump truck  shovel      bucket      shovel      ball
ball        bucket      sifter      ball        shovel      shovel
dump truck  ball        shovel      shovel      bucket      net
sifter      shovel      bucket      dump truck  bucket      shovel
ball        shovel      ball        bucket      net         ball
Before you can make any kind of graph, you need a table to summarize the data. You’re probably tempted to count the number of shovels, the
number of balls, and so on, but it’s way too easy to make mistakes that way. Why? Because you have to go over the data set multiple times, and you
may count something twice or miss something.
The better procedure is to tally the categories in a table. It’s a win-win: the procedure is faster, and you’re less likely to make a mistake.
Simply go through the data, one item at a time. If you’re seeing a given category for the first time, add it to your list with a tally mark; if that category is already in your table, just add a tally mark. Here’s my table of tallies after going through the first two columns of data:
Toy         Tallies
shovel      |||
ball        |||
dump truck  ||
sifter      |
bucket      |
Please complete your tallies on your own before you look at mine.
After you’ve tallied all the data, count the tallies in each category and total the counts. Of course the total should equal your sample size n. Here’s my complete table:
Toy Tallies Frequency
shovel |||| |||| 10
ball |||| || 7
dump truck ||| 3
sifter || 2
bucket |||| | 6
net || 2
Total 30
Always check the total of your frequencies. If it matches the sample size, that’s no guarantee everything is correct; but if it doesn’t match, you know
something is wrong.
Once you’ve got your table, you can make a graph by following the procedures above. If you’re publishing the table itself, give just the category
names and sizes and the total, but leave out the tallies.
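The tallying procedure above is a single pass through the data, and that is easy to mimic in software. Here is a minimal Python sketch (an optional illustration, not one of this book’s tools) that uses the standard library’s `Counter` on the beach-toy data; the variable names `toys`, `tallies`, and `total` are assumptions for this example.

```python
from collections import Counter

# The 30 responses from "Children's Favorite Beach Toys", read row by row.
toys = [
    "shovel", "dump truck", "shovel", "bucket", "shovel", "ball",
    "ball", "bucket", "sifter", "ball", "shovel", "shovel",
    "dump truck", "ball", "shovel", "shovel", "bucket", "net",
    "sifter", "shovel", "bucket", "dump truck", "bucket", "shovel",
    "ball", "shovel", "ball", "bucket", "net", "ball",
]

# One pass through the data: each item adds one tally to its category.
tallies = Counter(toys)

for toy, count in tallies.items():
    print(f"{toy:<12} {count:>3}")

# Always check the total of the frequencies against the sample size.
total = sum(tallies.values())
print(f"{'Total':<12} {total:>3}")
```

Just as with hand tallies, a total that matches the sample size doesn’t prove the counts are right, but a total that doesn’t match tells you something is wrong.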
2A3. Pie Chart
Where a bar graph tends to emphasize the sizes of categories in relation to each other, a pie chart tends to emphasize the categories as divisions of
the whole. This distinction is not hard and fast; it’s just a matter of emphasis.
To make a pie chart, you need a compass, or something else that can draw a circle, and you need a protractor. The angle of each segment of the pie
will be 360°×f/n, where f is the frequency of the category and n is the sample size — in other words, it’s 360° times the relative frequency, whether
you’re showing frequencies or relative frequencies on the pie chart. But in practice, if you’re going to make a pie chart you’ll use Excel or some other
software.
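To see the 360°×f/n rule in action, here is a small Python sketch that computes the angle of each pie segment for the parents-reading data from earlier in this chapter. It is only an illustration of the arithmetic (Python is not one of this book’s tools), and the names `frequencies` and `angles` are assumptions for this example.

```python
# Frequencies from "How Often Parents Read to Children under Age 12".
frequencies = {
    "Every day": 217,
    "A few times a week": 113,
    "About once a week": 39,
    "A few times a month": 26,
    "Less often": 30,
    "Never": 9,
}

n = sum(frequencies.values())  # sample size

# Angle of each segment: 360 degrees times the relative frequency f/n.
angles = {category: 360 * f / n for category, f in frequencies.items()}

for category, angle in angles.items():
    print(f"{category:<20} {angle:6.1f} degrees")
```

Since 217 is exactly half of 434, the “Every day” segment takes up half the circle, 180°, and the six angles together account for the full 360°.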
Optional: Pie Chart in Excel
Excel can draw a pie chart for you, but you have to make a bunch of tweaks before it’s usable. There’s one bit of good news: with a pie chart, unlike a
bar graph, Excel can compute relative frequencies automatically. I’ll show you how to do that for the data about parents reading to children, for
which we made a bar graph earlier.
1. Highlight the categories and frequencies, but not the total. Click the Insert tab and then
Pie, and choose the first 2-D pie. You see the result at right.
Many people stop there, but this is an absolutely horrible design. Readers have to
keep looking back and forth to match up the colors, and often there are similar colors.
Color-blind people are really screwed, and if you print the chart on a black-and-white
printer it’s hopeless. Fortunately you can fix this!
2. You’re going to put the category names with the pie segments, so right-click the legend
(the list of categories at the right) and select Delete.
3. Click on the “Number of Parents” title and type in a better one, such as “How Often
Parents Read to Children”. (Don’t type the quotes in the title, of course.)
4. In the ribbon, on the Layout tab, click Data Labels » More Data Label Options. Under
Label Contains, select Category Name, select either Value or Percentage, and select Show Leader Lines. Under Label Position, select Best Fit. Click
Close.
5. You may want to resize the graph to make the labels less crowded, depending on the sizes of the segments. Drag a handle with your mouse,
as you did before.
2B. Numeric Data
Summary: For numeric data, you want to show four things: the shape, center, and spread of the distribution plus any outliers. The histogram is
the standard way to do this, and it can show frequencies or relative frequencies.
Usually you’ll group the data into classes, but when you have discrete data without too many different values you can make an
ungrouped histogram.
For a discrete data set with a moderate number of values and a moderate range, a stemplot is an alternative. With a stemplot, it
doesn’t matter how many different data values there are, but the number of data points matters.
2B1. Histogram for Numeric Data
How can you draw a picture of numeric data? The answer is a histogram.
BTW: The term “histogram” was coined by Karl Pearson [URL: https://BrownMath.com/swt/pfswt.htm.htm#bign_PearsonK] in lectures some time before 1895.
Example 5: Let’s use the lengths of some randomly selected iTunes songs:
Lengths of iTunes Songs (seconds)

113  282  179  594  213    319  245  323  334  526
395  440  477  240  296    428  407  230  294  152
242  837  246  135  412    223  275  409  114  604
170  239  138  505  316    369  298  168  269  398
433  212  367  255  218    283  179  374  204  227
How do you make sense of this? As you might expect, the first step is to make a table. But you don’t want to treat each number as its own category,
because that would produce a really uninteresting graph. Instead you create categories, except for numeric data you call them classes. The rules for
classes are very simple:
The classes must cover all the data points.
They must all be the same width.
There must be no gaps between classes.
Notice that the rules don’t tell you how many classes there must be, or what width a class must have. That’s where your discretion comes in. You
want to pick class boundaries that are “nice” numbers, and you don’t want too many classes or too few. In practice, five to nine classes is usually
about the right number.
How does that apply to the iTunes songs? Take a look at the data. The lowest number seems to be 113, and the highest is 837. That gives a range in
“nice” numbers of about 100–850. If you set class width to 100 you have eight classes, so that seems about right.
Now go ahead and make your tally marks to create the table. Instead of category names, you use class boundaries. You already know how to make
tally marks, so I’ll just give you the results:
Lengths of iTunes Songs (seconds)

Class Boundaries   Tallies                 Frequency
100–199            |||| ||||                    9
200–299            |||| |||| |||| ||||         20
300–399            |||| ||||                    9
400–499            |||| ||                      7
500–599            |||                          3
600–699            |                            1
700–799                                         0
800–899            |                            1
Even though the 700–799 class has no data points, it’s still a class and it will occupy the same width in the histogram as any other class. A bar with
zero height shows in the histogram as a gap, and that’s good because it emphasizes that there’s something unusual about the point in the 800–899
class (which was 837 seconds).
If the class width is 100, how come the class bounds are 100–199 and not 100–200? In fact, some authors do write these class bounds as 100–200, 200–
300, and so on, with the understanding that if a number is right on the boundary it goes in the upper class. All authors agree that the class width is
the difference between the lower bounds of two consecutive classes, not the difference between lower and upper bounds of one class. So whether
you write 100–199 for the first class or 100–200, the class width is 200 minus 100, which is 100.
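The class rules above (cover all the data, equal widths, no gaps) can be sketched in a few lines of Python. This is an optional illustration, not one of this book’s tools. It assumes the iTunes listing is read as fifty three-digit values, and the names `songs`, `class_width`, `first_lower`, and `class_lower_bound` are assumptions for this example.

```python
from collections import Counter

# Lengths of iTunes songs in seconds, read as fifty three-digit values.
songs = [
    113, 282, 179, 594, 213, 319, 245, 323, 334, 526,
    395, 440, 477, 240, 296, 428, 407, 230, 294, 152,
    242, 837, 246, 135, 412, 223, 275, 409, 114, 604,
    170, 239, 138, 505, 316, 369, 298, 168, 269, 398,
    433, 212, 367, 255, 218, 283, 179, 374, 204, 227,
]

class_width = 100   # difference between consecutive lower bounds
first_lower = 100   # a "nice" lower bound at or below the minimum


def class_lower_bound(x):
    """Return the lower bound of the class that contains x."""
    return first_lower + ((x - first_lower) // class_width) * class_width


counts = Counter(class_lower_bound(x) for x in songs)

# Step through every class from first to last, so that an empty class
# still appears in the table with a frequency of 0 (no gaps).
last_lower = class_lower_bound(max(songs))
table = [(lower, counts[lower])
         for lower in range(first_lower, last_lower + class_width, class_width)]

for lower, f in table:
    upper = lower + class_width - 1
    print(f"{lower}-{upper}  {f:>2}  {f / len(songs):.0%}")
```

A value right on a boundary, such as 200, lands in the upper class (200–299), which agrees with the convention described above. The last column prints relative frequencies, in case you want a relative-frequency histogram.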
Once you have the table, the histogram is straightforward. You can draw the histogram by hand or use Excel. I’ll show Excel later, but here’s my
hand-made histogram for the iTunes data.
Notice that you label the data bars on their edges: 100, 200, …, 900, not 100–199, 200–299, …. Label the left edge of each bar, and also the right
edge of the last bar. The right edge of the last bar is always one class width more than the left edge, so even if you’ve got 800–899 in your table the last
bar’s edges are 800 and 900.
Like all histograms, this one is good at showing the shape of the data (skewed right; see
below), the center (somewhere in the upper 200s to 300s), and the spread (from 100ish to
800ish seconds, or about two minutes to 13 minutes). In Chapter 3 you’ll learn how to
measure center and spread numerically, but there’s always a place for a picture to help
people grasp a data set as a whole.
This data set also shows an outlier, located somewhere in the 800–899 class. Not every
data set will have an outlier, of course, and a rare sample might have more than one. When
an outlier occurs, your first move is to go back to your original data sheets and make sure
that it’s not simply a mistake in entering your data. If it’s a real data point, then you can
ask what it means. In this case, the message is pretty simple: tunes generally run up to
about 11 or 12 minutes (700 seconds), but the occasional one can be several minutes longer.
Histogram Versus Bar Graph
A histogram is similar to a bar graph, but with the following differences:
                          Histogram            Bar Graph
Data type                 Numeric (grouped)    Discrete ungrouped★    Non-numeric
Order of categories       Numeric order,       Numeric order,         Any order you choose
                          left to right        left to right
Do the bars touch?        Yes                  No, they’re spaced
Where are they labeled?   Below the edges      Below the centers
★Some authors treat ungrouped discrete data as numeric and make a histogram. Others, including this book, treat ungrouped discrete data as
categories [URL: https://BrownMath.com/swt/pfswt.htm.htm#c02_UngroupedHisto] and make a bar graph.
For both histogram and bar graph, the frequencies must start at 0. However, in a histogram the data axis typically doesn’t start at zero. You just leave
some space between the frequency axis and the first bar, and the scale of the data axis is considered to start at the first bar.
Relative-Frequency Histogram
Though I don’t show it here, you could make a relative-frequency histogram, the same way you made a relative-frequency bar chart. The relative
frequencies range from 0 for the 700–799 class to 20/50 = 40% for the 200–299 class.
Optional: Histogram in Excel
Believe it or not, out of all the chart types in Excel, the standard histogram was not included until Excel 2016. If you’ve got Excel 2016, click Insert »
Recommended Charts and select the histogram.
In Excel 2013 and earlier, to make a histogram you have to combine a column chart and a scatterplot (Middleton) [see “Sources Used” at end of
book], or download additional software. You can follow the detailed instructions in that document, or you can download the free Better Histogram
add-in [URL http://www.treeplan.com/download-free-better-histogram-add-in.htm accessed 2014-09-07] from TreePlan Software to do the job. (It
works in Excel 2007 through 2016.) If you’re using Better Histogram:
1. Enter all the original numbers in a column in Excel.
2. Double-click the downloaded ZIP file, and within it double-click Better-Histogram-2007. You will have to enable macros.
3. Click the Add-Ins tab in the ribbon, and then Better Histogram.
Data Range: Click the “_” button at right and highlight your numbers.
Start Value: The lower bound of your first class, not the lowest number in the data.
Step Value: Your class width.
Stop Value: The right-hand edge of the last class, which in this example is 900 (not 899).
4. Better Histogram will create a new sheet in your workbook with a frequency table and histogram. Click on the chart title and enter a new
title. Click on the horizontal axis title and either delete it or change it to more appropriate text. The result is shown at right.
5. Optional: You might wish to jazz up the chart visually. If so, click on the Design tab of
Excel’s ribbon and choose a design. Color is fine, but don’t choose different colors for
different bars because that can make bars look larger or smaller than they actually are.
Here’s what I got from clicking the blue theme.
2B2. Ungrouped Discrete Data
To make sense of most data sets, you need to group the data into classes. But sometimes your data have only a few different values. In such cases,
you probably want to skip the grouping and just have one histogram bar for each different response. The height of the bar tells you how often that
response occurred, as usual.
Example 6: A state park collected data on the number of adults in each vehicle that entered the park in a given time interval:
3 1 1 3 3 3 0 7 3 1    3 6 4 5 3 2 3 4 2 3
0 2 2 4 8 3 3 1 3 3    3 4 1 5 2 2 6 3 4 2
There are only nine different values, so it seems a little silly to group them. Instead, just tally the occurrences, as shown below.
Number of Adults in Vehicles Entering Park

Adults   Tallies            Frequency
0        ||                     2
1        ||||                   5
2        |||| ||                7
3        |||| |||| ||||        15
4        ||||                   5
5        ||                     2
6        ||                     2
7        |                      1
8        |                      1
Total                          40
Label ungrouped data under the centers of the bars, just like categorical data, not under the edges.
Some authors still make the bars touch because the data are numeric, and others keep the bars separated because the data are ungrouped. I prefer the second approach, but I’ll accept the other. Here’s my histogram:
Caution: This particular data set has at least one occurrence of every value between min and max. But suppose it didn’t; suppose there were no
vehicles with 7 adults? In that case, you would draw the histogram exactly the same, except that the bar above “7” would have zero height. The
horizontal axis for numeric data must always have a consistent scale for its whole length, so you never close up any gaps.
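The caution above is easy to get wrong in software, because a simple count skips values that never occurred. Here is a minimal Python sketch (an optional illustration, not one of this book’s tools) that builds the frequency table over every whole number from the minimum to the maximum. It assumes each digit in the state-park listing is one vehicle; the function name `frequency_table` and the variable names are assumptions for this example.

```python
from collections import Counter

# Number of adults in each of the 40 vehicles (one digit per vehicle).
adults = [
    3, 1, 1, 3, 3, 3, 0, 7, 3, 1, 3, 6, 4, 5, 3, 2, 3, 4, 2, 3,
    0, 2, 2, 4, 8, 3, 3, 1, 3, 3, 3, 4, 1, 5, 2, 2, 6, 3, 4, 2,
]


def frequency_table(data):
    """Count every integer value from min to max, including values with 0."""
    counts = Counter(data)
    return {value: counts[value] for value in range(min(data), max(data) + 1)}


frequency = frequency_table(adults)
for value, f in frequency.items():
    print(f"{value} | {'#' * f} {f}")

# Hypothetical variation from the caution above: no vehicles with 7 adults.
without_sevens = [x for x in adults if x != 7]
frequency_without_sevens = frequency_table(without_sevens)
print(frequency_without_sevens[7])  # 7 keeps its place, with zero height
```

Because the table is built from the range of values rather than from the data points alone, the value 7 stays on the axis in the second table even though its frequency is 0.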
Optional: Ungrouped Discrete Histogram in Excel
You can graph ungrouped discrete data in Excel, if you wish. The key is to fool Excel into treating the data like categorical data:
1. Type the unique values in one column. But as you type each number, type an apostrophe (') first. Don’t put 0, 1, 2 and so on in the cells, but
'0, '1, '2. The apostrophe won’t appear, but it tells Excel to treat the numbers like text. (You may notice that Excel
left justifies those numbers.)
2. Type the frequencies in a second column.
3. Highlight the numbers in both columns, and on the Insert tab click Column. Select the first 2-D column.
4. Make all the same adjustments you made for the bar graph, above.
By the way, you might notice that the tick marks on the vertical axis are every two
cars on this graph, but they were every five cars on my hand-drawn histogram. One is
not better than the other; it’s a stylistic choice.
5. Optional: If you want to make the bars touch, right-click on the graph, select
Format Data Series, and under Series Options change Gap Width to 0%. Then click
Border Color and select Solid Line with a color of white.
2B3. Shapes of Data Sets
You should know the names of the most common shapes of numeric data. Why? It’s easier to talk about data that way, and (as you’ll see in the next
chapter) you treat different-shaped distributions a little differently.
The first question is whether the data set is symmetric or skewed. The histogram of a symmetric data set would look pretty much the same in a
mirror; a skewed data set’s histogram would look quite different in a mirror.
If a distribution is skewed, you say whether it’s skewed left or skewed right. A distribution that is skewed left, like the first one below, has mostly
high scores, and a distribution that is skewed right, like the second one below, has mostly low scores. The direction of skew is away from the bulk
of the data, toward the long skinny tail, where there are few data points.
Skewed left or negatively skewed          Skewed right or positively skewed
Example 7: Scores on a really easy test would be skewed left: most people get high scores, but a few get low or very low scores.
Lifespan in developed countries is skewed left: there are relatively few infant and child deaths, and most people live into their 60s, 70s, or 80s.
(The first graph in Calculus Applied to Probability and Statistics [Waner and Costenoble 1996] [see “Sources Used” at end of book] illustrates this.)
People’s own evaluation of their driving skills and safety are left skewed: few people rate themselves below average and most rate themselves
above average. Illusory Superiority [see “Sources Used” at end of book] cites a study by Svenson showing this “Lake Wobegon effect”.
Example 8: People’s departure times after a concert would be skewed right: most people leave shortly before or after the performers finish, but a few
straggle out for some time afterward. Skewed-right distributions are more common than skewed-left distributions.
Salaries at almost any corporation are another good example of a distribution that is skewed right: most people make a modest wage, but a few
top people make much more.
There are several types of symmetric distributions, but here are the two you’ll meet most often. A uniform distribution is one where all possible
values are equally likely to occur. The normal distribution has a precise definition, which you’ll meet in Chapter 7, but for now it’s enough to say
that it’s the famous bell curve, with the middle values occurring the most often and the extreme values occurring much less often.
You’ll notice that both of the examples below are “bumpy”. That’s usual. In real life you pretty much never meet an exact match for any
distribution, because there are always lurking variables, measurement errors, and so on. And even if a population does perfectly follow a given
distribution, like the probability distributions you’ll meet in Chapter 6, still a sample doesn’t perfectly reflect the population it came from: sampling
error is always with us. When we say that a data set follows such-and-such a distribution, we mean it’s a close match, not a perfect match.
Uniform Normal (“bell curve”)
Example 9: Winning lottery numbers are uniformly distributed. (In the short term some numbers occur more often than others, but over the long run
they tend to even out.)
The results of rolling one die many times are uniformly distributed. (But the results of rolling two dice are not uniformly distributed: 7 is the
most likely, 2 and 12 are tied for least likely, and the other numbers are intermediate.)
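You can check the two-dice claim by listing all the possible outcomes. Here is a minimal Python sketch, offered only as an illustration (Python is not one of this book’s tools); it assumes two fair six-sided dice, and the name `sums` is an assumption for this example.

```python
from collections import Counter
from itertools import product

# All 36 equally likely outcomes of rolling two fair six-sided dice,
# counted by the total showing on the two dice.
sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for total in sorted(sums):
    print(f"{total:>2} | {'#' * sums[total]} {sums[total]}/36")
```

Each face of a single die accounts for one outcome out of 6, which is the uniform distribution. With two dice, the total 7 can happen six ways out of 36, while 2 and 12 can each happen only one way, so the totals are not uniformly distributed.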
The normal distribution or bell curve occurs very often, and in fact many natural and industrial processes produce normal distributions. This
happens so often that we often just say or write ND for “normal distribution” or “normally distributed”.
Example 10: Men’s and women’s heights follow separate normal distributions. People’s arrival times at an event are ND. IQ scores, and scores on
most tests, are ND. The amount of soda in two-liter bottles is ND. Your commute times on a given route are ND.
2B4. Stem Plot
Suppose you have a discrete data set with few repetitions. An ungrouped histogram would have most bars at the same low height; a grouped
histogram might show a pattern but you’d lose the individual data points.
If your discrete data set isn’t too large (n < 100, give or take), and the range isn’t too great, you can eat your cake and have it too. The stemplot,
also known as a stem-and-leaf diagram, is a mutant hybrid between a histogram and a simple list of data.
The idea is that you take all the digits of each data point except the last digit and call that the stem; the last digit is the leaf. For example, consider
scores of 113 and 117. They are two leaves 3 and 7 on a common stem 11 (meaning 110).
To construct a stemplot, you look over your data set for the minimum and maximum, then write the stems in a column, from lowest to highest.
Just like with a histogram, there are no gaps, so if you have data in the 50s and the 70s but not in the 60s you still need a stem of 6.
However, your stems probably won’t start at 0. Start them with the lowest data point that actually occurred, and end them with the highest data
point that actually occurred.
BTW: The stemplot was invented by John Tukey [URL: https://BrownMath.com/swt/pfswt.htm.htm#bign_Tukey] in 1970.
Example 11: Here is a set of IQ scores from 50 randomly-selected tenth graders:
99 77 83 111 141 89 98 84 93 124
110 73 96 60 102 87 123 120 100 95
100 90 104 85 129 81 119 112 103 76
108 91 94 114 108 92 96 94 88 101
117 106 103 105 113 97 106 109 80 116
To make your stemplot, eyeball the data for the minimum and maximum, which are 60 and 141. Write the stems, 6 to 14, in a column at the left of
your paper, starting several lines below the top. Then draw a vertical line just to the right of them.
Now go through the data points, one by one, and add each leaf to the proper stem. During this process, you might find a value outside what you
thought were the min and max. That’s no problem. Just add the stem and then the leaf. (Again, the stems can’t have gaps, so if your first stem is 6 and
you come across a data point 47, you have to add stems 4 and 5, not just 4.)
Finally, add a title and a legend or key to your stemplot. Here is the result:
IQ Scores
6 | 0
7 | 7 3 6
8 | 3 9 4 7 5 1 8 0
9 | 9 8 3 6 5 0 1 4 2 6 4 7
10 | 2 0 0 4 3 8 8 1 6 3 5 6 9
11 | 1 0 9 2 4 7 3 6
12 | 4 3 0 9
13 |
14 | 1
key: 11 | 7 = 117
If you lie down and look at this sideways, it looks like a histogram. But the bonus is that you can still see all the actual data points within the
groupings of 60–69, 70–79, etc.
A stemplot is great at showing shape, center and spread of distributions plus outliers, but most data sets don’t lend themselves to a stemplot. If
your data set is too large, your leaves will run off the edge of the page. If your data set is too sparse (if the range is large for the number of data
points), most of your stems won’t have leaves and the plot won’t really show any patterns in the data. But when you have a moderate-sized data
set and the data range is moderate, a stemplot is probably better than a histogram because the stemplot gives more information.
One last touch is sorting the leaves. I don’t think that’s important enough to take the extra effort in a homework problem or on a quiz, but if you’re
going to be presenting your stemplot to other people then you probably want to sort the leaves. Here’s the same stemplot with sorted leaves:
IQ Scores
6 | 0
7 | 3 6 7
8 | 0 1 3 4 5 7 8 9
9 | 0 1 2 3 4 4 5 6 6 7 8 9
10 | 0 0 1 2 3 3 4 5 6 6 8 8 9
11 | 0 1 2 3 4 6 7 9
12 | 0 3 4 9
13 |
14 | 1
key: 11 | 7 = 117
A glance at this stemplot shows you quite a lot. The data set is normally distributed, the center is around 100 points, the spread is 60–141, and there’s
an outlier at 141.
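If you'd like to check a hand-drawn stemplot, the procedure above can be sketched in Python. This is only an illustration, not part of the course: the function name `stemplot` is made up here, and the sketch assumes whole-number data with a one-digit leaf. The IQ scores are read off the unsorted stemplot above.

```python
from collections import defaultdict

def stemplot(data, title=""):
    """Return a sorted stem-and-leaf plot as a list of text lines.

    Assumes whole numbers: the last digit is the leaf, the rest is the stem.
    """
    leaves = defaultdict(list)
    for value in data:
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    lines = [title] if title else []
    # Stems can't have gaps, so walk every stem from lowest to highest.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in sorted(leaves[stem]))
        lines.append(f"{stem:>2} | {row}".rstrip())
    return lines

# IQ scores in the order they appear in the unsorted stemplot.
iq_scores = [
    60,
    77, 73, 76,
    83, 89, 84, 87, 85, 81, 88, 80,
    99, 98, 93, 96, 95, 90, 91, 94, 92, 96, 94, 97,
    102, 100, 100, 104, 103, 108, 108, 101, 106, 103, 105, 106, 109,
    111, 110, 119, 112, 114, 117, 113, 116,
    124, 123, 120, 129,
    141,
]

for line in stemplot(iq_scores, "IQ Scores"):
    print(line)
print("key: 11 | 7 = 117")
```

Because the loop visits every stem between the smallest and the largest, the empty stem 13 still gets its own row, just as the no-gaps rule requires.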
2C. Bad Graphs
You now know how to make good graphs, so be on the lookout for bad graphs. Sometimes they’re bad just because whoever drew them didn’t know
any better, or didn’t think. But some people may deliberately try to deceive you with a graph.
Example 12: File this one under “what were they thinking?” The left-hand graph doesn’t have a title, so you don’t know what “Yes” and “No” mean.
You have to look back and forth between the graph and the legend, and anyone with red-green color blindness probably won’t be able to see which
segment is which. Oh yes — what percentages of the sample answered “Yes” and “No”? You can guess that it’s around a third versus two thirds, but
that’s not very precise.
The right-hand graph cures those problems. It’s now crystal clear which segment is Yes and which is No, and what proportion of the sample
gave each answer. This actually lets you show more information in less space, a win-win. (Of course you wouldn’t use a vague term like
“Opinions” — that’s just there to remind you to give your graph a title.)
Example 13: There’s no telling whether this one is deliberate deception or just incompetent graphing. An oatmeal company, which shall remain
nameless, wanted to show that eating oatmeal for four weeks reduces cholesterol. The first graph makes a strong case — until you look at the scale on
the vertical axis. (Don’t even think about wasting your time on a graph with no vertical scale.)
The scale doesn’t start at zero, so it makes differences look much bigger than they are. Your frequency or relative frequency scale must always
start at zero (and you must show the zero). The second graph is properly drawn, and now you can see that the drop in cholesterol is only a slight
one.
Example 14: It’s all very well to create visual interest, but not if it makes the reader misinterpret the graph.
In the left-hand graph, you can tell from the scale that B is supposed to be three times as large as A, but since it’s three times as high and three
times as wide it’s actually nine times as large, giving the reader a distorted impression of the amount of difference. Even if your “bars” are pictures,
they still have to be the same width. The corrected version is shown at right. (It’s still not quite correct, though, because 0 is not shown on the vertical
axis.)
source: Misleading Graph [see “Sources Used” at end of book]
If you follow the rules in this chapter, you’ll make good, professional graphs. But there are plenty of other ways to make good graphs, depending on
the data you’re trying to show.
There’s a classic picture book that can give you lots of good ideas. Edward Tufte’s The Visual Display of Quantitative Information has been around since
1983, and no one has yet done it any better. (Tufte has produced newer editions.)
Example 15: One famous graph in Tufte’s book is particularly stunning. Charles Minard wanted to present a lot of time-series data about Napoleon’s
disastrous campaign in Russia in the winter of 1812–1813: where battles took place, numbers of casualties, temperature, and so forth. He elected to
make a kind of stylized map showing just the rivers and the cities where events happened. (Niemen at the left is the Niemen River, Russia’s western
border at the time. Moscow, “Moscou” in French, is as far east as Napoleon got.) Across that, Minard showed the army strength as a broad swath at
the start that shrank to almost nothing by the end of the retreat westward. Below are dates of events, temperatures, and precipitation. It’s a huge
amount of information on one piece of paper.
This tiny rendition doesn’t do it justice, but if you visit http://upload.wikimedia.org/wikipedia/commons/2/29/Minard.png you’ll see it at a better size.
(Your browser may still reduce it to fit on your screen. Try clicking into the picture and you should see it at original size, though you’ll have to scroll
around to see the details. It sounds like a lot of effort, but I promise you it’s worth it. Or just get the book, because it has plenty more!)
Example 16: Here’s one I ran across in my reading. It’s not the graph of the century like Minard’s, but it’s a cut above the usual. In Bear Attacks: Their
Causes and Avoidance (2002), Stephen Herrero had the problem of contrasting bears’ diet in spring, summer, and fall. (Of course in winter they’re not
eating.)
He could have drawn three pie charts, or a stacked bar graph, but instead he came up with a great alternative. (A larger version is at
https://BrownMath.com/swtpic/chap02_beardiet.jpg.) Each component of diet is clearly labeled right in the graph, not in some legend off to the side,
and the contrasting backgrounds make it a little more interesting visually. A stacked bar graph would convey the same information, but I like this
presentation because it suggests that “spring”, “summer”, and “fall” are not completely separate but rather transition one into the next.
The vertical axis is clearly labeled, too. There’s no doubt what the numbers are (as opposed to some units of weight, for instance, or something
more esoteric like pounds of feed per hundreds of pounds of bear).
He probably could have left the title off the category axis. After all, we know that the seasons are seasons, and the graph title also conveys
that information. But that’s a minor point. My only real quibble with this graph is that the overall graph title at the bottom is too small.
Overview: With numeric data, the goal of descriptive stats is to show shape, center, spread, and outliers.
When you have a mass of data and need frequencies, don’t pass through the data repeatedly, counting a different category each
time. Instead, use the tally system.
Relative frequency for any class or category is the number of data points in that class, divided by total sample size.
For non-numeric data, make a bar graph or pie chart. Place categories in any order that seems reasonable to you. Side-by-side bar
graphs and stacked bar graphs can be useful for comparing populations.
Numeric data: Group continuous data in classes, tally them, and make a grouped histogram. Bars must touch, and you label them
under the edges, not the middles. Do the same with discrete data that have a lot of different values.
Present discrete data without too many different values in one bar for each different value. Label them under their middles. It’s a
matter of taste whether the bars touch (ungrouped histogram) or not (bar graph).
For bar graphs and histograms, show scale on the frequency or relative-frequency axis, and show scale or category name on the data
axis. Usually, each axis has a title, with a separate chart title at the top. But you can omit an axis title when it would be redundant
information.
In every bar graph or histogram, the frequency or relative-frequency axis must start at 0 and have consistent scale for its whole
length.
Be on the lookout for violations of this rule and other signs of bad graphs.
Know the most common shapes of numeric distributions: uniform, bell curve, skewed left, and skewed right.
The stemplot (stem-and-leaf diagram) is also an option for discrete data with moderate range and ≤ about 100 data points.
Study aids: Histogram Versus Bar Graph

do it after all.
1. The Pew Research Center (2013c) [see “Sources Used” at end of book] conducted a poll of 1000 adults in
Mexico, asking whether they would move to the US if they had the means and opportunity to move. Draw
a relative-frequency bar graph for their responses.
Would You Move to the US?
Yes, with authorization       154
Yes, without authorization    204
No                            612
Don’t know                     30
2. What’s wrong with this graph? (You should be able to see at least two problems, maybe more.)
(source: Misleading Graph [see “Sources Used” at end of book] in Wikipedia)
3. Professor Marvel had a statistics class of fifteen students, and on one 15-point quiz their scores were
10.5 13.5 8 12 11.3 9 9.5 5 15 2.5 10.5 7 11.5 10 10.5
Construct a frequency table and bar graph for their letter grades on the quiz, where 90% is the minimum for an A, 80% for a B, 70% for a C, and 60%
for a D.
4. Bulmer (1979, 92) [see “Sources Used” at end of book] quotes an 1898 study of deaths by horse kick in
the Prussian army. Von Bortkiewicz compiled the number of deaths in 14 Prussian Army corps over the
20-year period 1875–1894, as shown below. (14 corps over 20 years gives 14×20 = 280 observations.) For
example, there were 32 observations in which two officers died of horse kicks.
Deaths by Horse Kick in 14 Prussian Army Corps, 1875–1894
Number of Deaths    Frequency
0                   144
1                    91
2                    32
3                    11
4                     2
Total               280
(a) What is the type of the variable?
(b) Construct an appropriate graph.
5. In a GM factory in Brazil, 25 workers were asked their commuting distance in kilometers. Construct a stem-and-leaf plot.
Commuting Distances in km
 5 15 23 12  9
12 22 26 31 21
11 19 16 45 12
 8 26 18 17  1
16 24 15 20 17
Adapted from Dabes and Janik (1999, 8) [see “Sources Used” at end of book]
6. Abigail asked a number of students their major. She found 35 in liberal arts, 10 in criminal justice, 25 in nursing, 45 in business, and 20 in other
majors. What was the relative frequency of the nursing group, rounded to the nearest whole percent?
7. (a) Name three types of graph used for ungrouped discrete data. Which type do you use when?
(b) Name the type of graph used for grouped numeric data.
(c) Name two types of graph used for qualitative data.
8. Bert asked his fellow students how many books they read for pleasure in a year. He found that most of them
read 0, 1, or 2 books, but some read 3 or more and a very few read as many as 10. (He plotted the histogram
shown at right.) Identify the shape of this distribution.
9. (a) In making a histogram, how do you decide whether to group data?
(b) What are the two rules for classes when you group data?
10. Below is a grouped frequency distribution.
Test scores, x    Frequencies, f
470.0–479.9       15
480.0–489.9       22
490.0–499.9       29
500.0–509.9       50
510.0–519.9       38
(a) Create a frequency histogram. (For a real quiz, you’d use graph paper, but you can freehand this one.)
(b) Find the class width.
(c) What’s the shape of this distribution?
3. Numbers about Numbers

Updated 1 Feb 2015
Summary: For numeric data, the goal of descriptive stats is to show the shape, center, spread, and outliers of a data set. In this chapter, you learn
how to find and interpret numbers that do that.
Measures of center: mean, median, mode. Median is resistant and therefore better for describing skewed data.
Measures of spread: range, interquartile range, variance, standard deviation. Standard deviation is best.
Measures of position: percentiles, z-scores, quartiles. The quartiles help determine which data points, if any, are outliers.
The min, max, and quartiles appear in the five-number summary and are shown on a boxplot.
BTW: Measures of shape called skewness and kurtosis do exist, but they’re not part of this course. Roughly, skewness tells how this data set differs from a
symmetric distribution, and kurtosis tells how it differs from a normal distribution. If you’re interested, you can learn about them in Measures of Shape:
Skewness and Kurtosis [URL: https://BrownMath.com/stat/shape.htm]. The MATH200B Program part 1 can compute those measures of shape for you.
Contents:
3A. Measures of Center
3A1. The Three M’s: Mean, Median, Mode
3A2. Mean, Median, Mode, and the Shape of a Data Set
3B. Summary Numbers on the TI-83 …
3B1. … from a List of Numbers
3B2. … from an Ungrouped Distribution
· Weighted Average
3B3. … from a Grouped Distribution
3C. Measures of Spread
3C1. Range and IQR (Interquartile Range)
3C2. Standard Deviation
· What Good Is the Standard Deviation, Anyway?
· The Empirical Rule for Normal Distributions
· Optional: Chebyshev’s Inequality
3D. Measures of Position
3D1. Percentiles
3D2. Quartiles
3D3. z-Scores
3E. Five-Number Summary
3E1. Outliers
3E2. Box-Whisker Diagrams
· Box-Whisker Plot, and Shape of a Data Set
· Box-Whisker Plot on TI-83/84/89
· Finding Outliers with the TI-83/84/89 or Excel
· Five-Number Summary from TI-83/84/89 Boxplot
3A. Measures of Center
3A1. The Three M’s: Mean, Median, Mode
There are three common measures of the center of a data set: mean, median, and mode.
Definition: The mean is nothing more than the average that you’ve been computing since elementary school.
The symbol for the mean of a sample is x̅, pronounced “x bar”. The symbol for the mean of a population is the Greek letter µ,
pronounced “mew” and spelled “mu” in English. (Don’t write µ as “u”; the letter has a tail at the left.)
You can think of the mean as the center of gravity of the distribution. If you made a wooden cutout of the histogram, you could
balance it on a pencil or your finger placed exactly under the mean.
BTW: The formula for the mean is x̅ = ∑x/n or µ = ∑x/N, meaning that you add up all the numbers in the data set and then divide by sample size or population size.
Definition: The median is the middle number of a sample or population. It is the number that is above and below equal numbers of data points.
(Examples are below.)
There’s no one agreed symbol for the median. Different books use M or Med or just “median”.
To find the median by hand, you must put the numbers in order. If the data set has an odd number of data points, counting
duplicates, then the median is the middle number. If the data set has an even number of data points, the median is half way
between the two middle numbers. (In the next section, you’ll get the median from your TI calculator, with no need to sort the
numbers.)
Definition: The mode is the number that occurs most frequently in a data set. If two or more numbers are tied for most frequent, some textbooks
say that the data set has no mode, and others say that those numbers are the modes (plural). We’ll follow the second convention.
Most distributions have only one mode, and we call them unimodal. If a distribution has two
modes, or even if it has two “frequency peaks” like the one at right, we call it bimodal. (This was
students’ final grades in a math course: a lot of low or high grades, and few in the middle.)
There’s no symbol for the mode.
Example 1: You’re interviewing at a company. You ask about the average salary, and the interviewer tells you that
it’s $100,000. That sounds pretty good to you. But when you start work, you find that everybody you work with is
making $10,000. What went wrong here?
The interviewer told the truth, but left out a key fact: Everybody but the president makes below the average. Eight employees make $10,000 each,
the vice president makes $50,000, and the president makes $870,000. Yes, the mean is (8×10,000 + 50,000 + 870,000)/10 = $100,000, but that’s not
representative because the president’s salary is an outlier. It pulls the mean away from the rest of the data, and skews the salary distribution toward
the right. This graph tells the sad tale:
There was your mistake. Salaries at most companies are strongly skewed right, so most employees make less than the average. When a data set is
skewed, the mean is pulled toward the extreme values. (A data set can be skewed without outliers, but when there are outliers the data set is almost
certain to be skewed.)
You should have asked for the median salary, not the average (mean) salary. There are 10 employees, and 50% of 10 is 5, so the median is less than or
equal to five data points and greater than or equal to five data points. The fifth-highest and sixth-highest salaries are both $10,000, so the median is
$10,000.
The median is more representative than the mean when a data set is skewed. The mean is pulled toward an extreme value, but the median is
unaffected by extreme values in the data set. We say that the median is resistant.
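If you want to see resistance in action, here is a small Python sketch using the salaries from Example 1. Python isn't part of this course; the sketch just uses the standard `statistics` module. The doubled president's salary is a made-up figure, there only to show what happens when the outlier moves.

```python
from statistics import mean, median

# Salaries from Example 1: eight employees, the vice president, the president.
salaries = [10_000] * 8 + [50_000, 870_000]

print(mean(salaries))    # the mean is pulled up by the president's salary
print(median(salaries))  # the median stays with the typical employee

# Resistance check: double the outlier (a made-up figure for illustration)
# and compare the two measures of center again.
inflated = salaries[:-1] + [2 * salaries[-1]]
print(mean(inflated))
print(median(inflated))
```

The mean jumps from $100,000 to $187,000, while the median stays at $10,000 in both cases.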
Example 2: What is the median of the data set 8, 15, 4, 1, 2? Put the numbers in order: 1, 2, 4, 8, 15. There are five numbers, and 50% of 5 is 2.5. You
need the number that is above 2 data points and below 2 data points; the median is 4.
Example 3: What is the median of the data set 7, 24, 15, 1, 7, 45? There are six data points, and in order they are 1, 7, 7, 15, 24, 45. 50% of 6 is 3; you
need the number that is above 3 data points and below 3 data points. It’s clear that the median is between 7 and 15, but where exactly? When the
sample size is an even number, the median is the average of the two middle numbers. Therefore the median for this data set is the average of 7 and
15, (7+15)/2 = 11.
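The by-hand procedure from Examples 2 and 3 can be sketched in Python like this. The function name `median_by_hand` is made up for this illustration. For the mode, the sketch uses `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for most frequent and so matches the "modes (plural)" convention adopted above.

```python
from statistics import multimode

def median_by_hand(data):
    """Sort, then take the middle value or the average of the two middle values."""
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                          # odd count: the middle number
    return (ordered[middle - 1] + ordered[middle]) / 2  # even count: average the two

example_2 = [8, 15, 4, 1, 2]
example_3 = [7, 24, 15, 1, 7, 45]

print(median_by_hand(example_2))  # 4
print(median_by_hand(example_3))  # 11.0
print(multimode(example_3))       # [7]
print(multimode(example_2))       # every value occurs once, so all five are tied
```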
3A2. Mean, Median, Mode, and the Shape of a Data Set
When a distribution is symmetric, the mean and median are close together. If it’s unimodal, the mode is close to the mean and median as well.
But have you ever taken a course that was graded on a curve, and one or two “curve wreckers” ruined things for everyone else? What happened?
Their high scores raised the class average (mean), so everybody else’s scores looked worse. The class scores were skewed right: low scores occurred
frequently, and high scores were rare. (You can see shapes of skewed distributions in Chapter 2 [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c02_Shapes].)
When a distribution is skewed, the mean is pulled toward the extreme values. The median is resistant, unaffected by extreme values. And you can
reverse that logic too: if the mean is greater than the median, it must be because the distribution is skewed right. From the median to the mean is the
direction of skew.
Skewed left: mean < median (usually)
Skewed right: mean > median (usually)
For heaven’s sake, don’t memorize that! Instead, just draw a skewed distribution and ask yourself approximately where the mean and median fall on
it.
BTW: Karl Pearson gives the rule median = (2×mean + mode)/3 for moderately skewed distributions. For more about this, see Empirical Relation between Mean,
Median and Mode [see “Sources Used” at end of book].
Caution! All the statements in this section are a rule of thumb, true for most distributions. The logic holds for almost every unimodal continuous
distribution, and for discrete distributions with a lot of different values. But it tends to break down on discrete distributions that have only a few
different values. For more about this, see von Hippel 2005 [see “Sources Used” at end of book].
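As a quick illustration of "from the median to the mean is the direction of skew", here is a Python sketch. The function name `skew_direction` is made up, and the function is exactly as reliable as the rule of thumb itself, so the caution above applies to it too.

```python
from statistics import mean, median

def skew_direction(data):
    """Rule of thumb only: the mean sits on the skewed side of the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "right"
    if m < med:
        return "left"
    return "no skew indicated"

# Salaries from Example 1: mean 100,000 is above median 10,000.
salaries = [10_000] * 8 + [50_000, 870_000]
print(skew_direction(salaries))

# Mirror image of the same data, to show the opposite case.
mirrored = [-x for x in salaries]
print(skew_direction(mirrored))
```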
Summary: The 1-VarStats command gives you mean, median, and much more for any data set. If you have just a plain list of numbers, enter
the name of that list on the command line. If you have a frequency distribution, enter the name of the data list and the name of the
frequency list on the command line.
Excel: Excel can do these computations. This isn’t an Excel course, but if you’re an Excel head you can figure out how to get this information.
One way is with the Data Analysis tool, part of the Analysis Toolpak add-in that comes with Excel (though you may have to enable it).
Another way is to click in a blank cell, click Formulas » More Functions » Statistical and select the appropriate worksheet function.
3B1. … from a List of Numbers
Example 4: Professor Marvel had a statistics class of fifteen students, and on one quiz their scores were
10.5 13.5 8 12 11.3 9 9.5 5 15 2.5 10.5 7 11.5 10 10.5
Your TI-83 or TI-84 can give you the mean, median, and other numbers that summarize this data set.
1. If you have any partial commands visible, press [CLEAR].
2. Press [STAT] [ENTER] to get into the edit screen for statistics lists. You can use any list, but let’s use L1 this time. (If you don’t see L1, and
pressing the left arrow doesn’t bring it into view, press [STAT] [5] [ENTER] [STAT] [ENTER].)
3. Cursor to the L1 label at the top — not the top number, the column heading — and press [CLEAR] [ENTER] to clear the list.
4. Enter your numbers, pressing [ENTER] after each one.
5. After entering the last number, check all the numbers carefully and make any needed corrections. If you
duplicated a number, press [DEL] to remove it; if you left out a number, press [2nd DEL makes INS] to open a space
for it.
6. Press [STAT] [►] [1] to select 1-VarStats.

If you have a newer TI-84 with the “wizard” interface selected, a little menu will appear. Identify the list that
contains your data: [2nd 1 makes L1]. For a simple list of numbers like this one, there is no frequency list, so
press [DEL].
If you have an older calculator or you’ve turned off the “wizard” interface, the calculator will paste
1-VarStats to the home screen. On the same line, identify the list that contains your data: [2nd 1 makes L1].
7. After writing down the complete command on your paper — 1-VarStats L1 — press [ENTER] to execute it.
The results screen is shown below. A down arrow on the screen says that there is more information if you press [▼], and an up arrow says that there
is more information if you press [▲].
Look first at the bottom of the screen. Always check n first: if it doesn’t match your sample or population size, the other numbers are big sacks of
bogosity. In this case a quick count of the original data set shows 15 numbers, which is the right quantity. (Of course, this check can’t determine if
you miskeyed any numbers. Only double and triple checking can protect you from that kind of mistake.)
What are you seeing on this screen?
x̅ is the mean. The calculator doesn’t know whether a given data set is a sample or a population, so it can’t guess whether to display x̅ or µ.
This data set is quiz scores from the complete class, so it’s a population and you write down µ = 9.7. Always use the right symbols, even if
the calculator doesn’t.
A word about rounding: The rules for significant digits and rounding are beyond the scope of this course, but beware of being ridiculously
precise. (For example, most gasoline pumps are calibrated in 0.001 gallon units. But 0.001 gallon is two tablespoons, and there’s considerably
more gas than that in the hose, so that precision is just silly.)
A good rule of thumb is to report sample statistics and population parameters to one more decimal place than the original data. Then
why did I say µ = 9.7 instead of 9.72, since the original data have one decimal place? That’s a valid question, and my answer is that 9.72
would not be wrong but it feels overly precise when there are only fifteen data points, most are whole numbers, and the rest are a whole
number plus ½.
∑x and ∑x² are the sum of the original data and the sum of the squares of the original data. You won’t use them in this course, and there’s no
reason to write them down.
Sx and σx are the standard deviation, computed by two methods. Choose Sx with a sample and σx with a population. More when we get to
Measures of Spread!
Since this data set is a population, select σx = 3.057929583 and write down σ = 3.1.
BTW: The name standard deviation was created in 1893 by Karl Pearson [URL: https://BrownMath.com/swt/pfswt.htm.htm#bign_PearsonK]. (We might wish
that he had chosen something with fewer than six syllables.) He assigned the symbol σ to the standard deviation of a population in 1894.
n is the sample size or population size. Since you have a population, you write down N = 15, not n = 15.
minX and maxX are the smallest and largest data points.
Q1 and Q3 are the first and third quartiles, which we’ll get to under Measures of Position.
Med is the median, which you met earlier. Med = 10.5.
The minimum, Q1, median, Q3, and maximum are together called the five-number summary. I’ll have more to say about the five-number
summary later in this chapter.
Showing your work and your results, you write down:

1-VarStats L1
µ = 9.7
σ = 3.1
N = 15
min = 2.5
Q1 = 8
Med = 10.5
Q3 = 11.5
max = 15
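If you don't have your calculator handy, the same summary can be sketched in Python. This is not the calculator's own code: `one_var_stats` is a made-up name, and the quartile rule used here (Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle value when n is odd) is an assumption chosen because it reproduces the Q1 and Q3 shown above.

```python
from statistics import mean, median, pstdev, stdev

def one_var_stats(data):
    """Summary numbers in the spirit of 1-VarStats for a plain list."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values below the median position
    upper = ordered[half + n % 2:]  # values above the median position
    return {
        "mean": mean(ordered),
        "Sx": stdev(ordered),        # sample standard deviation
        "sigma_x": pstdev(ordered),  # population standard deviation
        "n": n,
        "minX": ordered[0],
        "Q1": median(lower),
        "Med": median(ordered),
        "Q3": median(upper),
        "maxX": ordered[-1],
    }

# Professor Marvel's quiz scores from Example 4.
quiz = [10.5, 13.5, 8, 12, 11.3, 9, 9.5, 5, 15, 2.5, 10.5, 7, 11.5, 10, 10.5]
stats = one_var_stats(quiz)
for name, value in stats.items():
    print(name, "=", value)
```

Just like the calculator, the sketch reports both standard deviations and leaves it to you to pick the one that fits: σx here, because the fifteen scores are a population.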
3B2. … from an Ungrouped Distribution
Example 5: Your TI-83 or TI-84 can also compute statistics of a frequency distribution. Let’s try it with the data from
Chapter 2 for number of adults in vehicles entering the park.
Number of Adults in Vehicles Entering Park
Adults in Vehicle    Number of Vehicles
0                     2
1                     5
2                     7
3                    15
4                     5
5                     2
6                     2
7                     1
8                     1
Total                40
Enter the data values in one statistics list, such as L1. Enter the frequencies in a second list, such as L2. Press [STAT]
[►] [1] to select 1-VarStats.
In the “wizard” interface, enter [2nd 1 makes L1] for List and [2nd 2 makes L2] for FreqList.
In the non-“wizard” interface, press [2nd 1 makes L1], then [,] (comma), then [2nd 2 makes L2] and [ENTER]. The
data list must come first and the frequency list second.
Caution, rookie mistake: Students often leave off the frequency list. Your calculator is pretty good,
but it can’t read your mind. The only way it knows that you have a frequency distribution is if you give it
both the frequency list and the data list.
Either way, write down the complete command on your paper: 1-VarStats L1,L2.
Here are the results:
Again, look at n first. That protects you from the rookie mistake of leaving off the frequency list. If n is wrong, redo your 1-VarStats command and
this time do it right.
These forty vehicles are obviously not all the vehicles that enter the park, so they are a sample, not a population. You therefore write down the
statistics as follows:
1-VarStats L1,L2
x̅ = 3.0
s = 1.7 (from Sx = 1.73186575)
n = 40
min = 0
Q1 = 2
Med = 3
Q3 = 4
max = 8
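Here is a Python sketch of what the frequency list does, and of what goes wrong in the rookie mistake. It's one possible approach, not the calculator's internals: it simply repeats each data value as many times as its frequency says and then summarizes the rebuilt sample.

```python
from statistics import mean, median, stdev

adults = [0, 1, 2, 3, 4, 5, 6, 7, 8]     # data values (L1)
vehicles = [2, 5, 7, 15, 5, 2, 2, 1, 1]  # frequencies (L2)

# Repeat each value by its frequency to rebuild the raw observations.
sample = [x for x, f in zip(adults, vehicles) for _ in range(f)]

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)  # these vehicles are a sample, so use the sample SD
print("n =", n, " x-bar =", x_bar, " s =", round(s, 2), " Med =", median(sample))

# Rookie mistake: leaving off the frequency list summarizes only the
# nine distinct values, so n comes out wrong and so does everything else.
print("without frequencies: n =", len(adults), " mean =", mean(adults))
```

Checking n first catches the mistake right away: 9 is not the 40 vehicles in the table.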
Weighted Average
Sometimes you take an average where some data points are more important than others. We say that they are weighted more heavily, and the mean
that you compute in this way is called a weighted average or weighted mean.
You’re intimately familiar with one example of a weighted average: your GPA or grade point average.
Example 6: The NHTSA’s Corporate Average Fuel Economy or CAFE Rule (NHTSA 2008) [see “Sources Used” at end of book] specifies a corporate
average of 34.8 mpg (miles per gallon) for passenger cars. Let’s keep things simple and suppose that ZaZa Motors makes three models of passenger
car: the Behemoth gets 22 mpg, the Ferret gets 35 mpg, and the Mosquito gets 50 mpg. Does ZaZa meet the standard?
To answer that, you can’t just average the three models: (22+35+50)/3 ≈ 35.7 mpg. Suppose the company sells one Mosquito and the rest are
Behemoths and a sprinkling of Ferrets? You have to take into account the number of cars of each model sold. In effect, you have a frequency
distribution with mpg figures and repetition counts. Let’s suppose these are the sales figures:
Auto Sales by ZaZa Motors
Model Miles per Gallon Number Sold
Behemoth 22 100,000
Ferret 35 250,000
Mosquito 50 20,000
Total 370,000
Put the miles per gallon in L1 and the frequencies in L2. (How do you know it’s not the other way around? You’re trying to find an average mpg, so
the mpg numbers are your data.) You should find:
1-VarStats L1,L2
µ = 32.3 mpg
N = 370,000 passenger cars
Even though two of the three models meet the standard, the mix of sales is such that ZaZa Motors’ CAFE is 32.3 mpg, and it’s not in compliance.
BTW: The formula for the mean of a grouped distribution and the formula for a weighted average are the same formula: µ = ∑xf/N for a population or x̅ = ∑xf/n for a sample.
Either way, take each data value times its frequency. Add up all those products, and divide by the population size or sample size. For the notation, see ∑ Means Add ’em Up in
Chapter 1.
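The formula µ = ∑xf/N from the BTW note can be sketched in Python for the ZaZa Motors figures. The function name `weighted_mean` is made up for this illustration.

```python
def weighted_mean(values, weights):
    """Sum of value times weight, divided by the total weight."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

mpg = [22, 35, 50]                  # Behemoth, Ferret, Mosquito
sold = [100_000, 250_000, 20_000]   # number of each model sold
standard = 34.8                     # CAFE standard quoted in Example 6

cafe = weighted_mean(mpg, sold)
print(round(cafe, 1))                 # 32.3
print(cafe >= standard)               # False: not in compliance
print(round(sum(mpg) / len(mpg), 1))  # 35.7, the misleading unweighted average
```

The unweighted average would suggest that ZaZa passes, which is exactly why you have to weight each model by its sales.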
3B3. … from a Grouped Distribution
In a grouped frequency distribution, one number called the class midpoint stands for all the numbers in the class.
Definition: The class midpoint for a given class equals the lower boundary plus half the class width. This is half way between the lower class
boundary of this class and the lower class boundary of the next class.
Example 7: Let’s revisit the lengths of iTunes songs from the ungrouped histogram in
Chapter 2. What is the midpoint of the 300 to 399 class?
The class width equals the difference between lower boundaries: 400−300 = 100. Half the
class width is 50, so the midpoint is 300+50 = 350. You could also compute the class midpoint
as (300+400)/2 = 350.
However, it is wrong to take (300+399)/2 = 349.5 as class midpoint or 399−300 = 99 as class
width. Don’t use the upper boundary in finding the class midpoint.
Of course you don’t have to compute every class midpoint the long way. Once you have the
midpoint of the first class, (100+200)/2 = 150, just add the class width repeatedly to get the rest:
250, 350, … 850. The grouped frequency distribution, with the class midpoints, is shown
below.
Lengths of iTunes Songs (seconds)
Class Boundaries    Class Midpoint    Frequency
100–199             150                9
200–299             250               20
300–399             350                9
400–499             450                7
500–599             550                3
600–699             650                1
700–799             750                0
800–899             850                1
What good is the class midpoint? It’s a stand-in for all the numbers in its class. Instead of being
concerned with the nine different numbers in the 100 to 199 class, twenty different numbers in the 200
to 299 class, and so on, we pretend that the entire data set is nine 150s, twenty 250s, and so on. This
means you get approximate statistics, but you get them with a lot less work.
Is this legitimate? How good is the approximation? Usually, quite good. In most data sets, a given
class holds about equally many data points below the class midpoint and above the class midpoint, so
the errors from making the approximation tend to balance each other out. And the bigger the data set, the more points you have in each class, so the
approximation is usually better for a larger data set.
Procedure: Enter the class midpoints in one statistics list, such as L1. Enter the frequencies in another list, such as L2. Enter the command
1-VarStats L1,L2 and write down the complete command on your paper.
Again, avoid the rookie mistake: include the class-midpoint list and the frequency list in your command.
The results screens are below. As usual, before you look at anything else, check that n matches the size of the data set. 50 is
correct, so that’s one less worry.
There’s a problem with the second screen, though. Your calculator knows you have a frequency distribution, because you gave two lists to the
1-VarStats command. But it doesn’t have the original data, so it doesn’t know the true minimum (lowest data point). When you read minX=150, you
interpret that to mean that the lowest data point occurs in the class whose midpoint is 150; in other words, the minimum is somewhere between 100
and 199. Your knowledge of the rest of the five-number summary has the same limitation. For instance, the median isn’t 250; all you know is that it
occurs somewhere between 200 and 299.
Because of these limitations, you don’t do anything with the second results screen from a grouped distribution. The mean and standard
deviation don’t have this problem: they’re approximate, but the approximation is good enough. (n is exact, not an approximation.)
These 50 iTunes songs are obviously not all the songs there are, not even all the songs in any particular person’s iTunes library. They are a sample,
not a population. Therefore you write down your work and results like this:
1-VarStats L1,L2
x̅ = 316 (or you could write 316.0)
s = 145.1
n = 50
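The midpoint rule and the grouped-distribution statistics can be sketched in Python as follows. This is an illustration under the same pretense described above: every song in a class is treated as if its length equaled the class midpoint, so the mean and standard deviation are approximations.

```python
from math import sqrt

# Lower class boundaries and frequencies for the iTunes song lengths.
lower_boundaries = [100, 200, 300, 400, 500, 600, 700, 800]
frequencies = [9, 20, 9, 7, 3, 1, 0, 1]

# Class width is the difference between consecutive LOWER boundaries.
class_width = lower_boundaries[1] - lower_boundaries[0]
midpoints = [lb + class_width / 2 for lb in lower_boundaries]

n = sum(frequencies)
x_bar = sum(x * f for x, f in zip(midpoints, frequencies)) / n

# Sample standard deviation: frequency-weighted squared deviations over n - 1.
ss = sum(f * (x - x_bar) ** 2 for x, f in zip(midpoints, frequencies))
s = sqrt(ss / (n - 1))

print(midpoints)
print("n =", n, " x-bar =", x_bar, " s =", round(s, 1))
```

Notice that the sketch doesn't report a minimum, median, or quartiles, for the same reason that you ignore the second results screen: the midpoints don't tell you where in each class the real data points fall.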
3C. Measures of Spread
There are four common measures of the spread of a data set: range, interquartile range or IQR, variance, and standard deviation. (You may also see
spread referred to as dispersion, scatter, variation, and similar words.)
3C1. Range and IQR (Interquartile Range)
Definition: The range of a data set is the distance between the largest and smallest members.
Example 8: If the largest number in a data set is 100 and the smallest is 20, the range is 100−20 = 80, regardless of what numbers lie between them and
what shape the distribution might have.
Caution: The range is one number: 80, not “20 to 100”.
Obviously the range has a problem as a measure of spread: It uses only two of the numbers. Since only the two most extreme numbers in the data set
get used to compute the range, the range is about as far from resistant as anything can be.
In favor of the range is that it’s easy to compute, and it can be a good rough descriptor for data sets that aren’t too weird. The interquartile range
has something of the same idea, but it is resistant.
Definition: The interquartile range (IQR) is the distance between the largest and smallest members of the middle 50% of the data points, taking
repetitions into account.
Alternative definition: The IQR is the third quartile minus the first quartile, or the 75th percentile minus the 25th percentile.
You’ll learn about percentiles and quartiles in the next section, Measures of Position, but for now let’s just take a quick non-technical example.
Example 9: Consider the data set 1, 2, 3, 3, 3, 4, 5, 8, 11, 11, 15, 23. There are twelve numbers, and the middle 50% (six numbers) are 3, 3, 4, 5, 8, 11. The
interquartile range is 11−3 = 8.
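If you'd like to check Example 9 on a computer, here is a minimal Python sketch. Python isn't used anywhere in this book, and the function names are made up for illustration. The sketch finds the middle 50% by simply dropping the lowest and highest quarters, which assumes the number of data points is a multiple of 4.

```python
# Data set from Example 9.
data = [1, 2, 3, 3, 3, 4, 5, 8, 11, 11, 15, 23]


def value_range(values):
    """Range: distance between the largest and smallest members."""
    return max(values) - min(values)


def middle_half(values):
    """Middle 50% of the data: drop the lowest quarter and the highest quarter.

    Assumption for this sketch: the count is a multiple of 4, as in Example 9.
    """
    ordered = sorted(values)
    quarter, leftover = divmod(len(ordered), 4)
    if leftover:
        raise ValueError("this sketch needs a multiple of 4 data points")
    return ordered[quarter:len(ordered) - quarter]


def iqr(values):
    """Interquartile range: the range of the middle 50% of the data."""
    return value_range(middle_half(values))


print("range:", value_range(data))
print("middle 50%:", middle_half(data))
print("IQR:", iqr(data))

# Resistance check: make the largest value ten times bigger.
stretched = data[:-1] + [230]
print("range with extreme value:", value_range(stretched))
print("IQR with extreme value:", iqr(stretched))
```

The last two lines show what "resistant" means in practice: changing the largest value changes the range a great deal, while the IQR stays the same.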
3C2. Standard Deviation
The IQR is a better measure of spread than the range, because it’s resistant to the extreme values, but it still has the problem that it uses only two
numbers in the data set. Isn’t there some measure of spread that uses all the numbers in the data set, as the mean does? The answer is yes: the
variance and the standard deviation use all the numbers.
Your calculator gives you the standard deviation, as you saw above. The variance is important in a theoretical stats course, but not so much in
this practical course. We’ll measure spread with the standard deviation almost exclusively. (To save wear and tear on my keyboard and your
printer, I’ll often use the abbreviation SD.)
If you’d like to know how the variance and SD are computed, read the “BTW” section that follows. Otherwise, skip down to “What Good Is the
Standard Deviation, Anyway?”
BTW: To see how the variance is computed, let’s go back to Professor Marvel’s quiz scores. We computed the mean as 9.7, or to use the
unrounded value, µ = 9.72. (Never round numbers if you’re going to use them in further calculation; that’s the Big No-no.)
x       x−µ     (x−µ)²
10.5    0.78    0.6084
13.5    3.78    14.2884
8       -1.72   2.9584
12      2.28    5.1984
11.3    1.58    2.4964
9       -0.72   0.5184
9.5     -0.22   0.0484
5       -4.72   22.2784
15      5.28    27.8784
2.5     -7.22   52.1284
10.5    0.78    0.6084
7       -2.72   7.3984
11.5    1.78    3.1684
10      0.28    0.0784
10.5    0.78    0.6084
Total   0.00    140.2640
If you want to devise a measure of spread, it seems reasonable to consider spread from the mean, so try subtracting the mean
from each quiz score and then adding up all those deviations. You get zero, so obviously “sum of deviations” isn’t a useful
measure of spread.
But with the next column you strike gold. Squaring all the deviations changes the negatives to positives, and also weights
the larger deviations more heavily. This is progress! Now divide the total of squared deviations by the population size and you
have the variance: σ² = 140.2640/15 = 9.3509. (σ is the Greek letter sigma.)
(When computing the variance of a sample, you divide by n−1 rather than n. The reasons are technical and are explained in
Steve Simon’s articles Degrees of Freedom (1999a) [see “Sources Used” at end of book] and Degrees of Freedom, Part 2 (2004)
[see “Sources Used” at end of book].)
The variance is quite a good measure of spread because it uses all the numbers and combines their differences from the mean
in one overall measure. But it’s got one problem. If the data are dollars, the squared deviations will be in square dollars, and
therefore the variance will be in square dollars. What’s a square dollar? (No, I don’t know either.) You want a measure of spread
that is in the same units as the original data, just like the mean and median are. The simplest solution is to take the square root of
the variance, and when you do that you have the standard deviation (SD), σ = √(140.2640/15) = 3.05793, which rounds to 3.1.
And because the standard deviation is in the same units as the original data, it can be used as a yardstick, as you’ll see below.
For lovers of formulas, here they are. The standard deviation of a population, σ, has population size N on the bottom of the
fraction; the standard deviation of a sample, s, has sample size n minus 1 on the bottom of the fraction. If you’re not familiar with
the ∑ notation (sigma or summation), ∑x² means square every data value and add the squares; ∑x²f means square every data value, multiply by the
frequency, and add those products. For the notation, see ∑ Means Add ’em Up in Chapter 1.
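Formulas for Standard Deviation

When the data set is the whole population
of a List of Numbers: σ = √[ ∑(x−µ)² / N ] = √[ (∑x² − (∑x)²/N) / N ]
of a Frequency Distribution: σ = √[ (∑x²f − (∑xf)²/N) / N ], where N = ∑f
When the data set is just a sample
of a List of Numbers: s = √[ ∑(x−x̅)² / (n−1) ] = √[ (∑x² − (∑x)²/n) / (n−1) ]
of a Frequency Distribution: s = √[ (∑x²f − (∑xf)²/n) / (n−1) ], where n = ∑f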
Why are there two formulas on each row under “list of numbers”? The first formula is the definition, and the second is a shortcut for faster computations. Of
course they’re mathematically equivalent; you could prove that if you wanted to.
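If you'd rather let a computer grind through the arithmetic, here is a minimal Python sketch for the quiz scores. It is only an illustration, not part of the TI-83/84 procedure. The shortcut it uses, (∑x² − (∑x)²/N)/N, is one common algebraic form and is assumed here; the standard-library functions `statistics.pstdev` and `statistics.stdev` serve as a cross-check.

```python
import math
import statistics

# Professor Marvel's quiz scores, treated as a whole population.
quiz_scores = [10.5, 13.5, 8, 12, 11.3, 9, 9.5, 5, 15, 2.5, 10.5, 7, 11.5, 10, 10.5]

N = len(quiz_scores)
mu = sum(quiz_scores) / N

# Definition: total of squared deviations from the mean, divided by N.
sum_sq_dev = sum((x - mu) ** 2 for x in quiz_scores)
variance_def = sum_sq_dev / N

# Shortcut (assumed form): needs only the sum of x and the sum of x squared.
sum_x = sum(quiz_scores)
sum_x2 = sum(x * x for x in quiz_scores)
variance_short = (sum_x2 - sum_x ** 2 / N) / N

sigma = math.sqrt(variance_def)          # population SD: divide by N
s = math.sqrt(sum_sq_dev / (N - 1))      # sample SD: divide by n - 1

print("mean:", round(mu, 2))
print("total of squared deviations:", round(sum_sq_dev, 4))
print("variance, definition vs. shortcut:", round(variance_def, 4), round(variance_short, 4))
print("population SD:", round(sigma, 5), "library:", round(statistics.pstdev(quiz_scores), 5))
print("sample SD:", round(s, 5), "library:", round(statistics.stdev(quiz_scores), 5))
```

The sample SD always comes out a little larger than the population SD for the same numbers, because you divide by n−1 instead of N.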
BTW: Sir Ronald Fisher [URL: https://BrownMath.com/swt/pfswt.htm.htm#bign_Fisher] coined the term variance in 1918. He used the symbol σ² for the variance of a
population, since Pearson had already assigned σ to the standard deviation, and the variance is the square of the SD.
What Good Is the Standard Deviation, Anyway?
The standard deviation will be the key to inferential statistics, starting in Chapter 8, but even within the realm of descriptive statistics there are some
applications. In addition to this section, you’ll see an application in z-Scores, below.
Working with the quiz scores on your TI-83 or TI-84, you found that the population mean was µ = 9.7 and the population SD was σ = 3.1. What does
this mean?
Just as a concept, the standard deviation gives you an idea of the expected variation from one member of the sample or population to the next. The
SD in this example is about a third of the mean, so you expect some variation but not a lot. But can you do better than this? Yes, you can!
The Empirical Rule for Normal Distributions
You can predict what percentage of the data will be within a certain number of standard deviations above or below the mean. In a normal
distribution, 68% of the data are between one SD below the mean and one SD above the mean (µ±σ), 95% are within two SD of the mean (µ±2σ), and
99.7% are within three SD of the mean (µ±3σ).
This is the Empirical Rule or 68–95–99.7 Rule. Caution! It’s good for normal distributions only.
BTW: You’ll notice that the 68%, 95%, and 99.7% of data occur within approximately one, two, and three SD of the mean. More accurate figures are shown in the pictures,
but for now we’ll just use the simple rule of thumb. You’ll learn how to make precise computations in Chapter 7 [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c07_Normal].
BTW: It’s not a traditional part of the Empirical Rule, but another useful rule of thumb is that, in a normal distribution, about 50% of the data are within 2/3 of a SD above
and below the mean.
Example 10: Adult women’s heights are normally distributed with µ = 65.5″ and σ = 2.5″. (By the way, different sources give different values for
human heights, so don’t be surprised to see different figures elsewhere in this book.) How tall are the middle 95% of women?
Solution: The middle 95% of the normal distribution lies between two SD below and two SD above the mean. 2σ = 2×2.5 = 5″, and 65.5±5 = 60.5″
to 70.5″, so 95% of women are 60.5″ to 70.5″ tall.
Actually there are two interpretations. You can say that 95% of women are 60.5″ to 70.5″ tall, or you can say that if you randomly select one woman
the probability that she’s 60.5–70.5″ tall is 95%. Any probability statement can be turned into a proportion statement, and vice versa. You’ll learn
about this in Interpreting Probability Statements [URL: https://BrownMath.com/swt/pfswt.htm.htm#c05_BasicsInterp] in Chapter 5.
Example 11: What fraction of women are 65.5″ to 68″ tall?

Solution: 68−65.5 = 2.5, so 68″ is one standard deviation above the mean. You know that 68% of a normal distribution is within µ±σ. You also
know that the normal distribution is symmetric, so 68%/2 = 34% of women are within one SD below the mean, and 34% are within one SD above the
mean. Therefore 34% of women are 65.5″ to 68″ tall.
You can combine the three diagrams above and show data in regions bounded by each whole number of standard deviations, like this:
TIP: If this diagram doesn’t come out well in black-and-white printing, you can view or print it in color at
<https://BrownMath.com/swt/pic/chap03_empirical.jpg>.
BTW: Where do these figures come from? For example, how do we know that about 13.5% of the population is between one and two standard deviations below the mean in a
normal distribution? Well, 95% is between two SD below and two SD above the mean. Half of 95% is 47.5%, so 47.5% of the population is between the mean and two SD
below the mean. Similarly, about 68% is between one SD below and one SD above, so 68/2 = 34% is between the mean and one SD below. But if 47.5% is between µ−2σ and
µ — call it Region A — and 34% is between µ−σ and µ, then the part of Region A that is not in the 34% is the part between µ−2σ and µ−σ, and that must be 47.5−34 =
13.5%. If you had an afternoon to kill, you could work out the other seven percentages.
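If you don't have an afternoon to kill, the same subtraction can be sketched in a few lines of Python. This is only an illustration of the arithmetic just described, starting from the rule-of-thumb figures 68, 95, and 99.7; the variable names are made up.

```python
# Rule-of-thumb percentages within 1, 2, and 3 SD of the mean.
within = {1: 68.0, 2: 95.0, 3: 99.7}

# By symmetry, half of each percentage lies on either side of the mean.
half = {k: pct / 2 for k, pct in within.items()}

# Regions on ONE side of the mean; the other side is a mirror image.
regions_one_side = {
    "mean to 1 SD": half[1],
    "1 SD to 2 SD": half[2] - half[1],
    "2 SD to 3 SD": half[3] - half[2],
    "beyond 3 SD": 50.0 - half[3],
}

for name, pct in regions_one_side.items():
    print(f"{name}: {pct:.2f}%")

# Four regions on each side make eight in all, and they must total 100%.
print("total:", round(2 * sum(regions_one_side.values()), 2))
```

The four one-sided figures, mirrored on the other side of the mean, give the eight percentages in the diagram.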
With this diagram, you can work Example 11 more easily, directly reading off the 34% figure for women between mean height and one SD above the
mean. You can also work more complicated examples, like this one.
Example 12: If you randomly select a woman, how likely is it that she’s taller than 70.5″?
Solution: 70.5−65.5 = 5.0, so 70.5″ is two SD above the mean. From the diagram, you see that 2.35+0.15 = 2.5% of the population is more than two
SD above the mean. Answer: a randomly selected woman has a 2.5% chance of being more than 70.5″ tall.
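To see how close the rule of thumb comes to the more accurate figures, here is a hedged Python sketch using `statistics.NormalDist` from the standard library (Python 3.8 or later). It is not the method this book teaches, which is the calculator method of Chapter 7; it just reworks Examples 10 through 12 with the same µ and σ.

```python
from statistics import NormalDist

# Adult women's heights in inches, as given in Example 10.
heights = NormalDist(mu=65.5, sigma=2.5)

# Example 10: the rule of thumb says 95% lie within mu ± 2 sigma.
p_60_5_to_70_5 = heights.cdf(70.5) - heights.cdf(60.5)

# Example 11: the rule of thumb says 34% lie between the mean and 1 SD above.
p_65_5_to_68 = heights.cdf(68) - heights.cdf(65.5)

# Example 12: the rule of thumb says 2.5% lie more than 2 SD above the mean.
p_above_70_5 = 1 - heights.cdf(70.5)

print(f"60.5 to 70.5 inches: {p_60_5_to_70_5:.2%}")
print(f"65.5 to 68 inches:   {p_65_5_to_68:.2%}")
print(f"above 70.5 inches:   {p_above_70_5:.2%}")
```

The computed values differ from 95%, 34%, and 2.5% only in the first or second decimal place, which is why the simple rule is good enough for rough work.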
Optional: Chebyshev’s Inequality
If you have a normal distribution, the Empirical Rule tells you how much of the population is in each region. What if you don’t have a normal
distribution?
As you might expect, the portions of the population in the various regions depend on the shape of the distribution, but Chebyshev’s Inequality
(or Chebyshev’s Rule) gives you a “worst case scenario”: no matter how skewed the distribution, at least 75% of the data are within 2 SD of the
mean, and at least 89% are within 3 SD of the mean.
More generally, within k SD above and below the mean, you will find at least (1−1/k²)·100% of the data. (If you plug in k = 1, you’ll find that at
least 0% of the data lie within one SD of the mean. Distributions where all the data are more than one SD away from the mean are unusual, but they
do exist.)
Example 13: For the quiz scores, two standard deviations is 2×3.0579 = 6.1, so you expect at least 1−1/2² = 1−¼ = 75% of the quiz scores to be within the
range 9.7±6.1 = 3.6 to 15.8. Remember that this is a worst case. In fact, 14 of the 15 numbers (93%) are within those limits.
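Example 13 can be checked with a short Python sketch. It is an illustration only; the function name `chebyshev_minimum` is made up, and the quiz scores are treated as a population, as elsewhere in this chapter.

```python
import statistics

quiz_scores = [10.5, 13.5, 8, 12, 11.3, 9, 9.5, 5, 15, 2.5, 10.5, 7, 11.5, 10, 10.5]


def chebyshev_minimum(k):
    """Smallest possible fraction of data within k SD of the mean (k >= 1)."""
    return 1 - 1 / k ** 2


mu = statistics.fmean(quiz_scores)
sigma = statistics.pstdev(quiz_scores)   # population SD

k = 2
low, high = mu - k * sigma, mu + k * sigma
inside = [x for x in quiz_scores if low <= x <= high]

print(f"worst case within {k} SD: {chebyshev_minimum(k):.0%}")
print(f"limits: {low:.1f} to {high:.1f}")
print(f"actually inside: {len(inside)} of {len(quiz_scores)}")
```

Only the score 2.5 falls outside the limits, so the actual proportion comfortably exceeds the guaranteed minimum.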
Summary: The measures of center and spread that you’ve studied are properties of the data set as a whole. Now we look at measures of position, which
consider how a given data point stands in relation to the whole sample or population that it’s part of.
3D1. Percentiles
Definition: The percentile rank of a data point is the percentage of the data set that is equal to or less than the data point. We say that the data point is at
the __th percentile or %ile for short.
The symbol is P followed by a number. For example, P35 or P₃₅ denotes the 35th percentile, the member of the data set that is
greater than or equal to 35% of the data.
Percentiles are most often used in measures of human development, like your child’s performance on standardized tests, or an infant’s length or
weight.
Example 14: Your daughter takes a standardized reading test, and the report says that she is in the 85th percentile for her grade. Does this make you
happy or sad? Solution: 85% of her grade read as well as she does, or less well; only 15% read better than she does. Presumably this makes you
happy.
Example 15: Consider the data set 1, 4, 7, 8, 10, 13, 13, 22, 25, 28. (To find percentiles, you have to put the data set in order.)
(a) What is the percentile rank of the number 13? Solution: There are ten numbers in the data set, and seven of those are ≤13. Seven out of ten is
70%, so the percentile rank of 13 is 70, or “13 is at the 70th percentile”, or P70 = 13.
(b) Find P60 for this data set. Solution: What number is greater than or equal to 60% of the numbers in the data set? Counting up six numbers
from the beginning, you find … 13 again. So 13 is both P60 and P70.
(Anomalies like this are usual when you have small data sets. It really doesn’t make sense to talk about percentiles unless you have a fairly large
data set, typically a population like all third graders or all six-week-old infants.)
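Here is one way the definition used in this book could be turned into code: a minimal Python sketch with made-up function names. It reads "greater than or equal to k% of the data" as "count up k% of the way through the ordered data, rounding up", and it expects k to be a whole number from 1 to 100.

```python
def percentile_rank(values, x):
    """Percentage of the data set that is less than or equal to x."""
    at_or_below = sum(1 for v in values if v <= x)
    return 100 * at_or_below / len(values)


def percentile(values, k):
    """P_k: the first member, counting up from the bottom, that is
    greater than or equal to at least k% of the data.

    k must be a whole number from 1 to 100.
    """
    ordered = sorted(values)
    n = len(ordered)
    count_needed = (k * n + 99) // 100   # k% of n, rounded up, in whole numbers
    return ordered[count_needed - 1]


# Data set from Example 15.
data = [1, 4, 7, 8, 10, 13, 13, 22, 25, 28]

print("percentile rank of 13:", percentile_rank(data, 13))
print("P60:", percentile(data, 60))
print("P70:", percentile(data, 70))
```

With only ten numbers, P60 and P70 land on the same value, which is the anomaly mentioned above.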
BTW: Everybody agrees on the idea of a percentile, but different authors have different ways to compute it. For example, some authors say a percentile rank is the percent of
data less than the data point, instead of less than or equal to as I did. By their definition there is a 0th percentile but no 100th percentile; by my definition there is no 0th
percentile but there is a 100th percentile. And some define percentiles in such a way that the percentile (like the mean) need not be a member of the data set.
The different definitions can give very different answers for small data sets. Nobody worries too much about this, because in practice you seldom compute
percentiles against small data sets. (What does “18th percentile” mean in a set of only 12 numbers?) All the definitions give pretty much the same answer for
larger data sets.
David Lane’s Percentiles (2010) [see “Sources Used” at end of book] gives three definitions of percentile and shows what difference they make. His
Definition 2 is the one I use in this book.
3D2. Quartiles
Definitions: The first quartile (Q1) is the member of the data set that is greater than or equal to a quarter of the data points. The third quartile (Q3) is the
member of the data set that is greater than or equal to three quarters of the data points.
To find quartiles by hand, put the data set in order and find the median. If you have an odd number of data points, strike out the
median. Q1 is the median of the lower half, and Q3 is the median of the upper half.
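The by-hand procedure can be sketched in Python as follows. This is an illustration with a made-up function name, not a claim about how any calculator is programmed. One caveat: when a half has an even number of points, `statistics.median` averages the two middle values, so the result need not be a member of the data set.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) by the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                              # below the median
    # With an odd count, skip the median itself ("strike it out").
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


quiz_scores = [10.5, 13.5, 8, 12, 11.3, 9, 9.5, 5, 15, 2.5, 10.5, 7, 11.5, 10, 10.5]
q1, med, q3 = quartiles(quiz_scores)
print("Q1:", q1, "median:", med, "Q3:", q3)
print("IQR:", q3 - q1)
```

For the fifteen quiz scores the median is struck out, leaving seven scores in each half.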
One fourth is 25% and three fourths is 75%, so Q1 = P25 and Q3 is P75. (I chose a definition of percentiles that makes this happen. Some authors use
different definitions, which may give slightly different results.)
What, no Q2? There is a Q2, but two quarters is one half, or 50%, so the second quartile is better known as the median: 50% of the data are less
than or equal to the 50th %ile, alias Q2, alias the median.
The quartiles and the median divide the data set into four equal parts. We sometimes use the word quartile in a way that reflects this: the
“bottom quartile” means the part of the data set that is below Q1, and the “upper quartile” or “top quartile” means the part of the data set that is
above Q3.
Q1 and Q3 are part of the five-number summary (later in this chapter). From Measures of Spread, you already know that they’re used to find the
interquartile range, and later in this chapter you’ll use the IQR to make a box-whisker plot.
BTW: Just like percentiles, quartiles are defined slightly differently by different authors. Dr. Math gives a nice, clear rundown of different ways of computing quartiles in
Defining Quartiles in The Math Forum (2002) [see “Sources Used” at end of book]. I follow Moore and McCabe’s method, which is also used by your TI-83 or TI-84.
3D3. z-Scores
You’ll use z-scores more than any other measure of position. (Remember that every measure of position measures the position of one data point
within the sample or population that it is part of.)
Definition: The z-score of a data point is how many standard deviations it lies above or below the mean. (A z-score is sometimes called a standard score.)
How do you find out how many SD a number is above or below the mean of its data set? You subtract the mean, and then divide the result by the SD.
z-score within a sample: z = (x − x̅)/s          z-score within a population: z = (x − µ)/σ
Either way, it’s z = (data value − mean)/SD.
When you compute a z-score, the top and bottom of the fraction are both in the same units as the original data, and therefore the z-score itself has no
units. z-scores are pure numbers.
What good are z-scores? You’ll use them in inferential statistics, starting in Chapter 9, but you can also use them in descriptive statistics.
For one thing, a z-score gives you economy in language. Instead of saying “at least 75% of the data in any distribution must lie between two
standard deviations below the mean and two standard deviations above the mean”, you can say “at least 75% of the data lie between z = ±2.”
A z-score helps you determine whether a measurement is unusual. For instance, how good is an SAT verbal score of 300? Scores on the SAT
verbal are normally distributed with a mean of 500 and an SD of 100, so z = (300−500)/100 = −2. The Empirical Rule tells you only 2½% of students score that low or lower.
And z-scores are also good for comparing apples and oranges, as the next example shows.
Example 16: You have two candidates for an entry-level position in your restaurant kitchen. Both have been to chef school, but different schools, and
neither one has any experience. Chris presents you with a final exam score of 86, and Fran with a final exam score of 67. Which one do you hire?
At first glance, you’d go with the one who has the higher score. But wait! Maybe Fran with the 67 is actually better, and just went to a tougher
school. So you ask about the average scores at the two schools. Chris’s school had a mean score of 76, and Fran’s school had a mean score of 59.
Assuming that the students at the two schools had equal innate ability, Fran went to a tougher school than Chris.
Chris scored 10 points above the school average, while Fran scored only 8 points above the school average. Now do you hire Chris? Not yet!
Maybe there was more variability in Chris’s class, so 10 points above the average is no big deal, but there was less variability in Fran’s, so 8 points
above the mean is a big deal. So you dig further and find that the standard deviations of the two classes were 8 and 4. At this point, you make a table:
                    Chris               Fran
Candidate’s score   86                  67
School mean         76                  59
School SD           8                   4
z-score             (86−76)/8 = 1.25    (67−59)/4 = 2.00
The z-scores tell you that Fran stands higher in Fran’s class than Chris stands in Chris’s class. Assuming that the two classes as a whole were of equal
ability, Fran is the stronger candidate.
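The same computation, written as a minimal Python sketch. The function names are made up for illustration; `raw_score` simply runs the z-score formula backward.

```python
def z_score(value, mean, sd):
    """How many standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd


def raw_score(z, mean, sd):
    """Reverse direction: the data value that sits z SD from the mean."""
    return mean + z * sd


chris = z_score(86, mean=76, sd=8)
fran = z_score(67, mean=59, sd=4)
print("Chris:", chris, "Fran:", fran)
print("stronger candidate:", "Fran" if fran > chris else "Chris")

# SAT verbal score of 300, with mean 500 and SD 100.
print("SAT 300 as a z-score:", z_score(300, mean=500, sd=100))

# What score would Chris have needed to match Fran's standing?
print("Chris needed:", raw_score(fran, mean=76, sd=8))
```

Because z-scores have no units, the two candidates can be compared directly even though their exams were scored differently.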
Definition: The five-number summary of a data set is the minimum value, Q1, median, Q3, and maximum value (in order).
The five-number summary combines measures of center (the median) and spread (the interquartile range and the range). A plot of the five-number
summary, called a box-whisker diagram (below), shows you the shape of the data set.
On the TI-83 or TI-84, the five-number summary is the second output screen from 1-VarStats. Caution! Remember that the second screen is
meaningful only for a simple list of numbers or an ungrouped distribution, not for a grouped distribution. To produce a five-number summary, you
need all the original data points.
Example 17: Here is the second output screen from the quiz scores earlier in this chapter. The five-number summary is
2.5, 8, 10.5, 11.5, 15.
The median is 10.5, meaning that half the students scored 10.5 or below and half scored 10.5 or above.
The interquartile range is Q3−Q1 = 11.5−8 = 3.5. Half of the students scored between 8 and 11.5.
3E1. Outliers
Definition: An outlier is a data value that is well separated from most of the data.
Conventionally, the values Q1−1.5×IQR and Q3+1.5×IQR (first quartile minus 1½ times interquartile range, and third quartile plus
1½ times interquartile range) are called fences, and any data points outside the fences are considered outliers.
Example 18: Here again are the quiz scores from earlier in this chapter:
10.5 13.5 8 12 11.3 9 9.5 5 15 2.5 10.5 7 11.5 10 10.5
Find the outliers, if any.
The five-number summary, above, gave you the quartiles: Q1 = 8 and Q3 = 11.5. The interquartile range is 11.5−8 = 3.5, and 1.5 times that is 5.25.
The fences are 8−5.25 = 2.75 and 11.5+5.25 = 16.75. All the data points but one lie within the fences; only 2.5 is outside. Therefore 2.5 is the only outlier
in this data set.
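The fence arithmetic in Example 18 is easy to sketch in Python. This is an illustration only; Q1 and Q3 are copied from the five-number summary in the text rather than computed.

```python
quiz_scores = [10.5, 13.5, 8, 12, 11.3, 9, 9.5, 5, 15, 2.5, 10.5, 7, 11.5, 10, 10.5]

# Quartiles taken from the five-number summary in the text.
q1, q3 = 8, 11.5

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any data point outside the fences counts as an outlier.
outliers = [x for x in quiz_scores if x < lower_fence or x > upper_fence]

print("IQR:", iqr)
print("fences:", lower_fence, "and", upper_fence)
print("outliers:", outliers)
```

Notice that the fences themselves are not data points; they are just boundary values computed from the quartiles.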
You can find outliers more easily by using your TI-83 or TI-84; see below.
Why do you care about outliers? First off, an outlier might be a mistake. You should always check all your data carefully, but check your outliers
extra carefully.
But if it’s not a mistake, an outlier may be the most interesting part of your data set. Always ask yourself what an outlier may be trying to tell
you. For example, does this quiz score represent a student who is trying but needs some extra help, or one who simply didn’t prepare for the quiz?
What do you do with outliers? One thing you definitely don’t do: Don’t just throw outliers away. That can really give a false picture of the situation.
But suppose you have to make some policy decision based on your analysis, or run a hypothesis test (Chapters 10 and 11) and announce whether
some claim is true or false?
One way is to do your analysis twice, once with the outliers and once without, and present your results in a two-column table. Anyone who
looks at it can judge how much difference the outliers make. If you’re lucky, the two columns are not very different, and whatever decision must be
made can be made with confidence.
But maybe the two columns are so different that including or excluding the outliers leads to different decisions or actions. In that case, you may
need to start over with a larger sample, change your data collection protocol, or call in a professional statistician.
For more on handling outliers, see Outliers (Simon 2000d) [see “Sources Used” at end of book].
3E2. Box-Whisker Diagrams
The five-number summary packs a lot of information, but it’s usually easier to grasp a summary through a picture if possible. A graph of the five-
number summary is called a boxplot or box-whisker diagram.
BTW: The box-whisker diagram was invented by John Tukey [URL: https://BrownMath.com/swt/pfswt.htm.htm#bign_Tukey] in 1970.
A box-whisker diagram has a horizontal axis, which is the number line of the data, and the number line need not start at zero. Either the axis or the
chart as a whole needs a title, but there’s usually no need for a title on both. There is no vertical axis.
For the graph itself, first identify any outliers and mark them as squares or crosses. Then draw a box with vertical lines at Q1, the median, and
Q3. Lastly, draw whiskers from Q3 to the greatest value in the data set that isn’t an outlier, and from Q1 to the smallest value in the data set that isn’t
an outlier.
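Before you draw anything, it can help to list the pieces of the diagram. Here is a minimal Python sketch of that bookkeeping, using the quiz scores and quartiles from Example 18. The function name and the dictionary keys are made up for illustration, and the sketch produces numbers only, not a picture.

```python
def boxplot_parts(values, q1, med, q3):
    """List the pieces needed to draw a box-whisker diagram by hand."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme values that are NOT outliers.
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    return {
        "outliers": sorted(x for x in values if x < lower_fence or x > upper_fence),
        "box": (q1, med, q3),                 # vertical lines of the box
        "left_whisker": (min(inside), q1),    # from smallest non-outlier to Q1
        "right_whisker": (q3, max(inside)),   # from Q3 to largest non-outlier
    }


quiz_scores = [10.5, 13.5, 8, 12, 11.3, 9, 9.5, 5, 15, 2.5, 10.5, 7, 11.5, 10, 10.5]
parts = boxplot_parts(quiz_scores, q1=8, med=10.5, q3=11.5)
for name, value in parts.items():
    print(name, value)
```

The left whisker ends at 5 rather than 2.5, because 2.5 is an outlier and gets its own mark.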
Example 19: Let’s look at a box-whisker plot of those same quiz scores, which were
10.5 13.5 8 12 11.3 9 9.5 5 15 2.5 10.5 7 11.5 10 10.5
The five-number summary is reproduced at right. You recall from the previous section that there is one outlier, 2.5, so the smallest number in the data
set that isn’t an outlier is 5.
Here’s a plot that I made with StatTools from Palisade Corporation [URL: http://www.palisade.com/stattools/ accessed 2014-09-13]:
Box-Whisker Plot, and Shape of a Data Set
The box-whisker plot is almost as good as a histogram for showing you the shape of a distribution. If one whisker is longer than the other, and
especially if there are outliers on the same side as the long whisker, the distribution is skewed in that direction. If the whiskers are about the same
length and there are no outliers, but one side of the box is longer than the other, that usually indicates skew in that direction as well.
Example 20: In the boxplot of quiz scores, just above, you see an outlier on the left side, and the left side of the box is longer than the right. That
indicates that the distribution is left skewed.
Box-Whisker Plot on TI-83/84/89
You can use your TI-83 or TI-84 to make a box-whisker plot. The calculator comes with that ability (see Box-Whisker Plots on TI-83/84 [URL:
https://BrownMath.com/ti83/boxplot.htm]), but it’s easier to use MATH200A Program part 2. See Getting the Program [URL:
https://BrownMath.com/ti83/math200a.htm#Download] for instructions on getting the program into your calculator.
(If you have a TI-89, see Box-Whisker Plots on TI-89 [URL: https://BrownMath.com/ti83/bp89.htm].)
To make a box-whisker plot with the program, begin by entering the numbers into a statistics list, such as L1. (If you have an ungrouped frequency
distribution, put the numbers in one list and the frequencies in a second list. You need the original data for a boxplot, so you can’t make a boxplot of a
grouped frequency distribution.)
Now press [PRGM]. If you can see MATH200A in the list, press its menu number; otherwise, use the [▼] or [▲] key to get to MATH200A, and press [ENTER].
With the program name on your home screen, press [ENTER] (again) to run the program, and yet again to dismiss the title screen. You’ll then see a
menu. Press [2] for box-whisker plot.
The program asks whether you have one, two, or three samples. Select 1, since that’s what you have.
The program wants to know whether you have a plain list of numbers or a grouped frequency distribution. Since you have
a plain list, choose 1.
The program needs to know which list holds the numbers to be plotted.
Finally, the program presents the box-whisker plot.
Finding Outliers with the TI-83/84/89 or Excel
When you have a box-whisker plot on your screen, whether you used MATH200A part 2 or the calculator’s native
commands, if you see any outliers press [TRACE] and then [◄] or [►] to find which data points are outliers.
(For the TI-89, see Box-Whisker Plots on TI-89 [URL: https://BrownMath.com/ti83/bp89.htm]. If you prefer to use Excel
to find outliers, see Normality Check and Finding Outliers in Excel [URL: https://BrownMath.com/stat/nchkxl.htm].)
Five-Number Summary from TI-83/84/89 Boxplot
After pressing the [TRACE] key, you can get the five-number summary by pressing [◄] or [►] repeatedly. If there are outliers at the left, use the lowest
one for the minimum (first number in the five-number summary); if there are outliers at the right, use the highest one for the maximum (last number
in the five-number summary).
Overview: With numeric data, the goal of descriptive stats is to show shape, center, spread, and outliers.
Measures of center: mean, median, mode; “resistant”. When is mean better, and when is median more representative?
Measures of spread: range, variance, standard deviation (SD). Know the advantages and disadvantages of each.
Interpreting standard deviation in a normal distribution with the Empirical Rule (68–95–99.7 Rule).
Most important measure of position: z-score. Use formulas to find z-score from raw score or vice versa. Be able to interpret z-scores.
Other measures of position: percentiles and quartiles.
Interquartile range (IQR).
Finding mean, SD, and five-number summary with the TI-83 or TI-84. You can do this for a simple list of numbers, for an
ungrouped distribution, or for a grouped distribution. The five-number summary of a grouped distribution is not meaningful.
Meaning of outliers and what to do when they occur.
Boxplot shows outliers if any, plus the five-number summary. (A boxplot isn’t meaningful for a grouped distribution.)
Weighted mean.
Seeing shape of a distribution in its boxplot, or from the relationship between mean and median.

do it after all.
1. When is the mean not the best choice for a measure of center? What would you use instead?
2. Your doctor tells you that you’re in the 15th percentile for cholesterol. Should you be concerned, or should you go out and celebrate with
bacon-wrapped shrimp? (Give your reason, not just an answer.)
3. Consider these questions about measures of spread:
(a) What’s the biggest problem with the range?
(b) What makes the interquartile range a better measure of spread?
(c) Why is the variance better than both?
(d) What makes the standard deviation (SD) better than the variance? (Give two reasons.)
4. Contrast (a) s and σ, (b) µ and x̅, (c) N and n.
5. Your smart-alecky statistics prof distributes quiz results as z-scores rather than raw scores. It’s a large class, and quiz scores were normally
distributed. Your z-score was +1.87. How did you do relative to the class?
6. Weights of apples (of a particular type) are normally distributed. In a large shipment, you find that nearly all the apples weigh between 4.50 and
8.50 ounces. Estimate the SD of the weight of apples in that shipment.
7. The grouped frequency distribution below is the ages reported by a sample of Roman Catholic nuns, from Johnson and
Kuby (2004, 67) [see “Sources Used” at end of book].
(a) Approximate the mean and SD of the ages of these nuns, to two decimal places, and find the sample size.
(b) Explain why a boxplot of this distribution is a bad idea.
Ages       Frequency
20 – 29    34
30 – 39    58
40 – 49    76
50 – 59    187
60 – 69    254
70 – 79    241
80 – 89    147
8. You took the courses shown below. On the usual scale of A = 4.0, A− = 3.7, B+ = 3.3, and so forth,
compute your GPA. (Your GPA, grade point average, is the average of your course grades, weighted by
number of credits in each course.)
Course            Credits   Grade
Statistics        3         A
Calculus          4         B+
Microsoft Word    1         C−
Microbiology      3         B−
English Comp      3         C
9. Your prof has a policy that you can skip the final exam if your quiz average is 87% or better. After ten quizzes, your average is 86%. One quiz
remains. Is it still possible for you to skip the final, and if so what percentage score do you need on that last quiz?
10. In a GM factory in Brazil, 25 workers were asked their commuting distance in kilometers. The data, from Dabes
and Janik (1999, 8) [see “Sources Used” at end of book], are shown below.
Commuting Distances in km
 5  15  23  12   9
12  22  26  31  21
11  19  16  45  12
 8  26  18  17   1
16  24  15  20  17
(a) Construct a grouped frequency distribution for 0–9 km, 10–19 km, and so on. (You made a stemplot for a
homework problem in Chapter 2, so use that answer [URL:
https://BrownMath.com/swt/pfswt.htm.htm#h02_commute] to save yourself some work.)
(b) What is the class width? What are the class midpoints?
(c) Use your grouped distribution to approximate the mean and SD of the commuting distances.
(d) Now compute the mean, median, and SD of the original data set.
(e) Construct a box-whisker plot from the original data set. (Never make a boxplot from grouped data.) Suggestion: do it on your calculator, then
transfer it to paper where you have already drawn and labeled the number line.
(f) Which is the most appropriate measure of center for this sample? Why?
(g) Give the five-number summary and identify any outliers.
11. SAT verbal scores are normally distributed, with a mean of 500 and SD of 100. You randomly select a test taker. What’s the probability that
s/he scored between 500 and 700?
12. Mensa, the largest high-IQ society, accepts SAT scores as indicating intelligence. Assume that the mean combined SAT score is 1500, with SD
300. Jacinto scored a combined 2070.
Maria took a traditional IQ test and scored 129. On that test, the mean is 100 and the SD is 15.
From the test scores, who is more intelligent? Explain.
13. Below is a sample shown as a grouped frequency distribution. Compute the following
quantities and label each with its proper symbol: (a) sample size, (b) mean, (c) standard
deviation. Round to two decimal places. Use any valid method, but show your work. (Begin by
filling in the third column including column heading.)
Test Scores    Frequencies, f
470.0–479.9    15
480.0–489.9    22
490.0–499.9    29
500.0–509.9    50
510.0–519.9    38
14. In a particular data set (continuous data), the mean is around 8700 and the median is around 5000. What if anything can you say about the
shape of the distribution?
4. Linked Variables
Updated 11 Jan 2015
Intro: When you get two numbers from each member of the sample (bivariate numeric data), you make a plot to look for a relationship
between them. If a straight line seems like a good fit for the plotted points, we say that they follow a linear model. In this chapter,
you’ll learn when to use a linear model, and how to find the best one.
Contents: 4A. Mathematical Models

4B. Scatterplot, Correlation, and Regression on TI-83/84
Step 0. Setup
Step 1. Make the Scatterplot
Step 2. Perform the Regression
· Correlation Coefficient, r
· Regression Line, ŷ = ax+b
· Coefficient of Determination, R²
Step 3. Display the Regression Line
Optional: Display the Residuals
· Residual Plot Showing Problems
· Optional Advanced: Residuals and R²
4C. How to Find ŷ from a Regression on TI-83/84
Method 1: Trace on the Regression Line Graph (preferred)
· Extrapolation: Just Say No (Usually)
Method 2: Use Calculated Regression Equation (if necessary)
Finding Residuals
4D. Decision Points for Correlation Coefficient
Procedure
Examples
Interpretation
4E. Optional: Scatterplot, Correlation, and Regression in Excel
Plot the Points
Show the Regression Line
Show the Correlation Coefficient
Predict the Average y
4A. Mathematical Models
The chapter intro talks about points following a “linear model”. But what is a linear model, and what does it mean to follow one? Well, since a linear
model is one kind of mathematical model, let’s talk a little bit about mathematical models.
You know what a model is in general, right? A copy of the original, usually smaller and with unimportant details left out. Think of model airplanes,
or architect’s models of buildings.
A mathematical model is like that. Real Life is Complicated,™ and mathematical models help us manage those complications.
Definition: A mathematical model is a mathematical description of something in the real world. An object or process or data set follows a model
if the calculations you do with the model match reality closely enough to be useful.
You’ve already met one model in Chapter 3: the grouped frequency distribution. Instead of dealing with all the data points, you do calculations using
the midpoint of each class. That gives you approximate mean and SD, but the approximation is close enough to be useful.
The MathIsFun site has a nice example of modeling the space inside a cardboard box, going beyond the h×w×l formula; see Mathematical Models.
You’ll meet plenty more models in this book: probability models in Chapter 5, several discrete models in Chapter 6, and the normal model in
Chapter 7.
But in this chapter we’re concerned with the linear model.
Definition: The linear model uses the linear equation y = ax+b to model the relationship between two numeric variables x and y. In any particular model, a
and b are constants.
Because the graph of y = ax+b is a straight line, we can also call it a straight-line model, and we say that x and y have a straight-
line relationship in the model.
The linear model is a good one if it describes the data well enough to let you make useful calculations.
4B. Scatterplot, Correlation, and Regression on TI-83/84
Summary: When you have a set of (x,y) data points and want to find the best equation to describe them, you are performing a regression. You will learn
how to find the strength of the association between your two variables (correlation coefficient), and how to find the line of best fit (least
squares regression line).
Usually you have some idea that your x variable can help predict your y variable, so you call x the explanatory variable and y the
response variable. (Other names are independent variable and dependent variable.)
See also: A separate version of these instructions for the TI-89 [URL: https://BrownMath.com/ti83/regres89.htm]
Scatterplot, Correlation, and Regression in Excel (later in this chapter)
4B1. Step 0. Setup
Set floating point mode, if you haven’t already. [MODE] [▼] [ENTER]
Go to the home screen [2nd MODE makes QUIT] [CLEAR]
Turn on diagnostics with the [DiagnosticOn] command. [2nd 0 makes CATALOG] [x⁻¹]
Don’t press the [ALPHA] key, because the CATALOG command has already put the
calculator in alpha mode.
Scroll down to DiagnosticOn and press [ENTER] twice.
The calculator will remember these settings when you turn it off: next time you can start with Step 1.
4B2. Step 1. Make the Scatterplot
Before you even run a regression, you should first plot the points and see whether they seem to lie along a straight line. If the distribution is
obviously not a straight line, don’t do a linear regression. (Some other form of regression might still be appropriate, but that is outside the scope of
this course.)
Let’s use this example from Sullivan (2011, 179) [see “Sources Used” at end of book]: the distance a golf ball travels versus the speed with which the
club head hit it.
Club-head speed, mph (x) 100 102 103 101 105 100 99 105
Distance, yards (y) 257 264 274 266 277 263 258 275
Turn off other plots. [Y=]

Cursor to each highlighted = sign or Plot number and press [ENTER] to deactivate.
Set the format screen. Press [2nd ZOOM makes FORMAT]. Just select everything in the
left column.
Enter the numbers in two statistics lists. [STAT] [1] selects the list-edit screen.
Cursor onto the label L1 at top of first column, then [CLEAR] [ENTER] erases the list.
Enter the x values.
Cursor onto the label L2 at top of second column, then [CLEAR] [ENTER] erases the list.
Enter the y values.
Set up the scatterplot. [2nd Y= makes STAT PLOT] [1] [ENTER] turns Plot 1 on.
[▼] [ENTER] selects scatterplot.
[▼] [2nd 1 makes L1] ties list 1 to the x axis.
[▼] [2nd 2 makes L2] ties list 2 to the y axis.
(Leave the square as the selected mark for plotting.)
Plot the points. [ZOOM] [9] automatically adjusts the window frame to fit the
data.
BTW: I have the grid turned on in some of these pictures, but
earlier I told you to turn it off. That’s simplest. If you want the
grid, you can turn it on, but then you’ll have to adjust the grid
spacing for almost every plot. To adjust grid spacing, press
[WINDOW], set Xscl and Yscl to appropriate values for your
data, and press [GRAPH] to see the result.
Check your data entry by tracing the points. [TRACE] shows you the first (x,y) pair, and then [►] shows
you the others. They’re shown in the order you entered
them, not necessarily from left to right.
A scatterplot on paper needs labels (numbers) and titles on both axes; the x and y axes typically won’t start at 0. Here’s the plot for this data set. (The
horizontal lines aren’t needed when you plot on graph paper.)
When the same (x,y) pair occurs multiple times, plot the extra ones slightly offset. This is called jitter. In the
example at the right, the point (6,6) occurs twice.
If the data points don’t seem to follow a straight line reasonably well, STOP! Your calculator will obey you if you tell it to perform a linear regression,
but if the points don’t actually fit a straight line then it’s a case of “garbage in, garbage out.”
For instance, consider this example from DeVeaux, Velleman, Bock (2009, 179) [see “Sources Used” at end of book]. This is a table of
recommended f/stops for various shutter speeds for a digital camera:
Shutter speed (x) 1/1000 1/500 1/250 1/125 1/60 1/30 1/15 1/8
f/stop (y) 2.8 4 5.6 8 11 16 22 32
If you try plotting these numbers yourself, enter the shutter speeds as fractions for accuracy: don’t convert them to
decimals yourself. The calculator will show you only a few decimal places, but it maintains much greater
precision internally.
You can see from the plot at right that these data don’t fit a straight line. There is a distinct bend near the left.
When you have anything with a curve or bend, linear regression is wrong. You can try other forms of regression in
your calculator’s menu, or you can transform the data as described in DeVeaux, Velleman, Bock (2009, ch 10) [see
“Sources Used” at end of book] and other textbooks.
4B3. Step 2. Perform the Regression
Set up to calculate statistics. [STAT] [►] [4] pastes LinReg(ax+b) to the home screen.
[2nd 1 makes L1] [,] [2nd 2 makes L2] defines L1 as x values and L2 as y values.
If you have the “wizard” interface, leave FreqList blank, or press [DEL] if something is
already filled in.
Set up to store regression equation. [,] [VARS] [►] [1] [1] pastes Y1 into the LinReg command.
Show your work! Write down the whole command, LinReg(ax+b) L1,L2,Y1 in this case, not just LinReg or LinReg(ax+b).
Press [ENTER]. The calculator shows correlation and regression statistics and pastes the regression equation into Y1.
Your input screen should look like this, for the “wizard” and non-wizard interfaces:
Write down the slope a, the y intercept b, the coefficient of determination R², and the correlation coefficient r. (A decent
rule of thumb is four decimal places for slope and intercept, and two for r and R².)
a = 3.1661, b = −55.7966
R² = 0.88, r = 0.94
Now let’s take a look in depth at each of those.
Correlation Coefficient, r
Look first at r, the coefficient of linear correlation. r can range from −1 to +1

and measures the strength of the association between x and y. A positive
correlation or positive association means that y tends to increase as x
increases, and a negative correlation or negative association means that y
tends to decrease as x increases. The closer r is to 1 or −1, the stronger the
association. We usually round r to two decimal places.
BTW: Karl Pearson [URL: https://BrownMath.com/swt/pfswt.htm.htm#bign_PearsonK] developed the formula for
the linear correlation coefficient in 1896. The symbol r is due to Sir Francis Galton in
1888.
For real-world data, 0.94 is a pretty strong correlation. But you might wonder
whether there’s actually a general association between club-head speed and
distance traveled, as opposed to just the correlation that you see in this
sample. Decision Points for Correlation Coefficient, later in this chapter,
shows you how to answer that question.
Figure: “Several sets of (x,y) [pairs], with the correlation coefficient for each set. Note that correlation reflects the noisiness and direction of a linear
relationship (top row), but not the slope of that relationship (middle), nor many aspects of nonlinear relationships (bottom).”
source: Correlation and Dependence [see “Sources Used” at end of book]
BTW: Though nobody ever computes r by hand any more, the formula explains the properties of r. Here are two equivalent forms. In the first form, you compute the z-score of
each x within just the x’s and the z-score of each y within just the y’s. The second formula is easier if you already have the means and SD of the x’s and y’s. For the meaning of
∑, see ∑ Means Add ’em Up [URL: https://BrownMath.com/swt/pfswt.htm.htm#c01_BigSigma] in Chapter 1.
z-scores are pure numbers without units, and therefore r also has no units. You can interchange the x’s and y’s in the formula without changing the result,
and therefore r is the same regardless of which variable is x and which is y.
Why is r positive when data points trend up to the right and negative when they trend down to the right? The product (x−x̅)(y−y̅) explains this. When
points trend up to the right, most are in the lower left and upper right quadrants of the plot. In the lower left, x and y are both below average, x−x̅ and y−y̅ are
both negative, and the product is positive. In the upper right, x and y are both above average, x−x̅ and y−y̅ are both positive, and again the product is positive.
The product is positive for most points, and therefore r is positive when the trend is up to the right.
On the other hand, if the data trend down to the right, most points are in the upper left (where x is below average and y is above average, x−x̅ is negative,
y−y̅ is positive, and the product is negative) and the lower right (where x−x̅ is positive, y−y̅ is negative, and the product is also negative.) Since the product is
negative for most points, r is negative when data trend down to the right.
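The two forms described in this note can be checked numerically. Here is a minimal Python sketch, assuming the standard sample formulas: r = ∑(zx·zy)/(n−1) for the first form and r = ∑(x−x̅)(y−y̅) / ((n−1)·sx·sy) for the second. The function names are made up for illustration, and the data are the golf numbers from Step 1.

```python
from math import sqrt

# Golf data from Step 1: club-head speed (mph) and distance (yards).
speed = [100, 102, 103, 101, 105, 100, 99, 105]
distance = [257, 264, 274, 266, 277, 263, 258, 275]

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def r_from_z_scores(xs, ys):
    """Form 1 (assumed): add up the products of paired z-scores, divide by n - 1."""
    mx, my, sx, sy = mean(xs), mean(ys), sample_sd(xs), sample_sd(ys)
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

def r_from_deviations(xs, ys):
    """Form 2 (assumed): sum of (x - mean x)(y - mean y) over (n - 1) * sx * sy."""
    mx, my = mean(xs), mean(ys)
    total = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return total / ((len(xs) - 1) * sample_sd(xs) * sample_sd(ys))

r1 = r_from_z_scores(speed, distance)
r2 = r_from_deviations(speed, distance)
r_swapped = r_from_z_scores(distance, speed)  # interchange x and y

print(round(r1, 2), round(r2, 2), round(r_swapped, 2))  # 0.94 0.94 0.94
```

All three results agree, which illustrates both that the two forms are equivalent and that r does not depend on which variable you call x.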
Be careful in your interpretation! No matter how strong your r might be, say that changes in the y variable are associated with changes in the x
variable, not “caused by” it. Correlation is not causation is your mantra.
It’s easy to think of associations where there is no cause. For example, if you make a scatterplot of US cities with x as number of books in the
public library and y as number of murders, you’ll see a positive association: number of murders tends to be higher in cities with more library books.
Does that mean that reading causes people to commit murder, or that murderers read more than other people? Of course not! There is a lurking
variable here: population of the city.
When you have a positive or negative association, there are four possibilities: x might cause changes in y, y might cause changes in x, lurking
variables might cause changes in both, or it could just be coincidence, a random sample that happens to show a strong association even though the
population does not.
used by permission; source: http://xkcd.com/552/ (accessed 2014-09-15)
BTW: If correlation is not causation, then how can we establish causation? For example, how do we know that smoking causes lung cancer in humans? Obviously we can’t
perform an experiment, for ethical reasons. Sir Austin Bradford Hill [URL: https://BrownMath.com/swt/pfswt.htm.htm#bign_BradfordHill] laid down nine criteria for
establishing causation in a 1965 paper, The Environment and Disease: Association or Causation? [see “Sources Used” at end of book] Short summaries of the “Bradford Hill
criteria” are in many places on the Web, including Steve Simon’s (2000b) Causation [see “Sources Used” at end of book].
Regression Line, ŷ = ax+b
Write the equation of the line using ŷ (“y-hat”), not y, to indicate that this is a prediction. b is the y intercept, and a is the slope. Round both of them to
four decimal places, and write the equation of the line as
ŷ = 3.1661x − 55.7966
(Don’t write 3.1661x + −55.7966.)
These numbers can be interpreted pretty easily. Business majors will recognize them as intercept = fixed cost and slope = variable cost, but you can
interpret them in non-business contexts just as well.
The slope, a or b1 or m, tells how much ŷ increases or decreases for a one-unit increase in x. In this case, your interpretation is “the ball travels
about an extra 3.17 yards when the club speed is 1 mph greater.” The slope and the correlation coefficient always have the same sign. (A negative
slope would mean that y decreases that many units for every one unit increase in x.)
The intercept, b or b0, says where the regression line crosses the y axis: it’s the value of ŷ when x is 0. Be careful! The y intercept may or may
not be meaningful. In this case, a club-head speed of zero is not meaningful. In general, when the measured x values don’t include 0 or don’t at least
come pretty close to it, you can’t assign a real-world interpretation to the intercept. In this case you’d say something like “the intercept of −55.7966 has
no physical interpretation because you can’t hit a golf ball at 0 mph.”
Here’s an example where the y intercept does have a physical meaning. Suppose you measure the gross weight of a UPS truck (y) with various
numbers of packages (x) in it, and you get the regression equation ŷ = 2.17x+2463. The slope, 2.17, is the average weight per package, and the
y intercept, 2463, is the weight of the empty truck.
BTW: The slope (a or m or b1) and y intercept (b or b0) of the regression line can be calculated from formulas, if you have a lot of time on your hands:
For the meaning of ∑, see ∑ Means Add ’em Up [URL: https://BrownMath.com/swt/pfswt.htm.htm#c01_BigSigma] in Chapter 1.
Traditionally, calculus is used to come up with those equations, but all that’s really necessary is some algebra. See Least Squares — the Gory Details
[URL: https://BrownMath.com/stat/leastsq.htm] if you’d like to know more.
The second formula for the slope is kind of neat because it connects the slope, the correlation coefficient, and the SD of the two variables.
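If you would rather let a computer spend the time, here is a minimal Python sketch of the slope and intercept calculation. It assumes the usual least-squares formulas: a = ∑(x−x̅)(y−y̅) / ∑(x−x̅)², equivalently a = r·sy/sx, and b = y̅ − a·x̅. The variable names are illustrative, and the data are the golf numbers from Step 1.

```python
from math import sqrt

# Golf data from Step 1: club-head speed (mph) and distance (yards).
speed = [100, 102, 103, 101, 105, 100, 99, 105]
distance = [257, 264, 274, 266, 277, 263, 258, 275]

n = len(speed)
x_bar = sum(speed) / n
y_bar = sum(distance) / n

# Sums of squared deviations and of cross products.
sxx = sum((x - x_bar) ** 2 for x in speed)
syy = sum((y - y_bar) ** 2 for y in distance)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(speed, distance))

# First form of the slope (assumed formula), then the intercept.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Second form: slope = r * (SD of y) / (SD of x).
r = sxy / sqrt(sxx * syy)
s_x = sqrt(sxx / (n - 1))
s_y = sqrt(syy / (n - 1))
slope_from_r = r * s_y / s_x

sign = "+" if intercept >= 0 else "−"
print(f"ŷ = {slope:.4f}x {sign} {abs(intercept):.4f}")   # ŷ = 3.1661x − 55.7966
print(f"r = {r:.2f}, R² = {r * r:.2f}")                  # r = 0.94, R² = 0.88
```

The two slope calculations agree to within floating-point error, and the results match the calculator's LinReg(ax+b) output shown in Step 2.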
Coefficient of Determination, R²
The last number to look at (third on the screen) is R², the coefficient of determination. (The calculator
displays r², but the capital letter is standard notation.) R² measures the quality of the regression line as
a means of predicting ŷ from x: the closer R² is to 1, the better the line. Another way to look at it is
that R² measures how much of the total variation in y is predicted by the line.
In this case R² is about 0.88, so your interpretation is “about 88% of the variation in distance
traveled is associated with variation in club-head speed.” Statisticians say that R² tells you how much
of the variation in y is “explained” by variation in x, but if you use that word remember that it means a
numerical association, not necessarily a cause-and-effect explanation. It’s best to stick with “associated” unless you have done an experiment to show
that there is cause and effect.
There’s a subtle difference between r and R², so keep your interpretations straight. r talks about the strength of the association between the variables;
R² talks about what part of the variation in the y variable is associated with variation in the x variable, and how well the line predicts y from x. Don’t
use any form of the word “correlated” when interpreting R².
Only linear regression will have a correlation coefficient r, but any type of regression (fitting any line or curve to a set of data points) will
have a coefficient of determination R² that tells you how well the regression equation predicts y from the independent variable(s). Steve Simon
(1999b) gives an example for non-linear regression in R-squared [see “Sources Used” at end of book].
BTW: In straight-line regression, R² is the square of r, so if you want a formula just compute r and square the result.
4B4. Step 3. Display the Regression Line
Show line with original data points. [GRAPH]
What is this line, exactly? It’s the one unique line that fits the plotted points best. But what does “best” mean?
For each plotted point, there is a residual equal to y−ŷ, the difference between the actual measured y
for that x and the value predicted by the line. Residuals are positive if the data point is above the line, or
negative if the data point is below the line.
You can think of the residuals as measures of how bad the line is at prediction, so you want them small.
For any possible line, there’s a “total badness” equal to taking all the residuals, squaring them, and adding
them up. The least squares regression line means the line that is best because it has less of this “total
badness” than any other possible line. Obviously you’re not going to try different lines and make those
calculations, because the formulas built into your calculator guarantee that there’s one best line and this is
it.
BTW: Carl Friedrich Gauss [URL: https://BrownMath.com/swt/pfswt.htm.htm#bign_Gauss] developed the method of least
squares in a paper published in 1809.
Figure: The same four points on left and right. The vertical distance from each measured data point to the line, y−ŷ, is called the
residual for that x value. The line on the right is better because the residuals are smaller.
source: Dabes and Janik (1999, 179) [see “Sources Used” at end of book]
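To see the “total badness” idea in numbers, here is a minimal Python sketch. It computes the least squares line for the golf data from Step 1 (using the usual formulas, which are an assumption here), then compares its sum of squared residuals with that of a few rival lines. The rival lines are arbitrary nudges chosen only for illustration.

```python
# Golf data from Step 1: club-head speed (mph) and distance (yards).
speed = [100, 102, 103, 101, 105, 100, 99, 105]
distance = [257, 264, 274, 266, 277, 263, 258, 275]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

def total_badness(slope, intercept, xs, ys):
    """Square every residual y - ŷ for the given line and add them up."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

a, b = least_squares_line(speed, distance)
best = total_badness(a, b, speed, distance)

# Hypothetical rival lines, made by nudging the slope and/or intercept.
rivals = [(a + 0.1, b), (a - 0.1, b), (a, b + 1), (a, b - 1), (a + 0.1, b - 10)]
rival_badness = [total_badness(s, i, speed, distance) for s, i in rivals]

print(f"least squares line: {best:.2f}")   # least squares line: 49.86
for (s, i), badness in zip(rivals, rival_badness):
    print(f"slope {s:.4f}, intercept {i:.4f}: {badness:.2f}")
```

Every rival line shows a larger total than the least squares line, which is what “best” means in this context.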
4B5. Optional: Display the Residuals
I would like you to know the material in this section, but it's not part of the MATH200 syllabus so I don’t require it. No homework or quiz problems
will draw from this section. You will, however, need to calculate individual residuals; see Finding Residuals, below.
“No regression analysis is complete without a display of the residuals to check that the linear model is reasonable.”
DeVeaux, Velleman, Bock (1999, 227) [see “Sources Used” at end of book]
The residuals are automatically calculated during the regression. All you have to do is plot them on the y axis against your existing x data. This is an
important final check on your model of the straight-line relationship.
Turn off other plots. Press [Y=]. Cursor to the highlighted = sign next to Y1 and press [ENTER]. Cursor to
PLOT1 and press [ENTER].
Set up the plot of residuals against the x data. Set up Plot 2 for the residuals. Press [2nd Y= makes STAT PLOT]
[▼] [ENTER] [ENTER] to turn on Plot 2. Press [▼] [ENTER] to
select a scatterplot.
The x’s are still in L1, so press [2nd 1 makes L1] [ENTER]. In this
plot, the y’s will be the residuals: press [2nd STAT makes LIST],
cursor up to RESID, and press [ENTER] [ENTER].
Display the plot. [ZOOM] [9] displays the plot.
You want the plot of residuals versus x to be “the most boring scatterplot you’ve ever seen.” (DeVeaux, Velleman, Bock 2009, 203) [see “Sources
Used” at end of book] “It shouldn’t have any interesting features, like a direction or shape. It should stretch horizontally, with about the same
amount of scatter throughout. It should show no bends, and it should have no outliers. If you see any of these features, find out what the regression
model missed.”
Don’t worry about the size of the residuals, because [ZOOM] [9] adjusts the vertical scale so that they take up the full screen.
If the residuals are more or less evenly distributed above and below the axis and show no particular trend, you were probably right to choose
linear regression. But if there is a trend, you have probably forced a linear regression on non-linear data. If your data points looked like they fit a
straight line but the residuals show a trend, it probably means that you took data along a small part of a curve.
Here there is no bend and there are no outliers. The scatter is pretty consistent from left to right, so you conclude that distance traveled versus club-head
speed really does fit the straight-line model.
Residual Plot Showing Problems
Refer back to the scatterplot of f/stop against shutter speed. I said then that it was not a straight line, so you could not do a linear regression. If you
missed the bend in the scatterplot and did a regression anyway, you’d get a correlation coefficient of r = 0.98, which would
encourage you to rely on the bad regression. But plotting the residuals (at right) makes it crystal clear that linear regression
is the wrong type for this data set.
This is a textbook case (which is why it was in a textbook): there’s a clear curve with a bend, variation on both sides of
the x axis is not consistent, and there’s even a likely outlier.
Optional Advanced: Residuals and R²
I said in Step 2 that the coefficient of determination measures the variation in measured y that’s associated with the variation in measured x. Now
that you understand the residuals, I can make that statement more precise and perhaps a little easier to understand.
The set of measured y values has a spread, which can be measured by the standard deviation or the variance. It turns out to be useful to consider the
variation in y’s as their variance. (You remember that the variance is the square of the standard deviation.)
The total variance of the measured y’s has two components: the so-called “explained” variation, which is the variation along the regression line, and
the “unexplained” variation, which is the variation away from the regression line. The “explained” variation is simply the variance of the ŷ’s,
computing ŷ for every x, and the “unexplained” variation is the variance of the residuals. Those two must add up to the total variance of the
measured y’s, which means that as percentages of the variation in y they must add to 100%. So R² is the percent of “explained” variation in the
regression, and 100%−R² is the percent of “unexplained” variation.
R² = s²ŷ / s²y and 1 − R² = s²e / s²y
Now I can restate what you learned in Step 2. R² is 88% because 88% of the variance in y is associated with the regression line, and the other 12%
must therefore be the variance in the residuals. This isn’t hard to verify: do a 1-VarStats on the list of measured y’s and square the standard deviation
to get the total variance in y, s²y = 59.93. Then do 1-VarStats on the residuals list and square the standard deviation to get the “unexplained” variance,
s²e = 7.12. The ratio of those is 7.12/59.93 = 0.12, which is 1−R². Expressing it as a percentage gives 100%−R² = 12%, so 12% of the variation in measured
y’s is “unexplained” (due to lurking variables, measurement error, etc.).
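The same check can be sketched in Python instead of 1-VarStats. This is a minimal illustration, assuming the regression line is computed with the usual least-squares formulas; `variance` from the standard `statistics` module is the sample variance (it divides by n − 1), which matches squaring the calculator's Sx.

```python
from statistics import variance  # sample variance (n - 1 in the denominator)

# Golf data from Step 1: club-head speed (mph) and distance (yards).
speed = [100, 102, 103, 101, 105, 100, 99, 105]
distance = [257, 264, 274, 266, 277, 263, 258, 275]

# Least squares slope and intercept (usual formulas, assumed here).
n = len(speed)
x_bar, y_bar = sum(speed) / n, sum(distance) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(speed, distance))
         / sum((x - x_bar) ** 2 for x in speed))
intercept = y_bar - slope * x_bar

predicted = [slope * x + intercept for x in speed]          # the ŷ's
residuals = [y - y_hat for y, y_hat in zip(distance, predicted)]

var_y = variance(distance)       # total variance of the measured y's
var_pred = variance(predicted)   # "explained" variance, along the line
var_resid = variance(residuals)  # "unexplained" variance, away from the line

print(round(var_y, 2), round(var_resid, 2))   # 59.93 7.12
print(round(var_resid / var_y, 2))            # 0.12, which is 1 − R²
print(round(var_pred / var_y, 2))             # 0.88, which is R²
```

The “explained” and “unexplained” variances add up to the total variance of y, apart from floating-point rounding.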
4C. How to Find ŷ from a Regression on TI-83/84
Summary: The regression line represents the model that best fits the data. One important reason for doing the regression in the first place is to
answer the question, what average y value does the model predict for a given x? This page shows you two methods of answering that
question.
See also: A separate version of these instructions for the TI-89

A separate version of these instructions for Excel (later in this chapter)
4C1. Method 1: Trace on the Regression Line Graph (preferred)
You can make predictions while examining the graph of the regression line on the TI-83/84 or TI-89.
Advantages to this method: aside from being pretty cool, it avoids rounding errors, and it’s very fast for multiple predictions.
Activate tracing on the regression line. [TRACE]
Look in the upper left corner to make sure that the regression equation is displayed. If you see P:L1,L2, press [▲] to display the regression equation.
Enter the x value. Press the black-on-white numeric keys including [(−)] and decimal point if needed.
As soon as you press the first number, you’ll see a large X= appear at the bottom left of
the screen. Enter any additional digits and press [ENTER].
The TI-83/84 displays the predicted average y value (ŷ) at the bottom right and puts a
blinking cursor at that point on the regression line.
Caution: ŷ = 267.1 yards is the predicted or expected average distance for a club-head speed of 102 mph. But that does not mean any particular
golf ball hit at that speed will travel that exact distance. You can think of ŷ as the average travel distance that you’d expect for a
whole lot of golf balls hit at that speed.
Extrapolation: Just Say No (Usually)
Caution: A regression equation is valid only within the range of actual measured x values, and a little way left and right of that range. If you try to
go too far outside the valid range, the calculator will display ERR:INVALID.
It’s not just being cranky. The line describes the points you measured, so it’s usable between your minimum and maximum x values and maybe
a little way outside those limits. But unless you have very solid reasons why the same straight-line model is good beyond that range, you can’t
extrapolate.
Take a look at this graph of men’s and women’s winning times in the Olympic 100-meter dash from 1928 to 2012, which I made from data
compiled by Mike Rosenbaum [see “Sources Used” at end of book]. (The women’s 100 m dash became an Olympic event in 1928.)
From this you can reasonably guess that if women had run in the 1924 Olympics, the winner would have finished in around 12.2 or 12.3 seconds.
And the 2016 winner will probably finish in around 11.5 seconds. But the further you go outside your measured data, the riskier your
predictions.
Will men’s and women’s times generally continue to decrease? Probably: training will get better, nutrition will improve, global communications
will make it less likely that a stellar runner goes undiscovered. But will the decrease follow a straight line? Certainly not! Think about it for a minute.
If times keep decreasing on a straight line, eventually they’ll cross the x axis and go negative. Runners will finish the race before they start it! So
obviously the straight-line model breaks down — the only question is where. You don’t know, and you can’t know. All you know is that it’s not safe to
extrapolate.
Bogus extrapolations give statistics a bad name and make people say “you can prove anything with statistics.” Here’s an example. I’ve just extended
the two trend lines to “prove” that after the 2160 Olympics women will run the 100 meters faster than men. Pretty clearly, the linear model breaks
down before then.
It’s not safe to extrapolate to earlier times, either. The intercepts tell you that in the year zero, the fastest man in the world took 31.6 seconds to run
100 m, and the fastest woman took 44.7 seconds. Does that seem believable?
4C2. Method 2: Use Calculated Regression Equation (if necessary)
But what if you don’t still have the regression line on your calculator, for instance if you’ve done a different regression? In that case, you can go back
to your written-down regression equation and plug in the desired x value.
Advantage of this method: You already know how to substitute into equations. Disadvantages: depending on the specific numbers involved,
you may introduce rounding errors. Also, since you’re entering more numbers there’s an increased chance of entering a number wrong.
Example: To find the predicted average y value for x = 102, go back to the regression equation that you wrote down, and substitute 102 for x:
ŷ = 3.1661x − 55.7966
ŷ = 3.1661*102 − 55.7966
ŷ = 267.1456 → 267.1
In this example, the rounding error was very small, and it disappeared when you rounded ŷ to one decimal place. But there will be problems where
the rounding error is large enough to affect the final answer, so always use the trace method if you can.
Again, please observe the Cautions above. With this method, the calculator won’t tell you when your x value is outside a reasonable range, so you
need to be aware of that issue yourself.
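Method 2 is easy to sketch in Python, including a guard against extrapolation that the written-down equation lacks. This is only an illustration: the function name `predict` and the `margin` of 1 mph are assumptions, since the text says only “a little way” outside the measured range.

```python
# Written-down regression equation from Step 2: ŷ = 3.1661x − 55.7966
SLOPE = 3.1661
INTERCEPT = -55.7966

# Smallest and largest measured club-head speeds in the golf data.
X_MIN, X_MAX = 99, 105

def predict(x, margin=1.0):
    """Return the predicted average y for x.

    Refuses to extrapolate more than `margin` (a hypothetical allowance)
    beyond the measured x values.
    """
    if x < X_MIN - margin or x > X_MAX + margin:
        raise ValueError(f"x = {x} is outside the range where the model is valid")
    return SLOPE * x + INTERCEPT

print(round(predict(102), 1))   # 267.1

try:
    predict(0)                  # a club-head speed of 0 mph is far outside the data
except ValueError as err:
    print("refused:", err)
```

How wide to make the margin is a judgment call that depends on what you know about the situation, not something the regression can tell you.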
4C3. Finding Residuals
Each measured data point has an associated residual, defined as y−ŷ, the distance of the point above or below the line. To find a residual, the actual y
comes from the original data, and the predicted average ŷ comes from one of the methods above.
Example: Find the residual for x = 102.

Solution: From the original data, y = 264. From either of the methods above, ŷ = 267.1. Therefore the residual is y−ŷ = 264−267.1 = −3.1 yards.
If a given x value occurs in more than one data point, you have multiple residuals for that x value.
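As a minimal Python sketch, you can compute every residual at once and group them by x value. It uses the written-down equation ŷ = 3.1661x − 55.7966 and the golf data from Step 1; the dictionary name is made up for illustration.

```python
from collections import defaultdict

# Golf data from Step 1: club-head speed (mph) and distance (yards).
speed = [100, 102, 103, 101, 105, 100, 99, 105]
distance = [257, 264, 274, 266, 277, 263, 258, 275]

# Written-down regression equation from Step 2.
SLOPE = 3.1661
INTERCEPT = -55.7966

# Group residuals (y − ŷ) by x, since an x value can occur more than once.
residuals_by_x = defaultdict(list)
for x, y in zip(speed, distance):
    y_hat = SLOPE * x + INTERCEPT
    residuals_by_x[x].append(round(y - y_hat, 1))

print(residuals_by_x[102])   # [-3.1]
print(residuals_by_x[100])   # [-3.8, 2.2]
```

The speed 100 mph appears in two data points, so it has two residuals, one below the line and one above it.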
4D. Decision Points for Correlation Coefficient
Summary: After you compute the linear correlation coefficient r of your sample, you may wonder whether this reflects any linear correlation in
the population. By comparing r to a critical number or decision point, you either conclude that there is linear correlation in the
population, or reach no conclusion. You can never conclude that there’s no correlation in the population.
BTW: This page gives a simple mechanical test, but a proper statistical test exists. The optional advanced handout Inferences about Linear Correlation [URL:
https://BrownMath.com/stat/correl.htm#DecisionPoints] explains how decision points are computed and the theory behind the test. You need to learn about t tests before you
can understand all of it, but right now you can use the Excel spreadsheet that you’ll find there. Or you can use MATH200B Program part 6 to do the computations.
4D1. Procedure
The decision points are used to answer the question “From the linear correlation r of my sample, can I rule out chance as an explanation for the
correlation I see? Can I infer that there is some correlation in the population?”
To answer that question, temporarily disregard the sign of r. This is the absolute value of r, written | r |. Then compare | r | to the decision
point, and obtain one of the only three possible results:
If | r | ≤ d.p., then you cannot say whether there is any linear correlation in the population.
If | r | > d.p. and r is negative, then there is some negative linear correlation in the population.
If | r | > d.p. and r is positive, then there is some positive linear correlation in the population.
Here’s a table of decision points (also known as critical values of r) for various sample sizes.
Decision Points or Critical Numbers for r

(two-tailed test for ρ≠0 at significance level 0.05)
n d.p. n d.p. n d.p. n d.p. n d.p.
5 .878 10 .632 15 .514 20 .444 30 .361
6 .811 11 .602 16 .497 22 .423 40 .312
7 .754 12 .576 17 .482 24 .404 50 .279
8 .707 13 .553 18 .468 26 .388 60 .254
9 .666 14 .532 19 .456 28 .374 80 .220
100 .196
(If your sample size is not shown, either refer to the Excel workbook [URL: https://BrownMath.com/stat/correl.htm#workbook] or use the next lower
number that is shown in the table. Example: n = 35 is not shown, and therefore you will use the decision point for n = 30.)
4D2. Examples
You survey 50 randomly selected college students about the number of hours they spend playing video games each week and their GPA, and you
find r = −0.35. You look up n = 50 in the table and find 0.279 as the decision point. |r|>d.p. (0.35 > 0.279). You conclude that for college students in
general, video game play time is negatively associated with GPA, or that GPA tends to decrease as video-game playing increases.
You randomly select 21 college students. For the amount they spend on textbooks and their GPA, you find r = +0.20. n=21 isn’t in the table of decision
points, so you select 0.444, the decision point for n=20. |r|≤d.p. (0.20 ≤ 0.444). Therefore, you are unable to make any statement about an association
between textbook spending and GPA for college students in general.
4D3. Interpretation
Be very careful with your interpretation, and don’t say more than the statistics will allow.
The question was simply whether there is some correlation in the population, not how much. The population might have stronger or weaker
correlation than your sample; all you know is that it has some. (Though you won’t learn how to do it in this course, it is possible to estimate the
correlation coefficient of the population [URL: https://BrownMath.com/stat/correl.htm#CI].)
If you conclude there is some correlation in the population, it’s probable, not certain. From a completely uncorrelated population, there’s still one
chance in 20 of drawing a sample with | r | greater than the decision point. Because 1/20 is .05, we say that .05 is the significance level.
Even if you conclude that there is some correlation in the population, that’s the start of your investigation, not the end. If there’s a correlation in the
population, you can’t just assume that one variable drives the other: correlation is not causation. (Steve Simon’s (2000b) Causation [see “Sources
Used” at end of book] gives some hints for investigating causation, using smoking and lung cancer as an example.)
Finally, note that there’s no way to reach the conclusion “there’s no correlation in the population.” Either there (probably) is, or you can’t reach any
conclusion. This will be a general pattern in inferential statistics: either you reach a conclusion of significance, or you don’t reach any conclusion at
all. (As you’ll see in Chapter 10, you can conclude “something is going on”, you can fail to reach a conclusion, but you can never conclude “nothing is
going on”. Lack of evidence for is not evidence against.)
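The “one chance in 20” statement above can be checked by simulation. The sketch below is an illustration under stated assumptions, not the method used to build the table: it draws many samples of n = 10 from two independent normal variables (so the population correlation is zero) and counts how often | r | exceeds 0.632, the decision point for n = 10. The function names, the trial count, and the seed are arbitrary choices for this sketch.

```python
import math
import random


def pearson_r(xs, ys):
    """Linear correlation coefficient r of paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def false_alarm_rate(n, decision_point, trials=20000, seed=1):
    """Fraction of uncorrelated samples whose |r| exceeds the decision point."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of xs
        if abs(pearson_r(xs, ys)) > decision_point:
            hits += 1
    return hits / trials


rate = false_alarm_rate(n=10, decision_point=0.632)
print(f"{rate:.3f}")  # should land near 0.05, the significance level
```

The exact figure changes with the seed, but with this many trials it should stay close to 0.05.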
4E. Optional: Scatterplot, Correlation, and Regression in Excel
Summary: In “Scatterplot, Correlation, and Regression on TI-83/84”, earlier in this chapter, you learned the concepts of correlation and regression,
and you used a TI-83 or TI-84 calculator to plot the points and do the computations. The calculator is handy, but calculator screens
aren’t great for formal reports. This section tells you how to do the same operations in Microsoft Excel, without repeating the concepts.
I’m using Excel 2010, but Excel 2007 or 2013 should be almost identical.
4E1. Plot the Points
Here again are the data:
Club-head speed, mph (x) 100 102 103 101 105 100 99 105
Distance, yards (y) 257 264 274 266 277 263 258 275
1. Enter the x-y pairs in rows or columns; row or column heads are optional.
2. With your mouse, highlight the data but not the headers. Click Insert. In the Charts
section, click Scatter and choose the first scatterplot type.
3. Right-click the useless “Series1” legend and click Delete.
4. This time I got lucky, but sometimes Excel puts too much white space at the left or
bottom of the chart. If this happens to you, right-click the axis numbers and select
Format Axis. Change Minimum to Fixed and type in a sensible value.
5. In the Excel ribbon, click Layout » Axis Titles » Primary Horizontal Axis Title »
Title Below Axis and type the axis title. Include units if any. In this case, you have club-
head speed in miles per hour.
6. Click Axis Title » Primary Vertical Axis Title » Rotated Title and type the axis title, including units if any. In this case, you have distance traveled in
yards.
7. Click Chart Title » Above Chart and type your chart title.
8. For a neater appearance, you can right-click the horizontal axis, select Format Axis, and change Major tick mark type to None. Repeat for the
vertical axis. Your chart should look like this:
4E2. Show the Regression Line
1. In the Excel ribbon, click Layout. In the Analysis group, click Trendline » More Trendline Options.
2. In the dialog box that appears, click Trendline Options at the left. At the top right, select Linear. At the bottom right, select
Display Equation on chart and Display R-squared value on chart.
3. Click and drag the regression equation and R² value so that they’re not covering any data points or any part of the line. Then right-click them
and select Format Trendline Label. Click Fill at the left, then at the right click Solid Fill and change the color to white. (This keeps the gridlines
from running through the text.) If you wish, click Border Color » Solid Line. Here’s the result:
4E3. Show the Correlation Coefficient
Excel won’t put r on the chart, but you can compute it in a worksheet cell:
1. Click into an empty worksheet cell. Type =CORREL( including the = sign and opening parenthesis.
2. Highlight your y list with your mouse — numbers only, not the header — and type a comma.
3. Highlight your x list with your mouse — again, just the numbers. Type a closing parenthesis and press [ENTER].
(You can get the slope, y intercept, or R² into the worksheet by following the above procedure but substituting SLOPE, INTERCEPT, or RSQ for
CORREL.)
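If you want to cross-check Excel’s numbers outside Excel, the same quantities can be computed directly from the golf data. This is a minimal Python sketch; the function names merely echo the Excel functions and are not part of Excel, and the argument order (y list first, then x list) follows the steps above.

```python
import math

speed = [100, 102, 103, 101, 105, 100, 99, 105]      # x: club-head speed, mph
distance = [257, 264, 274, 266, 277, 263, 258, 275]  # y: distance, yards


def _sums(ys, xs):
    """Return (Sxx, Syy, Sxy), the sums of squared and cross deviations."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def correl(ys, xs):
    """Correlation coefficient r."""
    sxx, syy, sxy = _sums(ys, xs)
    return sxy / math.sqrt(sxx * syy)


def slope(ys, xs):
    """Slope of the least-squares regression line."""
    sxx, _, sxy = _sums(ys, xs)
    return sxy / sxx


def intercept(ys, xs):
    """y intercept of the least-squares regression line."""
    return sum(ys) / len(ys) - slope(ys, xs) * sum(xs) / len(xs)


def rsq(ys, xs):
    """Coefficient of determination R², the square of r."""
    return correl(ys, xs) ** 2


print(round(correl(distance, speed), 4))
print(round(slope(distance, speed), 4), round(intercept(distance, speed), 4))
print(round(rsq(distance, speed), 4))
```

The values printed should agree with what CORREL, SLOPE, INTERCEPT, and RSQ show in the worksheet.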
4E4. Predict the Average y
Like your calculator, Excel can find the ŷ value (predicted average y) for any x.
Caution: A regression equation is valid only within the range of actual measured x values, and a little way left and right of that range. If you
go outside that range, Excel will happily serve up garbage numbers to you.
On average, how far do you expect a golf ball to travel when hit at 102 mph?
1. Type your x value, 102, in an empty cell.
2. Click into an empty worksheet cell. Type =FORECAST( including the = sign and opening parenthesis.
3. Click into the cell that holds your x value, and type a comma.
4. Highlight your y list with your mouse — numbers only, not the header — and type a comma.
5. Highlight your x list with your mouse — again, just the numbers. Type a closing parenthesis and hit [ENTER]. You’ll see the predicted
average distance, 267.1 yards.
The prediction formula, like all Excel formulas, is “live”: if you type in a new x Excel will display the corresponding ŷ. If this doesn’t happen, in the
Excel ribbon click Formulas » Calculation Options » Automatic.
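The prediction itself is just the regression line evaluated at your x. Here is a small Python sketch of that calculation for the golf data. The function name `forecast` only mimics the Excel function, and the `margin` parameter is a hypothetical way of expressing “a little way left and right” of the measured x values; how much margin is sensible is a judgment call, not something this sketch can decide for you.

```python
speed = [100, 102, 103, 101, 105, 100, 99, 105]      # x: club-head speed, mph
distance = [257, 264, 274, 266, 277, 263, 258, 275]  # y: distance, yards


def forecast(x, ys, xs, margin=1.0):
    """Predicted average y at x from the least-squares line.

    Refuses to extrapolate more than `margin` (a hypothetical allowance)
    beyond the smallest or largest measured x.
    """
    if not (min(xs) - margin <= x <= max(xs) + margin):
        raise ValueError("x is outside the range where the regression is valid")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    return mean_y + (sxy / sxx) * (x - mean_x)


print(round(forecast(102, distance, speed), 1))  # 267.1 yards, as in step 5
```

Unlike Excel, this version raises an error instead of serving up a number when x is far outside the data.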
Make scatterplot on calculator to decide whether to perform regression.
Compute linear correlation coefficient r and best-fitting line ŷ = ax+b on calculator.
Interpret r, R², slope, and intercept. Caution! r is about correlation, but R² is not.
Use the regression line to make and interpret predictions about ŷ. Remember, you are predicting an average. Caution! Don’t
extrapolate.
Compute residuals. If you have several data points with the same x and different y’s, you have several residuals for that x.
Determine whether there’s a linear relation in the population. Don’t make hand-waving arguments; use decision points.

1. A researcher performed a regression on x = age and y = salary for all employees at MegaGrandeEnormoCorp (doing business as “Gramma’s
Kitchen”). She found R² = 0.64. How would you explain this to a friend who doesn’t understand any math more complicated than percentages?
2. Manatees or “sea cows” are large, slow-moving mammals that live in coastal waters. They’re an endangered
species. Sharyn O’Halloran (n.d., slide 4) [see “Sources Used” at end of book] quotes yearly figures from the
US Fish & Wildlife Service for the number of power-boat registrations and number of manatees killed by power
boats in Florida coastal waters.
Year   Power Boat Reg. (1000s)   Manatees Killed
1977   447   13
1978   460   21
1979   481   24
1980   498   16
1981   513   24
1982   512   20
1983   526   15
1984   559   34
1985   585   33
1986   614   33
1987   645   39
1988   675   43
1989   711   50
1990   719   47
1991   716   53
1992   716   38
1993   716   35
1994   735   49
(a) The two variables are power-boat registrations and manatee deaths. Which should be the explanatory
variable, and which should be the response variable?
(b) On paper or on your calculator, make a scatterplot. Do the data seem to follow a straight line, more or
less?
(c) Give the symbol and numerical value of the correlation coefficient.
(d) Write down the regression equation for manatee deaths as a function of power-boat registrations.
(e) State and interpret the slope.
(f) State and interpret the y intercept.
(g) Give the coefficient of determination with its symbol, and interpret it.
(h) How many deaths does the regression predict if 559,000 power boats are registered? Use the proper
symbol.
(i) Find the residual for x = 559.
(j) How many manatee deaths would you expect for a million power-boat registrations?
3. Sascha randomly selected 10 TC3 students and asked how many hours of TV they watched on an average day and what was their GPA. The
correlation was −0.57. What if anything can you say about TV watching and GPA for all TC3 students?
4. Your deep freezer has a dial to regulate temperature, but it’s just numbered 0 to 8 with no indication of temperature. So you
try various dial settings, allowing 24 hours for temperature to stabilize after each change. The results are shown in this table.
Dial   Temp, °F
0      6
2      −1
3      −3
5      −10
6      −16
(a) Make a scatterplot. Does a straight-line model seem reasonable here?
(b) What linear equation best describes the relation between dial setting x and temperature y?
(c) State and interpret the slope.
(d) State and interpret the y intercept.
(e) Give the correlation coefficient with its symbol.
(f) Give the coefficient of determination with its symbol, and interpret it.
(g) Predict the temperature for a dial setting of 1.
5. A statistics professor asked students to write on their final exam the number of hours they had spent studying. After scoring the exams, she
randomly selected 12 of them and plotted exam score against hours of study, with the result r = 0.85. What if anything can you say about the
relation between study time and exam score for statistics students in general, assuming that this class is representative of all classes?
6. A public-school administrator with too much time on his hands studied shoe size and reading ability and found a correlation coefficient of 0.81.
Are big feet a sign of intelligence?
7. A scatterplot is shown at right. Would the value of r be strongly positive, near zero, or strongly negative? Briefly explain
your answer.
8. “In a large study of twins, the Minnesota Twin study found a correlation of +.71 between the IQ scores of identical twins. Another study found
that family income is correlated +.30 with the IQ of children.” (Source: Pearson’s 2001 [see “Sources Used” at end of book] in the McGraw-Hill
Statistical Primer.)
How much of the variation in children’s IQ is associated with variation in family income?
5. Probability
Updated 13 Jan 2015
Intro: By now you know: There’s no certainty in statistics. When you draw a sample from a population, what you get is a matter of
probability. When you use a sample to draw some conclusion about a population, you’re only probably right. It’s time to learn just
how probability works.
Contents: 5A. Probability Basics

5A1. What Is Probability?
5A2. Where Do You Get Probabilities?
5A3. Interpreting Probability Statements
5A4. Law of Large Numbers
5A5. Sample Space
5A6. Probability Models
5B1. Probability “or” for Disjoint Events
5B2. Probability “or” for All Events
5B3. Probability “not” — Complements
5B4. Probability “and” for Independent Events
5B5. Probability “at least” for Independent Events
5B6. Conditional Probability
· Optional: Conditional Probability Formula
5B7. Optional: Checking Independence
5B8. Optional: Probability “and” for All Events
Problem Set 1
Problem Set 2
If you’re learning independently, you can skip the sections marked “Optional” and still understand the chapters that follow. If you’re taking this
course with an instructor, s/he may require some or all of those sections. Ask if you’re not sure!
For easy reference, tables used in more than one problem are duplicated at the end of this document.
5A. Probability Basics
5A1. What Is Probability?
Definitions: Probability can be defined two ways: the long-term relative frequency of an event, or the likelihood that an event will occur.
A trial is any procedure or observation whose result depends at least partly on chance. The result of a trial is called the outcome. We
call a group of one or more repeated trials a probability experiment.
Example 1: Ten thousand doctors took aspirin every night for six years, and 104 of them had heart attacks. The relative frequency is 104/10000 =
1.04%, so the probability of heart attack is 1.04% for doctors taking aspirin nightly.
Each doctor represents a trial, and the outcome of each trial is either “heart attack” or “no heart attack”. The group of 10,000 trials is a probability
experiment.
Definition: An event is a group of one or more possible outcomes of a trial. Usually those outcomes are related in some way, and the event is
named to reflect that.
Example 2: If you draw a card from a deck without looking, there are 52 possible outcomes (assuming the jokers have been removed). “Ace” is an
event, representing a group of four outcomes, and the probability of that event is 4/52 or 1/13. “Spade” is an event, representing a group of 13
outcomes, so its probability is 13/52 or 1/4. “Ace of spades” is both an outcome and an event, with a probability of 1/52.
Write probabilities as fractions, decimals, or percentages, like this:

P(event) = number
Example 3: On a coin flip, P(heads) = 0.5, read as the probability of heads is 0.5. “P(0.5)” is wrong. Don’t write P(number); always write
P(event) = number.
All probabilities are between 0 and 1 inclusive. A probability of 0 means the event is impossible or cannot happen; a probability of 1 means the
event is a certainty or will definitely happen. Probabilities between 0 and 1 are assigned to events that may or may not happen; the more likely the
event, the higher its probability.
Definition: When an event is unlikely — when it has a low probability of occurring — you call it an unusual event. Unless otherwise stated,
“unlikely” means that the probability is below 0.05.
This will be an important idea in inferential statistics.
5A2. Where Do You Get Probabilities?
Pure thought is enough to give many probabilities: the probability of drawing a spade from a deck of cards, the probability of rolling doubles three
times in a row at Monopoly, the probability of getting an all-white jury pool in a county with 26% black population. Any such probability is called a
theoretical probability or classical probability.
Theoretical probabilities come ultimately from a sample space, usually with help from some of the laws for combining events. (I’ll tell you about
both of these later in this chapter.)
Example 4: A standard die (used in Monopoly or Yahtzee) has six faces, all equally likely to come up. Therefore you know that the probability of
rolling a two is 1/6.
On the other hand, some probabilities are impossible to compute that way, because there are too many variables or because you don’t know enough:
the probability that weather conditions today will give rise to rain tomorrow, the probability that a given radium nucleus will decay within the next
second, the probability that a given candidate will win the next election, the probability that a driver will have a car crash in the next year. To find the
probability of an event like that, you do an experiment or rely on past experience, and so it is called an experimental probability or empirical
probability.
Example 5: The CDC says that the incidence of TB in the US is 5 cases per 100,000 population. 5/100,000 = 0.005%. Therefore you can say that the
probability a randomly selected person has TB is 0.005%.
These two terms describe where a probability came from, but there’s no other difference between experimental and theoretical probabilities. They
both obey the same laws and have the same interpretations.
You probably don’t need formulas, but if you want them here they are:
Theoretical or classical: P(success) = N(success) / N(possible outcomes)
Empirical or experimental: P(success) = N(success) / N(trials)
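Both formulas are plain counting, which a short Python sketch can make concrete. The helper names `theoretical` and `empirical` are invented for this illustration; the deck and the aspirin figures come from Examples 1 and 2 above.

```python
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
deck = [(rank, suit) for rank in RANKS for suit in SUITS]  # 52 equally likely outcomes


def theoretical(event, sample_space):
    """N(success) / N(possible outcomes), for equally likely outcomes."""
    successes = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(successes, len(sample_space))


def empirical(successes, trials):
    """N(success) / N(trials), from observed data."""
    return Fraction(successes, trials)


p_ace = theoretical(lambda card: card[0] == "A", deck)
p_spade = theoretical(lambda card: card[1] == "spades", deck)
p_heart_attack = empirical(104, 10000)

print(p_ace, p_spade)          # 1/13 1/4
print(float(p_heart_attack))   # 0.0104, that is, 1.04%
```

`Fraction` reduces to lowest terms automatically, which is why 4/52 prints as 1/13.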
5A3. Interpreting Probability Statements
Every probability statement has two interpretations, probability of one and proportion of all. You use the interpretation that seems most useful in a
given situation.
Example 6: For doctors taking aspirin nightly, P(heart attack in six years) = 1.04%. The “probability of one” interpretation is that there’s a 1.04%
chance any given doctor taking aspirin will have a heart attack. The “proportion of all” interpretation is that 1.04% of all doctors taking aspirin can be
expected to have heart attacks.
Which interpretation is right? They’re both right, but in a given situation you should use the one that feels more natural.
5A4. Law of Large Numbers
You know that P(boy) is about 50% for live births, but you’re not surprised to see families with two or three girls in a row. Probability is long-term
relative frequency; it can’t predict what will happen in any particular case.
This is expressed in the law of large numbers: as you take more and more trials, the relative frequency tends toward the true probability.
BTW: The law of large numbers was stated in 1689 by Jacob Bernoulli [URL: https://BrownMath.com/swt/pfswt.htm.htm#bign_Bernoulli].
Example 7: For just a few babies, say the four children in one family, it’s quite common to find a proportion of boys very different from 50%, say one
in four (25%) or even zero in four. But consider a class of thirty statistics students. The proportion may still be different from 50%, but a very different
proportion (more than 70%, say, or less than 30%) would be unusual. And when you look at all babies born in a large hospital in a year, experience
tells you that the proportion will be very close to 50%. The more trials you take, the closer the relative frequency is to the true probability — usually.
But the Law of Large Numbers says that the relative frequency tends to the true probability. Probability can’t
predict what will happen in any given case. The idea that a particular outcome is “due” is just wrong, and it’s
such a classic mistake that it has a name. The Gambler’s Fallacy is the idea that somehow events try to match
probabilities.
Example 8: I’ve just flipped a coin a few times, and the results are shown in the table below. The first flip was a tail,
and after that flip the relative frequency (rf) of heads is 0. The next flip is a head, and after two flips I’ve had
one head out of two trials, so the rf is 0.5. The third flip is also a head, so now the rf is 2/3 or about 0.6667. At
this point someone might say, “you’re due for a tail, to move the rf back toward the true probability of 0.5.”
That’s the Gambler’s Fallacy.
Trial   Result   Heads so far   rel. freq.
1       T        0              0.0000
2       H        1              0.5000
3       H        2              0.6667
4       H        3              0.7500
5       H        4              0.8000
6       T        4              0.6667
The coin doesn’t know what it did before, and it doesn’t try to make things “right”. In my trials, the
fourth flip moves the rf of heads further from 0.5, and the fifth flip moves it further still. True, the sixth flip
moves the rf of heads closer to 0.5, but it could just as well have moved it further away, even if the coin is
perfectly fair.
I stopped after six trials. I know that if I went on to do ten trials, or a hundred, or a thousand, over time the proportion of heads would almost
always move closer to 0.5 — not necessarily on any particular flip, but in the long run.
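The running relative frequency in Example 8 is easy to recompute, and the same function can be pointed at a long simulated sequence to watch the law of large numbers at work. This is a sketch only: the function name is invented here, and the simulated coin, its seed, and the 10,000-flip length are arbitrary choices.

```python
import random


def running_relative_frequency(results, target="H"):
    """Relative frequency of `target` after each trial, in order."""
    freqs = []
    count = 0
    for trial, result in enumerate(results, start=1):
        if result == target:
            count += 1
        freqs.append(count / trial)
    return freqs


# The six flips from Example 8.
example_rf = running_relative_frequency("THHHHT")
for trial, rf in enumerate(example_rf, start=1):
    print(trial, f"{rf:.4f}")

# A much longer simulated run of a fair coin.
rng = random.Random(2021)
flips = [rng.choice("HT") for _ in range(10000)]
long_run_rf = running_relative_frequency(flips)[-1]
print(f"{long_run_rf:.4f}")
```

The first six values match the table in Example 8. The long-run figure depends on the seed, but it should sit much closer to 0.5 than the early values do.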
Subconsciously you expect random events not to show a pattern, but you may see patterns along the way. For example, if you flip a fair coin
repeatedly, inevitably you will see a run of ten heads or ten tails — about twice in every thousand sequences of ten. If you flip the coin once every
two seconds, you can expect to see a run of ten flips the same about once every 17 minutes, on average.
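The two figures in that paragraph follow from a short calculation. The sketch below shows one way to arrive at them; the reading of “once every 17 minutes” is an assumption on my part, namely that every flip after the ninth completes a new ten-flip window, so windows arrive one per flip.

```python
from fractions import Fraction

# A given sequence of ten flips is all heads or all tails.
p_run = 2 * Fraction(1, 2) ** 10
print(p_run)                      # 1/512
print(float(p_run * 1000))        # about 2 per thousand sequences of ten

# Assumption: one new ten-flip window per flip, one flip every two seconds.
seconds_per_flip = 2
windows_per_run = 1 / p_run       # on average, 512 windows per run of ten
minutes = float(windows_per_run * seconds_per_flip) / 60
print(round(minutes, 1))          # about 17 minutes
```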
Here are two more examples of patterns cropping up in processes that are actually random:
Clustering Illusion [see “Sources Used” at end of book] at Wikipedia.
The “hot hand” illusion in basketball: see Gilovich, Vallone, Tversky (1985) [see “Sources Used” at end of book].
Example 9: You have flipped a coin 999 times, and there were 499 heads and 500 tails. What’s the probability of a head on the next flip?
Solution: It is 50%, the same as on any other flip. The Law of Large Numbers tells you that over time you tend to get closer and closer to 50%
heads, but it doesn’t tell you anything at all about any particular flip. If you think that the coin is somehow “due for a head”, you’ve fallen into the
Gambler’s Fallacy.
5A5. Sample Space
At bottom, probability is about counting. Empirical probability is the number of times something did happen, divided by the number of trials.
Classical probability is similar, but it makes use of a list or table of all possible outcomes, called a sample space. Technically a sample space is just a
list of all possible outcomes, but it’s only useful if you make it a list of all possible equally likely outcomes.
For repeated independent trials (flipping multiple coins, rolling multiple dice, making successive bets at roulette, and so on) the size of the
sample space will be the number of outcomes in each trial, raised to the power of the number of trials. For example, if you want to compute
probabilities for the number of girls in a family of four children, your sample space will have 2⁴ = 16 entries.
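You can let a program list such a sample space for you. Here is a minimal Python sketch for the four-child family, treating “girl” and “boy” as equally likely on each birth (the same simplification the paragraph above relies on); the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Every possible sequence of four births, G = girl, B = boy.
families = list(product("GB", repeat=4))
print(len(families))   # 2**4 = 16 equally likely outcomes

# How many of the 16 outcomes have 0, 1, 2, 3, or 4 girls?
girls = Counter(family.count("G") for family in families)
for number_of_girls in sorted(girls):
    print(number_of_girls, girls[number_of_girls])
```

The counts 1, 4, 6, 4, 1 add up to 16, so for instance P(exactly two girls) = 6/16.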
Example 10: If you roll two dice, what’s the probability you’ll roll a seven? You could list the sample space as
S = { 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 }
but the outcomes are not equally likely. There’s only one way to get a twelve, for instance (double sixes), but there are several ways to get a seven
(1–6, 2–5, and so on). So it’s much more useful to list your sample space with equally likely outcomes.
When constructing a sample space, be systematic so that you don’t leave any out or list any twice. Here, you’re rolling two dice, and each die has six
equally likely results, so you have 6×6 = 36 equally likely outcomes in your sample space. How can you be systematic? List the outcomes in some
regular order, like the picture below. Each row lists all the possibilities with the same outcome for the first die; each column lists all the possibilities
with the same outcome for the second die.
image courtesy of Bob Yavits, Tompkins Cortland Community College
Once you have a sample space of equally likely outcomes, finding the probability is simple. There are six ways to roll a seven: 6-1, 5-2, 4-3, 3-4, 2-5,
1-6. There are 36 possible outcomes, all equally likely. Therefore the probability of rolling a seven is 6/36 or 1/6 or about 0.1667. In symbols, P(7) = 6/36
or P(7) = 1/6.
Presenting numbers: There’s no need to reduce fractions to lowest terms. If a decimal is not exactly equal to a fraction, it’s probably better to keep the fraction. But if
the fraction is complex or you’re comparing fractions, round to four decimal places and use the “approximately equal” sign, like this:
P(7) ≈ 0.1667
Caution: Round your final answer only. Never use a rounded number in further calculations; that’s the Big No-no. Fortunately, your
calculator makes it easy to chain calculations so that you can see rounded numbers but it still uses the unrounded numbers for further
calculations.
Example 11: Find the probability of rolling craps (two, three, or twelve).
Solution: There’s one way to roll a two, two ways to roll a three, and one way to roll a twelve. P(craps) = (1+2+1)/36 = 4/36 or 1/9.
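Examples 10 and 11 can both be verified by listing the 36 equally likely outcomes and counting. A minimal Python sketch, with the helper name `prob` chosen just for this illustration:

```python
from fractions import Fraction
from itertools import product

# Sample space: (first die, second die), 6 x 6 = 36 equally likely outcomes.
rolls = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event, by counting outcomes in the sample space."""
    favorable = sum(1 for roll in rolls if event(roll))
    return Fraction(favorable, len(rolls))


p_seven = prob(lambda roll: sum(roll) == 7)
p_craps = prob(lambda roll: sum(roll) in (2, 3, 12))

print(p_seven, f"{float(p_seven):.4f}")   # 1/6 0.1667
print(p_craps)                            # 1/9
```

Working with `Fraction` keeps the exact value throughout; the rounding to four decimal places happens only when printing, in line with the caution above about rounding only the final answer.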
5A6. Probability Models
Often, it’s not practical to construct a sample space and compute probabilities from it. Instead, you construct a probability model. Probability models
are yet another kind of mathematical model [URL: https://BrownMath.com/swt/pfswt.htm.htm#c04_model_root] as introduced in Chapter 4.
Definition: A probability model is a table showing all possible outcomes and their probabilities. Every probability must be 0 to 1 inclusive,
and the total of the probabilities must be 1 or 100%.
A probability model can be theoretical or empirical.
Example 12: Construct a probability model for the number of heads that appear when you flip two coins.
Solution: Start by constructing the sample space. Remember that you need equally likely events if you are going to find
probabilities from the sample space. The first coin can be heads or tails, and whatever the first coin is, the second coin can also
be heads or tails. So the sample space has 2×2 = 4 outcomes:
S = { HH, HT, TH, TT }
There are four equally likely outcomes, so the denominator (bottom number) on all the probabilities will be 4. The possible
outcomes are no heads (one way), one head (two ways), and two heads (one way). The probability model is shown below.
Often a total row is included, as I did, to show that the probabilities add up to 1.
Number of Heads on Two Coin Flips
x    P(x)
0    1/4
1    2/4
2    1/4
∑    4/4 = 1
That was an easy example, so easy that you could just as well work from the sample space. But think about more complex
situations, especially with empirical (experimental) probabilities. Constructing a sample space may be impractical, but a
probability model is relatively easy to create.
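The model in Example 12 can be built mechanically from the sample space, which also gives you a check that the probabilities total 1. A short Python sketch with illustrative variable names:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two coin flips: HH, HT, TH, TT.
sample_space = list(product("HT", repeat=2))

# Count the outcomes that give each number of heads, then divide by 4.
ways = Counter(outcome.count("H") for outcome in sample_space)
model = {x: Fraction(ways[x], len(sample_space)) for x in sorted(ways)}

for x, p in model.items():
    print(x, p)
print("total", sum(model.values()))
```

`Fraction` prints 2/4 in lowest terms as 1/2; the value is the same as in the table.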
Example 13: (adapted from Sullivan 2011, page 235 problem 40): The CDC asked college students how often they wore a seat belt when driving. 118
answered never, 249 rarely, 345 sometimes, 716 most of the time, 3093 always. Construct a probability model for seat-belt use by college students
when driving.
Solution: Probability of one is proportion of all, so to get the probabilities you simply calculate the proportions. Sample size was
(118+249+345+716+3093) = 4521. The proportions or probabilities are then simply 118/4521, 249/4521, and so on. The probability model is shown at the
right.
Seat-Belt Use by College Students Driving (sample size: 4521)
Never               2.61 %
Rarely              5.51 %
Sometimes           7.63 %
Most of the time   15.84 %
Always             68.41 %
Total             100.00 %
Comments: Don’t push this model too far. In this sample, 68.4% of college students reported that they always use a seat belt when driving.
There’s no uncertainty about that statement; it’s a completely accurate statistic (summary number for a sample). But can you go further and say that
68.4% of college students always wear a seat belt when driving? No, for two reasons.
First, this is a sample. Even if it’s a perfect random sample, it’s still not the population. There’s always sample
variability. A different sample of college students would most likely give different answers (probably not very
different, since this was a large sample, but almost certainly not identical). Second, and more serious, this survey
depended on self reporting: students weren’t observed, they were just asked. When people report their behavior they
tend to shade their responses in the direction of what’s socially approved or what they would like to think about
themselves (response bias). How many of those “always” responses should have been “most of the time” or
“sometimes”? You have no way to know.
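An empirical probability model like the one in Example 13 is just each count divided by the total. Here is a minimal Python sketch using the survey counts quoted in the example; the variable names are illustrative.

```python
# Survey counts from Example 13.
counts = {
    "Never": 118,
    "Rarely": 249,
    "Sometimes": 345,
    "Most of the time": 716,
    "Always": 3093,
}

total = sum(counts.values())          # sample size
model = {answer: count / total for answer, count in counts.items()}

for answer, p in model.items():
    print(f"{answer:<17}{p:8.2%}")
print(f"{'Total':<17}{sum(model.values()):8.2%}")
```

The proportions are kept unrounded in `model` and rounded only for display.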
5B. Combining Probabilities
You can find probabilities of simple events by making sample spaces and counting. But life isn’t usually that simple. To find probabilities of more
interesting (and complex) events, you need to use rules for combining probabilities.
The rules are the same whether your original probabilities are theoretical or experimental.
5B1. Probability “or” for Disjoint Events
Definition: When two events can’t both happen on the same trial, they are called mutually exclusive events or disjoint events.
Example 14: You select a student and ask where she was born. “Born in Cortland” and “born in Ithaca” are mutually exclusive events because they
can’t both be true for the same person.
Comment: Obviously it’s possible that neither is true. Disjoint events could both be false, or one might be true, but they can’t both be true in the
same trial.
Example 15: You select a student and ask his major. “Major in physics” and “major in music” are non-disjoint events because they could be true of
the same student. (It doesn’t matter whether they are both true of the student you asked. They are non-disjoint because they could both be true of the
same student — think about double majors.)
Rule: For disjoint events, P(A or B) = P(A)+P(B)

Example 16: You draw a card from a standard 52-card deck. What’s P(ace or face card)? (A face card is a king, queen, or jack.)
Solution: Are the events “ace” and “face card” disjoint? Yes, because a given card can’t be both an ace and a face card. Therefore you can use the rule:
P(ace or face card) = P(ace) + P(face card)
But what are P(ace) and P(face card)? A picture may help.
used by permission; source: http://www.jfitz.com/cards/ accessed 2012-09-26
Now you can see that the deck of 52 cards has four aces and twelve face cards. Therefore
P(ace) = 4/52 and P(face card) = 12/52
Since the events are disjoint,
P(ace or face card) = P(ace) + P(face card)
P(ace or face card) = 4/52 + 12/52 = 16/52
Reminder: When you need to compute probability of A or B, always ask yourself first, are the events disjoint? Use the simple addition rule only if
the events are disjoint. If events are non-disjoint — if it’s possible for both to happen on the same trial — you have to use the general rule, below.
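The “ask first whether the events are disjoint” habit translates directly into code if you treat each event as a set of outcomes. The sketch below redoes Example 16 that way; the deck representation and the helper `p` are illustrative choices.

```python
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
deck = {(rank, suit) for rank in RANKS for suit in SUITS}

aces = {card for card in deck if card[0] == "A"}
face_cards = {card for card in deck if card[0] in ("J", "Q", "K")}


def p(event):
    """Probability of an event (a set of cards) when one card is drawn."""
    return Fraction(len(event), len(deck))


# Ask first: are the events disjoint (no card in both sets)?
events_are_disjoint = aces.isdisjoint(face_cards)
print(events_are_disjoint)            # True, so the simple rule applies

p_ace_or_face = p(aces) + p(face_cards)
print(p_ace_or_face)                  # 4/13, the same value as 16/52
```

Counting the union directly, `p(aces | face_cards)`, gives the same answer, which is a handy cross-check.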
Take a look at this table of marital status in 2006, from the US Census Bureau. It’s known as a contingency table or two-way table, because it
classifies each member of the sample or population by two variables: in this case, sex and marital status.
US Marital Status in 2006 (in Millions)
                  Men   Women   Totals
Married          63.6    64.1    127.7
Widowed           2.6    11.3     13.9
Divorced          9.7    13.1     22.8
Never married    30.3    25.0     55.3
Totals          106.2   113.5    219.7
Example 17: What’s the probability that a randomly selected person is widowed or divorced?
Solution: Are those events disjoint? Yes, because a given person can’t be listed in both rows of the table. (You might argue that a given person
can be both widowed and divorced in his or her lifetime, and that’s true. But the table shows marital status at the time the survey was made, not over
each person’s lifetime. The “Widowed” row counts those whose most recent marriage ended with the death of their spouse.) Therefore
P(widowed or divorced) = P(widowed) + P(divorced)
How do you find those probabilities? Remember that probability of one = proportion of all. Find the
proportions, and you have the probabilities.
P(widowed or divorced) = 13.9/219.7 + 22.8/219.7
P(widowed or divorced) = 36.7/219.7 ≈ 0.1670
Example 18: Find the probability that a randomly selected man is widowed or divorced.
Solution: Disjoint events? Yes, a given man can’t be in both rows of the table. Again, the
probabilities are the proportions, but now you’re looking only at the men:
P(widowed or divorced) = P(widowed) + P(divorced)
P(widowed or divorced) = 2.6/106.2 + 9.7/106.2
P(widowed or divorced) = 12.3/106.2 ≈ 0.1158
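Examples 17 and 18 differ only in the denominator: everyone in the table versus the men only. A short Python sketch makes that explicit. The dictionary layout and helper names are illustrative; the figures are the table’s (in millions).

```python
# US marital status in 2006, in millions.
table = {
    "Married":       {"Men": 63.6, "Women": 64.1},
    "Widowed":       {"Men": 2.6,  "Women": 11.3},
    "Divorced":      {"Men": 9.7,  "Women": 13.1},
    "Never married": {"Men": 30.3, "Women": 25.0},
}


def row_total(status):
    """Total for one marital status, both sexes."""
    return sum(table[status].values())


def column_total(sex):
    """Total for one sex, all marital statuses."""
    return sum(row[sex] for row in table.values())


grand_total = sum(row_total(status) for status in table)

# Example 17: any randomly selected person.
p_all = (row_total("Widowed") + row_total("Divorced")) / grand_total

# Example 18: a randomly selected man, so only the Men column counts.
p_men = (table["Widowed"]["Men"] + table["Divorced"]["Men"]) / column_total("Men")

print(f"{p_all:.4f}")   # 0.1670
print(f"{p_men:.4f}")   # 0.1158
```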
Now let’s look at a couple of examples of probability “or” for non-disjoint events.
Example 19: Find P(seven or club).

Solution: Are the events “seven” and “club” disjoint? No, because a given card can be both a seven and a club. You can’t use the simple addition
rule.
The next section shows you a formula, but in math there’s usually more than one way to approach a problem. Here you can look at the picture
(reprinted on the last page) and count from the sample space. There are thirteen clubs, plus the sevens of spades, hearts, and diamonds, for a total of
16. (You don’t count the seven of clubs when counting sevens, because you already counted it when counting clubs.) And therefore P(seven or club) =
16/52.
Example 20: Find P(woman or divorced).

Solution: Disjoint events? No, a given person can be both. So what do you do? The same thing as in the preceding example: you count up all the
women, and all the divorced people who aren’t women, and divide by the number of people:
P(woman or divorced) = 113.5/219.7 + 9.7/219.7 = 123.2/219.7 ≈ 0.5608
5B2. Probability “or” for All Events
Look back at P(seven or club). Those are not disjoint events, so you can’t just add P(seven) and P(club).
But what did you do, when counting? You counted the clubs, then you counted the sevens that aren’t
clubs. In other words, just adding P(seven) and P(club) would be wrong because that would double
count the overlap.
With 52 cards, it’s easy enough just to count. But that’s not practical in every problem, so there’s a
rule: go ahead and double count by adding the probabilities, then fix it by subtracting the part you
double counted.
Rule: P(A or B) = P(A) + P(B) − P(A and B)

This general addition rule works for all events, disjoint or non-disjoint. (If two events are disjoint, they can’t happen at the same time, P(A and B) is
0, and the general rule becomes the same as the simple rule.)
Let’s redo the last two examples with this new general rule, to see that it gives the same answers.
Example 19 again: Find P(seven or club).

P(seven or club) = P(seven) + P(club) − P(seven and club)
Caution: P(seven and club) doesn’t mean “all the sevens and all the clubs”. It means the probability that one card will be both a seven and a club — in
other words, it means the seven of clubs.
P(seven or club) = 4/52 + 13/52 − 1/52
P(seven or club) = 16/52
Example 20 again: Using the table of marital status (reprinted on the last page), find P(woman or divorced).
Solution:
P(woman or divorced) = P(woman) + P(divorced) − P(woman and divorced)
P(woman or divorced) = 113.5/219.7 + 22.8/219.7 − 13.1/219.7
P(woman or divorced) = 123.2/219.7 ≈ 0.5608
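If you'd like to see the general addition rule agree with plain counting, here is a minimal Python sketch. It assumes a standard 52-card deck; the names `deck`, `prob`, `is_seven`, and `is_club` are made up for this illustration and don't come from the text.

```python
from fractions import Fraction
from itertools import product

# Build a standard 52-card deck as (rank, suit) pairs (labels are illustrative).
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))

def prob(event):
    """Probability of one = proportion of all: count the cards where event is true."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_seven(card):
    return card[0] == "7"

def is_club(card):
    return card[1] == "clubs"

# Counting directly from the sample space.
p_counted = prob(lambda c: is_seven(c) or is_club(c))

# General addition rule: add, then subtract the overlap that was counted twice.
p_rule = prob(is_seven) + prob(is_club) - prob(lambda c: is_seven(c) and is_club(c))

print(p_counted, p_rule)  # both equal 16/52, which Fraction reduces to 4/13
```

Using `Fraction` keeps the arithmetic exact, so the two answers can be compared without worrying about rounding.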
5B3. Probability “not” — Complements
About two thirds of students who register for a math class complete it successfully. What’s the probability that a randomly selected student who
registers for a math class will not complete it successfully? Of course you already know it’s 1−(2/3) = 1/3. Let’s formalize this.
Definitions: Two events are complementary if they can’t both occur but one of them must occur. If A is an event from a given sample space, then
the complement of A, written A^C or not A, is the rest of that sample space.
Describing a complement usually involves using the word “not”. Complementary events (can’t both happen, but one must happen) are a
subcategory of disjoint events (can’t both happen).
Example 21: The complement of the event “the student completes the course successfully” is the event “the student does not complete the course
successfully.” Obviously the complement need not be a simple event. The complement of “the student completes the course successfully” is “the
student never shows up, or attends initially but stops attending, or withdraws, or earns an F, or takes an incomplete but never finishes”, or probably
other outcomes I haven’t thought of.
Rule: P(A^C) = 1 − P(A)

This comes directly from the definition, and the rule for “or”. A and A^C can’t both happen, so they’re disjoint and P(A or A^C) = P(A)+P(A^C). But one or
the other must happen, so P(A or A^C) = 1. Therefore P(A)+P(A^C) = 1, and P(A^C) = 1−P(A).
Example 22: In rolling two dice, “doubles” and “not doubles” are complementary events because they can’t both happen on the same roll, but one of
them must happen. “Boxcars” (double sixes) and “snake eyes” (double ones) can’t both happen, so they’re disjoint; but they are not complementary
because other outcomes are possible.
The complement rule is useful on its own, but it really shines as a labor-saving device. Very often when a probability problem looks like a lot of
tedious computation, the complement is your friend. This really sticks out with “at least” problems (later), but here are a few simpler examples.
Example 23: The color distribution for plain M&Ms is shown below. What’s the probability that a randomly selected plain M&M is any color but yellow?
Colors of Plain M&Ms
Blue    24%
Orange  20%
Green   16%
Yellow  14%
Brown   13%
Red     13%
Solution: You could add the probabilities of the five other colors, but of course it’s easier to say
P(Yellow^C) = 1 − P(Yellow)
P(Yellow^C) = 100% − 14% = 86%
Example 24: Referring again to the table of marital status (reprinted on the last page), what’s the probability that a randomly selected person is not currently married?
Solution: Since the four marital statuses are disjoint, you could add the probabilities for widowed, divorced, and
never married. But it’s easier to take the complement of “married”:
P(not currently married) = P(married^C)
P(not currently married) = 1 − P(married)
P(not currently married) = 1 − 127.7/219.7
P(not currently married) ≈ 0.4188
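As a quick check that the complement really is a shortcut and not a different answer, this small Python sketch computes the probability both ways. The dictionary name `status_totals` is just a label chosen for this illustration; the counts are the Totals column of the marital-status table.

```python
# Marital-status counts in millions, both sexes combined (Totals column of the table).
status_totals = {
    "married": 127.7,
    "widowed": 13.9,
    "divorced": 22.8,
    "never married": 55.3,
}
everyone = sum(status_totals.values())  # about 219.7 million

# The long way: add the three disjoint statuses other than "married".
p_long = sum(n for status, n in status_totals.items() if status != "married") / everyone

# The short way: take the complement of "married".
p_short = 1 - status_totals["married"] / everyone

print(round(p_long, 4), round(p_short, 4))
```

With only four rows the saving is small, but the same idea pays off when the "everything else" list is long.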
5B4. Probability “and” for Independent Events
Definition: Two events are called independent events if the occurrence of one doesn’t change the probability of the other.
Example 25: When you play poker, being dealt a pair in this hand and a pair in the next are independent events because the deck is shuffled between
hands. But in casino blackjack, according to Scarne on Cards (Scarne 1965, 144 [see “Sources Used” at end of book]), four decks are used and they
aren’t necessarily shuffled between hands. Therefore, getting a natural (ace plus a ten or face card) in this hand and a natural in the next are not
independent events, because the cards already dealt change the mix of remaining cards and therefore change the probabilities.
That’s also an example of sampling with replacement (poker) and sampling without replacement (casino blackjack).
Samples drawn with replacement are independent because the sample space is reset to its initial condition between draws. Samples drawn
without replacement are usually dependent because what you draw out changes the mix of what is left. However, if you’re drawing from a very
large group, the change to the proportions in the mix is very small, so you can treat small samples from a very large group as independent.
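To see how little the mix changes in a very large group, here is a rough Python sketch. The group sizes and the 40% figure are hypothetical numbers picked only for illustration, and `p_all_successes` is a helper name invented here.

```python
from fractions import Fraction

def p_all_successes(successes, total, draws, replace):
    """P(every draw is a success) when drawing with or without replacement."""
    p = Fraction(1)
    for i in range(draws):
        if replace:
            p *= Fraction(successes, total)           # mix is reset every time
        else:
            p *= Fraction(successes - i, total - i)   # mix changes after each draw
    return p

# Hypothetical groups in which 40% are "successes"; draw three each time.
small_with = p_all_successes(4, 10, 3, replace=True)
small_without = p_all_successes(4, 10, 3, replace=False)
large_with = p_all_successes(4_000_000, 10_000_000, 3, replace=True)
large_without = p_all_successes(4_000_000, 10_000_000, 3, replace=False)

print(float(small_with), float(small_without))
print(float(large_with), float(large_without))
```

In the group of ten, the two answers are clearly different. In the group of ten million they agree to many decimal places, which is why small samples from a very large group can be treated as independent.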
Independent events are not disjoint, and disjoint events are not independent. If two events A and B are disjoint, then if A happens B can’t happen,
so its probability is zero. One of two disjoint events happening changes the probability of the other, so they can’t be independent.
Rule: For independent events, P(A and B) = P(A) × P(B)

Example 26: In Monopoly, you get an extra roll if you roll doubles, but if you roll doubles three times in a row you have to go to jail. What’s the
probability you’ll have to go to jail on any given turn?
Solution: Refer to the picture of the dice (reprinted on the last page). There are six ways out of 36 to get doubles, so P(doubles) = 6/36 or 1/6. Each
roll is independent, so the probability of doubles three times in a row is (1/6)×(1/6)×(1/6) or (1/6)^3 = 1/216, about 0.0046. If you play a lot of Monopoly,
you’ll go to jail, because of doubles, between four and five times per thousand turns.
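Here is a short Python sketch of the same computation, in case you want to confirm the count of doubles without the picture. It assumes two fair six-sided dice; the variable names are chosen for this illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
rolls = list(product(range(1, 7), repeat=2))

# Probability of one = proportion of all: six of the 36 outcomes are doubles.
p_doubles = Fraction(sum(1 for a, b in rolls if a == b), len(rolls))

# The three rolls are independent, so multiply the probabilities.
p_jail = p_doubles ** 3

print(p_doubles, p_jail, round(float(p_jail), 4))
```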
Example 27: The first traffic light on your morning commute is red 40% of the time, yellow 5%, and green 55%. What’s the probability you’ll hit a
green all five mornings in any given week?
Solution: Are the five days independent? Yes, because where you hit that light in its cycle on one morning doesn’t influence where you hit it on
the next day. The probability of green is 55% each day regardless of what happens on any other day. Therefore, the probability of five greens on five
successive mornings is 55%×55%×55%×55%×55% or (0.55)^5 ≈ 0.0503. About one week in twenty, that light should be green for you all five mornings.
Example 28: Refer again to the table of marital status (reprinted on the last page). What’s the probability that a randomly selected person is female
and widowed?
Solution: In a two-way table, for probability “and”, you don’t worry about formulas or independence because everything is already laid out
for you. 11.3 million persons are female and widowed, out of 219.7 million. Therefore:
P(female and widowed) = 11.3/219.7 ≈ 0.0514.
Example 29: Earlier in this section, I said that samples drawn without replacement are usually dependent, but you can treat them as independent
when drawing a small sample from a very large group. Here’s an example. If you randomly select three women, what’s the probability that all three
are widowed?
Solution: The table shows 11.3 million widowed women out of 113.5 million women, so the probability that any one woman is widowed is 11.3/113.5. (The 11.3/219.7 in the preceding example answered a different question: the probability that a randomly selected person is both female and widowed.) Because three women is a small sample
against the millions of women in the census, and the sample is random, you can treat them as independent. If you randomly select one woman out of
millions, the mix of marital status in the remaining women is so nearly unchanged that you can ignore the difference. Therefore, the probability that
all three women are widowed is
(11.3/113.5) × (11.3/113.5) × (11.3/113.5) = (11.3/113.5)³ ≈ 0.0010.
5B5. Probability “at least” for Independent Events
There’s no special rule for “at least”, but textbook writers (and quiz writers) love this type of problem, so it’s worth looking at. “At least” problems
usually want you to combine several of the probability rules.
Example 30: Think back to that traffic light that’s green 55% of the time, yellow 5%, red 40%. What’s the probability that you’ll catch it red at least one
morning in a five-day week?
Solution: You could find the probability of catching it red one morning (five separate probabilities for five separate mornings), or two mornings
(ten different ways to hit two mornings out of five), or three, four, or five mornings. This would be incredibly laborious. Remember that the
complement is your friend. What’s the complement of “at least one morning”? It’s “no mornings”. So you can find the probability of getting a red on
no mornings, subtract that from 1, and have the desired probability of hitting red on at least one morning.
P(at least one red in five) = 1 − P(no red in five)
But the status of the light on each morning is independent of all the others, so
P(no red in five) = [ P(no red on one) ]^5
What’s the probability of no red on any one morning? It’s 1 minus the probability of red on any one morning:
P(no red on one) = 1 − P(red on one) = 1−0.4
Now put all the pieces together:
P(no red on one) = 1 − P(red on one) = 1−0.4
P(no red in five) = [ P(no red on one) ]^5 = (1−0.4)^5
P(at least one red in five) = 1 − P(no red in five) = 1 − (1−0.4)^5 ≈ 0.9222
About 92% of weeks, you hit red at least one morning in the week.
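To convince yourself that the complement shortcut gives the same answer as the laborious route, here is a hedged Python sketch. The long route uses `math.comb` to count the ways of choosing which mornings are red (the "ten different ways to hit two mornings out of five" idea); that counting function is my own choice for the illustration, not a method the text has introduced at this point.

```python
from math import comb

p_red = 0.4
mornings = 5

# Shortcut: the complement of "at least one red" is "no red on any morning".
p_shortcut = 1 - (1 - p_red) ** mornings

# The laborious way: add up exactly 1, 2, 3, 4, and 5 red mornings.
# comb(mornings, k) counts the ways to choose which k mornings are red.
p_long = sum(
    comb(mornings, k) * p_red**k * (1 - p_red) ** (mornings - k)
    for k in range(1, mornings + 1)
)

print(round(p_shortcut, 4), round(p_long, 4))
```

Both routes land on the same number, but the shortcut is one line of arithmetic instead of five.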
Be careful with your logic! You really do need to work things through step by step, and write down your steps. Some students just seem to subtract
things from 1, and multiply other things, and hope for the best. That’s not a very productive approach.
One thing that can help you with these “at least” and “at most” problems is to write down all the possibilities and then cross out the ones that
don’t apply, or underline the ones that do apply. For “at least one red in five”, you have 0 1 2 3 4 5 with the 0 crossed out, or 0 | 1 2 3 4 5. Either way, with this enumeration
technique, taught to me by Benjamin Kirk, you can see that the complement of “at least one” is “none”.
A common mistake is computing 1−0.4^5 for P(none), instead of the correct (1−0.4)^5. “None are red” means “all are not-red”, every one of the five
is something other than red. Remember that all are not is different from not all are. In ordinary English, people often say “All my friends can’t go to
the concert” when they really mean “Some of my friends can go, but not all of them can go.” In math you have to be careful about the distinction.
Here’s an example.
Example 31: For the same situation, what’s the probability that you’ll hit a red light no more than four mornings in a five-day week? (This could also
be asked as “at most four mornings” or “four mornings at most”.)
Solution: Try enumerating. “At most four out of five” looks like this: 0 1 2 3 4 5 with the 5 crossed out, or 0 1 2 3 4 | 5. The previous example was a “none are” or “all are
not”, but this one is a “not all are”.
P(≤ 4 out of 5) = 1 − P(5 out of 5)
P(5 out of 5) = 0.4^5
P(≤ 4 out of 5) = 1 − 0.4^5 ≈ 0.9898
About 99% of weeks, you hit the red light no more than four mornings of the week.
Example 32: You’re throwing a barbecue, and you want to start the grill at 2 PM. Fred and Joe live on opposite sides of town, and they’ve both agreed
to bring the charcoal. The problem is that they’re both slackers. Fred is late 40% of the time, and Joe is late 30% of the time. What’s the probability
you’ll start the grill by 2 PM?
Solution: This is another “at least” problem for independent events, though this time the independent events don’t have the same probability. To
have charcoal by 2 PM, at least one of them has to show up by then. What’s the probability that at least one will be on time? Again, you could
compute the probability that they’re both on time, that Fred’s on time but Joe’s late, and that Fred’s late and Joe’s on time — all of those together will
be the probability of charcoal on time. But again, the complement is your friend. The complement of “charcoal on time” is “charcoal late”, which
happens only if they’re both late.
P(charcoal on time) = 1 − P(charcoal late)
P(charcoal on time) = 1 − P(Fred late and Joe late)
(Fred and Joe live on opposite sides of town, so whether one is late has no connection with whether the other one is late. The events are independent.)
P(charcoal on time) = 1 − P(Fred late) × P(Joe late)
P(charcoal on time) = 1 − 0.4×0.3 = 0.88
You’ve got an 88% chance of starting the grill on schedule.
Example 33: The space shuttle Challenger exploded shortly after launch in the 1980s, when one of six gaskets failed. After the fact, engineers realized
that they should have known the design was too risky, but they didn’t think past “each gasket is 97% reliable.” The trouble was that if any gasket
failed, the shuttle would explode. If you were asked to evaluate the design while the plans were still on the drawing board, what would you
conclude? (Note: The design makes the six gaskets independent.)
Solution: The shuttle will explode if one or more gaskets fail. Here’s another “at least” problem, so enumerate the case you’re interested in:
0 | 1 2 3 4 5 6.
P(explosion) = P(at least one gasket fails)
The complement of “at least one gasket fails” (hard to compute) is “no gaskets fail” (much easier). What does it mean for no gaskets to fail? All
gaskets must hold. Since the gaskets are independent, that’s easy to compute:
P(all six gaskets hold) = 0.97^6
The answer you want is the complement of the all-hold or zero-fail case:
P(at least one gasket fails) = 1 − P(all six hold) = 1 − 0.97^6
P(explosion) = P(at least one gasket fails) = 1 − 0.97^6 ≈ 0.1670
Conclusion: There’s about a 17% chance that the shuttle will explode, just considering the gaskets and ignoring all other possible causes of trouble.
This is about the same as the odds of shooting yourself in Russian roulette.
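The barbecue and gasket examples follow one pattern, so they can share one small helper. This is only a sketch: the function name `p_at_least_one` is invented for the illustration, and it assumes the events really are independent, as both examples state.

```python
from math import prod

def p_at_least_one(probs):
    """P(at least one of several independent events happens).

    Uses the complement: 1 minus the probability that none of them happen.
    """
    return 1 - prod(1 - p for p in probs)

# Example 32: charcoal is on time if at least one friend is on time.
p_charcoal = p_at_least_one([1 - 0.4, 1 - 0.3])  # P(Fred on time), P(Joe on time)

# Example 33: explosion if at least one of the six gaskets fails.
p_explosion = p_at_least_one([1 - 0.97] * 6)     # each gasket fails 3% of the time

print(round(p_charcoal, 2), round(p_explosion, 4))
```

Notice that the list holds the probabilities of the events you want at least one of (on time, fails), not their complements.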
5B6. Conditional Probability
In 2012, the Honda Accord was the most frequently stolen vehicle in the US (Siu 2013 [see “Sources Used” at end of book]). Does that mean that your
Honda Accord is more likely to be stolen than another model?
You’re tested for a rare strain of flu, and the result is positive. Your doctor tells you the test is 99% accurate. Does that mean that there’s a 99%
chance you have that strain of flu?
In New York City, a rape victim identifies physical characteristics that match only 0.0001% of people. Police find someone with those
characteristics and arrest him. Is there only a 0.0001% chance that he’s innocent?
These are examples of conditional probability — the probability of one event under the condition that another event happened. It’s probably the
most misunderstood probability topic, but I’m going to demystify it for you.
The definition may seem hard at first. But after you work through the examples you’ll find it makes sense.
Definition: The conditional probability of B given A, written P(B | A), is the probability of B under the condition that A occurs. Read B | A as “B given A” or
“if A then B”.
That’s the “probability of one” interpretation. You might find the “proportion of all” interpretation easier: P(B | A) is the
proportion of A’s that are also B.
Either way, the order matters: P(B | A) and P(A | B) mean different things and they’re different numbers.
Example 34: P(truck | Ford) is the probability that a vehicle is a truck if it’s a Ford, or the probability that a Ford is a truck, or the proportion of trucks
among Fords. P(Ford | truck) is the probability that a vehicle is a Ford if it’s a truck, or the probability that a truck is a Ford, or the proportion of
Fords among trucks.
Example 35: Let’s look first at the suspected rapist. The prosecutor presents evidence that these physical characteristics are found in only 0.0001% of
people. The prosecutor therefore claims that there’s only a 0.0001% chance the suspect is innocent.
But the defense points out that there are over 8 million people in New York City. 0.0001% × 8,000,000 = 8, so the suspect is not a unique
individual at all, but one of about eight people who match the eyewitness accounts. Seven of them are innocent. If there’s no evidence beyond the
physical match to tie him to the crime, the probability that this defendant is innocent isn’t 0.0001%, it’s 7/8 or 87.5%. (And that’s just in the city. If you
consider the metro area, or the US, or the world, there are even more people who match, so any one of them is even more likely to be innocent.)
The prosecutor’s fallacy is the false idea that the probability of a random match equals the probability of innocence. You can also describe this fallacy
as “consider[ing] the unlikelihood of an event, while neglecting to consider the number of opportunities for that event to occur”, in the words of “The
Prosecutor’s Fallacy” on the Poker Sleuth site (Stutzbach 2011 [see “Sources Used” at end of book]).
It’s an easy mistake to make if you just think about low probabilities. To avoid this error, think in whole numbers, as the defense did. 0.0001%
is hard to think about; 8 is much easier.
The key to solving conditional-probability problems is your old friend, probability of one equals proportion of all. The probability that this
particular matching person is innocent is the same as the proportion of all matching people that are innocent, or the proportion of innocent people
among those who match. Probability problems usually get easier when you turn them into problems about numbers of people or numbers of things.
What does this look like in symbols? (Don’t be afraid of symbols! They are your friend, I promise. Words are slippery and confusing, but when you
reduce a problem to symbols you make the situation clear and you are half way to solving it.)
In this example, there’s a 0.0001% chance that a random person would match the physical type of the criminal:
P(matching) = 0.0001%
The prosecution wants you to believe that the probability of a matching individual being innocent is the same:
P(innocent | matching) = 0.0001% (WRONG)
This is a conditional probability, the probability that one thing is true if another thing is true. Formally, the whole expression is “the probability of
innocent given matching”. But it’s easier to think of as “the probability that a person who matches is innocent” or “the proportion of matching people
who are innocent”.
The symbols help you clarify your thinking. “The probability of a match” and “the probability of innocence among those who match” are
different symbols, and they’re different concepts. You’d expect them to be different probabilities.
The defense showed the right way to figure the probability of innocence given a match. 0.0001%×8,000,000 = 8 people match, and 7 of them are
innocent. The probability that a matching person is innocent — the probability that a person is innocent given that he matches — is 87.5%.
P(innocent | matching) = 87.5% (CORRECT)
Notice what happens with if-then probabilities. You’re considering one group within a subgroup of the population, not one group within the whole
population. You’ve reduced your sample space: not all people, but all matching people. The bottom number of your fraction comes from the
“given that” part of the conditional probability, because P(innocent | matching) is the proportion of matching people that are also innocent.
To explode the prosecutor’s fallacy, you distinguish between a probability in the whole population and a probability in a subgroup. You also have to
ask yourself, “which group?” The issue of medical test results is a good example.
Example 36: There’s a rare skin disease, Texter’s Peril (TP), where you become hypersensitive to the buttons on your phone. (Yes, I am making this
up.) It affects 0.03% of adults aged 18–30, three in ten thousand. The only cure is to lay off texting for 30 days, no exceptions. Naturally this is about
the worst thing that can happen to anyone.
Your doctor has tested you and the test comes up positive. She tells you that the test is 99% accurate. Does that mean you are 99% likely to have
TP? You might think so, and sadly many doctors make the same mistake.
You have a positive test result, and you want to know how likely it is that you have Texter’s Peril. In symbols,
P(disease | positive) = ?
Your doctor told you that the test is 99% accurate, meaning that 99% of people who actually have TP get a positive result:
P(positive | disease) = 99%
These are obviously not the same symbol, so the probability you care about, the probability you have the disease, may well be different from 99%.
How can you compute it?
Change those probabilities to whole numbers, and make a table. (I got this technique from the book Calculated Risks [Gigerenzer 2002 [see “Sources
Used” at end of book]]. The book cites a study showing that doctors routinely confused probabilities when counseling patients about test results.)
You’ve already played with a two-way table; now you’re going to make one. It’s a little bit like filling in a puzzle. I hope you like puzzles. ☺
You don’t know the population size, but that’s okay. Just use a large round number, like a million. Start with what you know.
P(disease) = 0.03%
Out of 1,000,000 people, 0.03% = 300 will have TP, and the other 999,700 won’t. That’s the bottom row of the table, the totals row.
P(positive | disease) = 99%
Of the 300 who actually have TP, 99% = 297 will get a correct positive result, and 3 will get a false negative. That’s the first column of the table.
P(negative | disease^C) = 99%
(In the real world, a given test may not be equally accurate for positives and negatives, but we’ll overlook that to keep things simple.) Out of 999,700
who don’t have TP, 99% = 989,703 will get a correct negative result, and 9,997 will get a false positive. This is the second column of the table, and now
you can fill in the column of totals.
               Have TP   Don’t Have TP       Total
Positive Test      297           9,997      10,294
Negative Test        3         989,703     989,706
Total              300         999,700   1,000,000
Take a look at that table, specifically the “Positive Test” row. Do you see the problem? Most of the people with positive test results actually don’t have
Texter’s Peril, even though the test is 99% accurate!
It took a while to get here, but it’s better to be correct slowly than to be wrong quickly. You can now compute the probability of having TP
given that you have a positive test result. Once again, probability of one equals proportion of all, so this is really the same as the proportion of people
with positive test results who actually have TP:
P(disease | positive) = 297 / 10,294 = 2.89%
The test is 99% accurate, but because TP is rare, most of the positive results are false positives, and there’s under a 3% chance that a positive result
means you actually have Texter’s Peril. There’s a 1 − 297/10,294 = 97.11% chance that a positive result is a false positive.
Notice again: With conditional probability, you’re not concerned with the whole population. Rather, you focus on a subgroup within a
subgroup. P(disease | positive) is the proportion of people who actually have the disease, within the subgroup that received a positive test result.
Example 37: What’s the chance that a negative is a false negative, that given a negative test result you actually have TP? In symbols,
P(disease | negative) = ?
You’ve already got the table, so this is a piece of cake. Out of a million people, 989,706 test negative and 3 of them have the disease. The probability
that a negative is a false negative is
P(disease | negative) = 3/989,706 ≈ 0.000 003
which is essentially nil.
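The whole-number table can be built mechanically, which makes it easy to try other disease rates or test accuracies. The Python sketch below is one way to do it. The function name `natural_frequency_table` and its parameter names are my own; "sensitivity" and "specificity" are the usual names for P(positive | disease) and P(negative | no disease), which the example sets to 99% each.

```python
def natural_frequency_table(population, prevalence, sensitivity, specificity):
    """Turn test-accuracy percentages into whole-number counts of people."""
    sick = round(population * prevalence)
    healthy = population - sick
    true_pos = round(sick * sensitivity)      # have the disease, test positive
    false_neg = sick - true_pos               # have the disease, test negative
    true_neg = round(healthy * specificity)   # no disease, test negative
    false_pos = healthy - true_neg            # no disease, test positive
    return {"true_pos": true_pos, "false_pos": false_pos,
            "false_neg": false_neg, "true_neg": true_neg}

# Texter's Peril: 0.03% of people have it; the test is 99% accurate both ways.
t = natural_frequency_table(1_000_000, 0.0003, 0.99, 0.99)

# P(disease | positive): the bottom number is everyone who tested positive.
p_disease_given_pos = t["true_pos"] / (t["true_pos"] + t["false_pos"])

# P(disease | negative): the bottom number is everyone who tested negative.
p_disease_given_neg = t["false_neg"] / (t["false_neg"] + t["true_neg"])

print(t)
print(round(p_disease_given_pos, 4), p_disease_given_neg)
```

Try raising `prevalence` to 0.10 and you'll see P(disease | positive) climb sharply: the rarer the disease, the more the false positives swamp the true ones.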
Example 38: A lot of Web sites in 2013 trumpeted the news that the Honda Accord was the most frequently stolen model in the US the year before.
And that’s true. Out of 721,053 stolen cars and light trucks in 2012, Hot Wheels 2012 tells us that 58,596 were Honda Accords (NICB 2013 [see “Sources
Used” at end of book]).
But many Web sites warned Honda owners that they were most at risk. For instance, Honda Accord, Civic Remain Top Targets for Thieves at
cars.com (Schmitz 2013 [see “Sources Used” at end of book]) leads with “If you own a Honda Accord or Civic, or a full-size Ford pickup truck, you
might want to take a moment to make sure your auto-insurance payments are up to date. You drive one of the top three most-stolen vehicles in the
US.”
Do you see what’s wrong here? Think about it for a minute before reading on.
Yes, a lot of Honda Accords were stolen, because there are a lot of them on the road. Too many news organizations are sloppy and think that the
likelihood a stolen car is an Accord is the same as the likelihood that an Accord will be stolen. This is the doctor’s mistake from the previous example,
all over again.
Let’s clarify. You have 58,596 Accords out of 721,053 thefts, so the probability that a stolen car was an Accord — the probability that a car was an
Accord given that it was stolen — the probability of “if stolen then Accord” — is
P(Accord | stolen) = 58,596/721,053 = 8.13%
But that doesn’t tell you doodley-squat about your chance of having your Accord stolen. That would be the probability of a car being stolen given that
it is an Accord, “if Accord then stolen”. The top number of that fraction is still 58,596, but the bottom number is the total number of Accords on the
road:
P(stolen | Accord) = 58,596/(total Accords on the road in 2012)
Do you see the difference? They’re both conditional probabilities, but they’re different conditions. “If stolen then Accord” is different from “if Accord
then stolen”. The first one is about Accord thefts as a proportion of all thefts, and the second one is about Accord thefts as a proportion of all Accords.
Those are different numbers.
To find the chance that an Accord will be stolen, you need the number of Accords on the road in 2012. A press release from Experian (2012) says
there were “more than 245 million vehicles on US roads” in 2012, and 2.6% of them were Accords.
P(stolen | Accord) = (stolen Accords)/(total Accords on the road in 2012)
P(stolen | Accord) = 58,596/(2.6% of 245 million)
P(stolen | Accord) = 58,596/6,370,000
P(stolen | Accord) = 0.92%
Yes, over 8% of cars stolen in 2012 were Accords, but the chance of a given Accord being stolen was under 1%. P(Accord | stolen) = 8.13%, but
P(stolen | Accord) = 0.92%.
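Here are the two Accord probabilities side by side in a few lines of Python, using only the figures quoted above. The variable names are chosen for this illustration, and since the source says "more than 245 million" vehicles, the second result is approximate.

```python
stolen_accords = 58_596
all_thefts = 721_053
vehicles_on_road = 245_000_000   # "more than 245 million", so treat as approximate
accord_share = 0.026             # 2.6% of vehicles on the road were Accords

accords_on_road = vehicles_on_road * accord_share

# Same top number, different bottom numbers.
p_accord_given_stolen = stolen_accords / all_thefts       # proportion of thefts
p_stolen_given_accord = stolen_accords / accords_on_road  # proportion of Accords

print(f"{p_accord_given_stolen:.2%}  {p_stolen_given_accord:.2%}")
```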
Optional: Conditional Probability Formula
Rule: P(B | A) = P(A and B) / P(A) —or— N(A and B) / N(A)

The “N” alternatives remind you that often it’s easier just to count than to find probabilities and then divide. Either way, when you consider P(B | A),
remember that you’re interested in the likelihood of B given that A occurs. It’s the B cases within the A group, not all the B cases.
P(A | B) is not the same as P(B | A). You’ll get the probability right if you remember that the second event, the “given that” event, supplies the
bottom number of the fraction.
Example 39: Find P(stolen | Accord), the chance that any one Accord will be stolen. Using the numbers from Example 38,
P(stolen | Accord) = N(Accord and stolen) / N(Accord)
P(stolen | Accord) = 58,596/6,370,000 = 0.92%
Example 40: I draw a card from the deck, and I tell you it’s red. What’s the probability that it’s a heart? If you didn’t know anything about the card,
you’d write P(heart) = ¼ because a quarter of the cards in the deck are hearts. But what is the probability given that it’s red?
P(heart | red) = P(heart and red) / P(red)
P(heart and red) is the probability of a red heart. A quarter of the cards in the deck are red hearts, so this is just ¼. P(red) is of course ½ because half
the cards in the deck are red.
P(heart | red) = (¼) / (½) = (¼) × 2 = ½
This one is probably easier to do by just counting:
P(heart | red) = N(heart and red) / N(red)
P(heart | red) = 13/26 = ½
Either way, you’re concerned with the sub-subgroup of hearts within the subgroup of red cards. P(heart | red) = ½ — half of the red cards are hearts.
Example 41: You know P(heart | red) = ½: given that a card is red, there’s a ½ probability that it’s a heart. But what is P(red | heart), the probability
that a card is red given that it’s a heart? You probably already know the answer, but let’s run the formula:
P(red | heart) = N(red and heart) / N(heart)
P(red | heart) = 13/13 = 1 (or 100%)
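The counting version of the formula translates almost word for word into code. This is a minimal sketch assuming a standard deck; `conditional`, `is_red`, and `is_heart` are names invented for the illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suit_colors = {"clubs": "black", "spades": "black", "hearts": "red", "diamonds": "red"}
deck = list(product(ranks, suit_colors))   # 52 (rank, suit) pairs

def conditional(event_b, given_a):
    """P(B | A) = N(A and B) / N(A): count B only inside the A group."""
    a_group = [card for card in deck if given_a(card)]
    return Fraction(sum(1 for card in a_group if event_b(card)), len(a_group))

def is_red(card):
    return suit_colors[card[1]] == "red"

def is_heart(card):
    return card[1] == "hearts"

p_heart_given_red = conditional(is_heart, is_red)   # hearts among the 26 red cards
p_red_given_heart = conditional(is_red, is_heart)   # red cards among the 13 hearts

print(p_heart_given_red, p_red_given_heart)
```

Swapping the two arguments changes which group supplies the bottom number, and that's why the two answers differ.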
Conditional probabilities often come up in two-way tables.
Example 42: Again using the table of marital status (reprinted on the last page), what’s the probability that a randomly selected woman is divorced?
In other words, given that the person is a woman, what’s the probability that she’s divorced?
Solution: The problem wants P(divorced | woman), the probability that the person is divorced given that she’s a woman.
P(divorced | woman) = N(divorced and woman) / N(woman)
P(divorced | woman) = 13.1/113.5 ≈ 0.1154
Because we have “given woman” or “if woman”, the bottom number is the number of women, 113.5 million.
5B7. Optional: Checking Independence
Remember the definition of independent events? A and B are independent if the occurrence of one doesn’t change the probability of the other. Now
that you know about conditional probability, you can define independent events in terms of conditional probability:
Definition: Two events A and B are independent if and only if P(A|B) = P(A).
This makes sense. P(A) is the probability of A without considering whether B happened or not, and P(A|B) is the probability of A given that B
happened. If B’s occurrence doesn’t change the probability of A, then those two numbers will be equal.
Example 43: Referring again to the table of marital status (reprinted on the last page), show that “woman” and “widowed” are dependent (not
independent).
Solution:
P(widowed) = 13.9 / 219.7 ≈ 0.0633
P(widowed | woman) = 11.3 / 113.5 ≈ 0.0996
These numbers are different — the probability of “widowed” changes when “woman” is given, or in English the proportion of widowed women is
different from the proportion of widowed people. Therefore the events “woman” and “widowed” are not independent.
By the way, if A and B are independent then B and A are independent. So you could just as well compare P(woman) = 113.5/219.7 ≈ 0.5166 to
P(woman|widowed) = 11.3/13.9 ≈ 0.8129. Since those are different, you conclude that “woman” and “widowed” are dependent.
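The independence check is just a comparison of two proportions, as this Python sketch shows. The helper `looks_independent` and its tolerance are assumptions made for the illustration: the text gives no cutoff for "equal", so the 0.005 here is an arbitrary choice.

```python
# Counts in millions, from the marital-status table.
total = 219.7
women = 113.5
widowed = 13.9
widowed_women = 11.3

p_widowed = widowed / total
p_widowed_given_woman = widowed_women / women
p_woman = women / total
p_woman_given_widowed = widowed_women / widowed

def looks_independent(p, p_given, tolerance=0.005):
    """Call the events independent only if the two proportions nearly match.

    The tolerance is an arbitrary choice for this sketch, not a rule from the text.
    """
    return abs(p - p_given) <= tolerance

print(round(p_widowed, 4), round(p_widowed_given_woman, 4),
      looks_independent(p_widowed, p_widowed_given_woman))
print(round(p_woman, 4), round(p_woman_given_widowed, 4),
      looks_independent(p_woman, p_woman_given_widowed))
```

Either comparison is enough on its own; the sketch runs both only to show that they agree.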
5B8. Optional: Probability “and” for All Events
When events are not independent, to find probability “and” you need to use a conditional probability. Remember the formula for conditional
probability: P(B | A) = P(A and B) / P(A). Multiply both sides by P(A) and you have P(A) × P(B | A) = P(A and B), or:
Rule: For all events, P(A and B) = P(A) × P(B | A)

Example 44: You draw two cards from the deck without looking. What’s the probability that they’re both diamonds?
Solution: Are these independent events? No! P(diamond1), the probability that the first card is a diamond, is 13/52 because there are 13
diamonds out of 52. But if the first card is a diamond, the probability that the second card is a diamond is different. Now there are only 12 diamonds
left in the deck, out of a total of 51 cards. So P(diamond2 | diamond1) = 12/51, which is a bit less than 13/52.
P(diamond1 and diamond2) = P(diamond1) × P(diamond2 | diamond1)
P(diamond1 and diamond2) = (13/52) × (12/51) ≈ 0.0588
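The arithmetic in Example 44 can be reproduced with exact fractions. This is only a sketch; the variable names are invented here, and the last line shows what you would get if you wrongly treated the two draws as independent.

```python
from fractions import Fraction

p_diamond1 = Fraction(13, 52)                  # 13 diamonds among 52 cards
p_diamond2_given_diamond1 = Fraction(12, 51)   # 12 diamonds left among 51 cards

# Rule for all events: P(A and B) = P(A) x P(B | A)
p_both = p_diamond1 * p_diamond2_given_diamond1

# For comparison only: the answer if the draws were independent,
# which they are not when you draw without replacement.
p_if_independent = p_diamond1 * p_diamond1

print(p_both, round(float(p_both), 4))
print(p_if_independent, round(float(p_if_independent), 4))
```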
A lot of probability problems can be solved without using formulas, through the technique of sequences. Here’s the procedure:
1. Write down the “winning sequences”, the sequences that lead to the desired outcome.
2. Assign probabilities to each event in each sequence, from start to end.
3. Multiply the probabilities within each sequence, and then add up the probabilities of all the sequences.
Example 45: Suppose a bag contains 6 oatmeal cookies, 4 raisin cookies, and 5 chocolate chip. You are to draw two cookies from the bag without
looking (and without replacement, which would be yucky). What is the probability that you will get two chocolate chip cookies?
Solution: To start with, notice that there are 6+4+5 = 15 cookies. There’s only one winning sequence, but this one illustrates an important point:
you have to assign each probability in its situation at that point in its sequence.
1. Sequence: CC1 and CC2
2. Probabilities: 5/15 and 4/14.
You compute the probability CC2 at this point in the sequence: it’s the probability of a second CC if the first cookie was CC. You don’t care
about the probabilities if the first cookie was anything else, because the sequence starts with a CC cookie. That means that, when you are
looking for the probability of a second CC cookie, the bag now contains only 14 cookies, and only 4 of them are CC.
3. Arithmetic: (5/15)×(4/14) ≈ 0.0952
Example 46: In the same situation, what’s the probability you’ll get one oatmeal and one raisin?
Solution: Even though you don’t care which order they come in, you have to list both orders among your winning sequences. Remember the
example of flipping two coins, or the examples with dice: to make probabilities come out right, consider possible orderings.
1. Sequences: (A) O1 and R2; (B) R1 and O2
2. Probabilities: (A) 6/15 and 4/14; (B) 4/15 and 6/14
3. Arithmetic: (6/15)×(4/14) + (4/15)×(6/14) ≈ 0.2286
Example 47: Consider the same bag of 15 cookies, but now what’s the probability you get two cookies the same?
Solution:
1. Sequences: (A) O1 and O2; (B) R1 and R2; (C) CC1 and CC2
2. Probabilities — again, the probability for the second cookie takes into account the first cookie that was drawn.
(A) 6/15 and 5/14; (B) 4/15 and 3/14; (C) 5/15 and 4/14
3. Arithmetic: (6/15)×(5/14) + (4/15)×(3/14) + (5/15)×(4/14) ≈ 0.2952
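The three-step sequences procedure translates almost directly into code. The sketch below assumes draws without replacement from the cookie bag; the function names sequence_probability and winning_probability are made up for this illustration.

```python
from fractions import Fraction


def sequence_probability(bag, sequence):
    """Probability of drawing `sequence` in that order, without replacement.

    Each probability is assigned in its situation at that point in the
    sequence: the bag shrinks after every draw.
    """
    remaining = dict(bag)            # copy, so the caller's bag is unchanged
    probability = Fraction(1)
    for kind in sequence:
        total_left = sum(remaining.values())
        probability *= Fraction(remaining[kind], total_left)
        remaining[kind] -= 1         # that cookie is gone for the next draw
    return probability


def winning_probability(bag, winning_sequences):
    """Multiply within each sequence, then add up all the sequences."""
    return sum(sequence_probability(bag, seq) for seq in winning_sequences)


cookies = {"O": 6, "R": 4, "CC": 5}   # oatmeal, raisin, chocolate chip

two_chocolate_chip = winning_probability(cookies, [("CC", "CC")])
two_the_same = winning_probability(
    cookies, [("O", "O"), ("R", "R"), ("CC", "CC")]
)

print(two_chocolate_chip, round(float(two_chocolate_chip), 4))
print(two_the_same, round(float(two_the_same), 4))
```

Listing the winning sequences is still your job. The code only handles steps 2 and 3, so a forgotten ordering gives a wrong answer here just as it does on paper.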
Example 48: Your teacher’s policy is to roll a six-sided die and give a quiz if a 2 or less turns up. Otherwise, she rolls again and collects homework if a
3 or less turns up. You haven’t done the homework for today and you’re not ready for a quiz. What is the probability you’ll get caught?
Solution: Though you could do this with formulas, you’ll get the same answer with less pain by following the method of sequences. The
“winning sequences” in this case are the sequences that lead to either a quiz or homework.
1. There are two sequences: (A) quiz (and stop, without deciding about homework); (B) no quiz, but homework
Notice that you start each sequence from the same starting point. Notice also that you don’t consider the possible sequence “no quiz and
no homework” because in that sequence you don’t get caught.
2. P(quiz) = 2/6 = 1/3. P(no quiz) = 1−1/3 = 2/3. P(homework if no quiz) = 3/6 = 1/2.
(A) 1/3 (B) 2/3 and 1/2
3. (1/3) + (2/3)×(1/2) = (1/3)+(1/3)= 2/3
There’s a 2/3 probability of a quiz or homework.
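Here is the same computation as a short Python sketch, with the one losing sequence ("no quiz and no homework") added as a cross-check. The variable names are invented for the illustration.

```python
from fractions import Fraction

p_quiz = Fraction(2, 6)                     # first roll shows 1 or 2
p_no_quiz = 1 - p_quiz
p_homework_given_no_quiz = Fraction(3, 6)   # second roll shows 1, 2, or 3

p_sequence_a = p_quiz                                  # (A) quiz, then stop
p_sequence_b = p_no_quiz * p_homework_given_no_quiz    # (B) no quiz, but homework
p_caught = p_sequence_a + p_sequence_b

# Cross-check with the sequence you did not need: no quiz and no homework.
p_not_caught = p_no_quiz * (1 - p_homework_given_no_quiz)

print(p_caught, p_not_caught, p_caught + p_not_caught)
```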
Sequences let you think through a situation without getting confused about which formula may apply. Sometimes no formula applies. Here’s a
famous example.
Example 49: You’re a contestant on Let’s Make a Deal. You have to pick one of three doors, knowing that there’s a new car behind one of them and a
“zonk” (something funny but worthless) behind the other two. Let’s say you pick Door #1.
The host, who of course knows where the car is, opens Door #2 and shows you a zonk. He then asks whether you want to stick with your choice
of Door #1, or instead take what’s behind Door #3. What should you do, and why?
(I gave specific door numbers to help make this problem less abstract, but the specifics don’t matter. What does matter is that you pick a door at
random, and the host reveals that a door you didn’t pick is the wrong one.)
Solution: There’s really no formula for this one, because the host’s actions aren’t governed by probability. Once you realize that, it’s easy.
1. In the long run, 1/3 of contestants will choose the correct door, whichever one it is, and 2/3 will choose one of the two wrong doors. Why?
The show’s producers have to make sure that prizes are equally distributed among the three doors over the long haul. If they favored one
door over the others, people would notice and would start picking that door.
Therefore, P(right door) = 1/3 and P(wrong door) = 2/3.
2. If you chose the right door, the host opens one of the two wrong doors, but obviously you would not benefit by switching.
3. If you chose the wrong door, the host opens the other wrong door and offers you the chance to switch doors. The host has eliminated the other
wrong door, and the third door must be the winning door. You should switch.
If you chose the wrong door and switch doors, you will always win because the host has eliminated the other wrong door.
4. The probability that you chose the right door initially, and will lose if you switch, is 1/3. The probability that you chose the wrong door
initially, and will win if you switch, is 2/3.
In the long run, keeping your original choice is the winning strategy 1/3 of the time, and switching is the winning strategy 2/3 of the
time.
5. Switching doors doubles your chance of winning.
BTW: This is the famous Monty Hall Problem. Monty Hall [see “Sources Used” at end of book] developed Let’s Make a Deal and hosted the show for many years. There was
a lot of controversy (Tierney 1991 [see “Sources Used” at end of book]) about the answer. Many people who should have known better thought that Door #1 and Door #3 were
equally likely after Door #2 was opened. But they forgot that this is not a pure probability problem. The host knows where the car is and picks a door to open based on that
knowledge, and that makes all the difference.
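If the argument still feels slippery, a simulation can help. The sketch below plays the game many times under stated assumptions: the car is placed at random, you pick at random, and when you happen to pick the car the host chooses at random between the two zonk doors (the text doesn't say how he chooses in that case, and it doesn't affect the result). The function names and the seed are arbitrary.

```python
import random


def play_once(rng, switch):
    """Play one game; return True if the contestant ends up with the car."""
    doors = [1, 2, 3]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host knows where the car is: he opens a door that is
    # neither the contestant's pick nor the car.
    host_options = [d for d in doors if d != pick and d != car]
    opened = rng.choice(host_options)
    if switch:
        # Move to the one door that is neither the pick nor the opened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=1):
    """Long-run proportion of wins for the 'stay' or 'switch' strategy."""
    rng = random.Random(seed)
    wins = sum(play_once(rng, switch) for _ in range(trials))
    return wins / trials


stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay: {stay_rate:.3f}   switch: {switch_rate:.3f}")
```

With this many trials the two proportions should land close to 1/3 and 2/3, which is the Law of Large Numbers at work.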
Theoretical/classical and empirical/experimental probability.
Two interpretations: probability of one = proportion of all.
Law of Large Numbers and Gambler’s Fallacy.
Sample space, and the importance of equally likely outcomes.
Constructing and interpreting probability models.
Know meanings of disjoint events a/k/a mutually exclusive events, complementary events, independent events.
Probability “or”:
For disjoint events, P(A or B) = P(A) + P(B)
For all events, P(A or B) = P(A) + P(B) − P(A and B)
Probability “not” for complementary events: P(not A) or P(Aᶜ) = 1 − P(A)
Probability “and”:
For independent events, P(A and B) = P(A) × P(B)
Optional: For all events, P(A and B) = P(A) × P(B | A)
Conditional probability: P(B | A) means probability of B given A, or probability of if-A-then-B, or probability of B if A is known to
have occurred, or proportion of B’s within the A’s.
Optional: P(B | A) = P(A and B) / P(A)
Solve problems with “at most” and “at least” conditions by using the complement and the other rules.
Solve probability problems involving two-way tables.
Solve probability problems with sequences instead of formulas.
Study aids: Statistics Symbol Sheet

Problem Set 1
1. You toss three coins.
(a) How many entries do you expect in the sample space of equally likely events?
(b) Construct that sample space.
(c) Find P(2H), the probability of getting exactly two heads.
2. In 2003 a federal government survey estimated that 58.2% of US households had both a cell phone and a landline, 2.8% had only cell service, and
1.6% had no phone service at all.
(a) Construct a probability model for type of phone service to US households. (Hint: You’re going to have to add a fourth case.)
(b) Supposedly, polling agencies try not to call cell phones, because consumers object to paying for the calls. What proportion of US households could
be reached by a landline in 2003?
3. According to DiscovertheOdds.com (2014) [see “Sources Used” at end of book], the probability of being struck by lightning in a given year is
about 1 in 1,000,000. A blog post by Tara Parker-Pope (2007) [see “Sources Used” at end of book] says that the probability of suffering a shark
attack in 2003 was about 1 in 4,691,000. Can you add these two numbers to find the probability of being struck by lightning or attacked by a shark in
2003 as 1/1,000,000 + 1/4,691,000? Briefly, why or why not?
4. P(A), the probability of event A, is 0.7. A and B are complementary events. Find (a) P(not A); (b) P(B); (c) P(A and B).
If any of them cannot be determined from the information given, say so.
5. A blog post by Tara Parker-Pope (2007) [see “Sources Used” at end of book] reported that your lifetime risk of dying of heart disease is 1/5, and
your lifetime risk of dying of cancer is 1/7. Can you add these two numbers to find the probability of dying of heart disease or cancer? Briefly,
why or why not?
6. Explain the difference between P(divorced | man) and P(man | divorced).
7. A company analyzed all 412 customer complaints that were received in January 2013. None of them were for unresolved billing disputes.
Therefore the probability that a randomly selected complaint from January 2013 was for an unresolved billing dispute is zero. We’re used to
interpreting a probability of zero as impossible, but obviously it is possible for a complaint to be about an unresolved billing dispute. How do you
resolve this paradox?
Need a hint? Think about the two kinds of probability from the beginning of the chapter.
8. You shuffle a standard 52-card deck well and deal five cards. What is the probability that the fifth card is a spade?
9. Write out the sample space for flipping two coins, and use it to answer these questions.
(a) If you are told that at least one of the flips came up heads, what is the probability that both are heads?
(b) If you are told that the first coin came up heads, what is the probability that both are heads?
10. The chance of being a victim of violent crime in a given year varies by age and sex, according to What are my chances of being a victim of violent
crime? [see “Sources Used” at end of book] Take 17.1 per thousand, or 1.71%, as the average.
(a) You’re waiting for a flight at the airport. You fall into conversation with a stranger, and you’re surprised to learn that both of you have been
victims of violent crime in the past year. Assuming random selection, what are the chances of that happening?
(b) Explain why you cannot use the same technique to find the probability that both members of a married couple have been victims of violent crime
in the past year.
11. For this problem, please use the table of marital status (reprinted on the last page).
(a) Find P(divorced).
(b) Give two interpretations of that probability.
(c) What type of probability is this: classical, empirical, experimental, theoretical?
(d) Find P(divorcedᶜ) and give one interpretation.
(e) Find P(man and married).
(f) Find P(man or married). (Work this with and without the formula.)
(g) Find the probability that a randomly selected male was never married:
P(never married | male) = ?
(h) Find P(man | married), and interpret as “____% of ____ were ____.”
(i) Find P(married | man), and interpret as “____% of ____ were ____.”
12. In five-card draw poker, you are dealt five cards and then during the betting you can discard some in hopes that the replacements will
improve your hand. You have a pat hand if the first five cards are good enough that you don’t need to discard. What’s the probability you’ll be
dealt a diamond flush (five diamonds) as a pat hand?
13. There are 20 M&Ms left in the dish: 5 blue, 4 orange, 3 green, 3 yellow, 3 brown, and 2 red. The yellows are your favorites. Your friend takes
three M&Ms without looking.
(a) What’s the chance that she leaves your favorites behind?
(b) What’s the chance that all three of her picks are red?
14. Tom Turkey invested in two risky startup companies, A and W. There is a 0.90 probability that company A will go bankrupt, and a 0.80
probability that company W will go bankrupt. Assuming the two companies have no connection, find the probabilities that (a) both will go
bankrupt; (b) one of them, but not both, will go bankrupt; (c) neither will go bankrupt.
15. Without looking, you take three M&Ms from a new three-pound bag. (The bag contains over a thousand M&Ms.) Use the probability model
of plain M&M colors (reprinted on the last page) to answer these questions.
(a) Find the probability that all three are red.
(b) Find the probability that none are red.
(c) Find the probability that at least one is green.
(d) Find the probability that exactly one is green.
16. A poll found that 45% of baseball fans had attended a game in person within the past year. Of five randomly selected baseball fans, find the
probability that at least one fan had not attended a game within the past year.
17. Without looking, Grace Underfire takes two sourballs from a bowl that contains 11 cherry and 9 orange flavor. What is the probability that she
will get one of each flavor?
18. An annual church raffle offers one chance in 500 of winning something. Find the chance that you win at least once if you play five years in a
row.
19. Butch will miss an important TV program while taking his statistics exam, so he sets both his DVRs to record it. The first one records 70% of
the time, and the second one records 60% of the time. (Their performance is independent.) What is the probability that he gets home after the
exam and finds
(a) No copies of his program?
(b) One copy of his program?
(c) Two copies of his program?
Problem Set 2
20. Police plan to enforce speed limits during the morning rush hour on four different routes into the city. The traps on routes A, B, C, and D are
operated 40%, 30%, 20%, and 30% of the time, respectively. Biff always speeds to work, and he has probability 0.2, 0.1, 0.5, and 0.2 of using
those routes.
(a) What’s the probability that he’ll get a ticket on any one morning?
(b) What’s the probability he’ll go five mornings without a ticket?
(Hint: His choice of a route, and whether there’s a speed trap on that route, are independent.)
21. For this problem, please use the table of marital status (reprinted on the last page). Show that the events “man” and “divorced” are not
independent.
22. I remarked that if you flip a fair coin repeatedly, you’ll see a run of ten heads or ten tails. Show why this should happen twice in about every
thousand flips.
23. (adapted from Dabes and Janik 1999 [see “Sources Used” at end of book], page 24) The probability that a certain door is locked is 0.5. The key
to the door is one of five unidentified keys hanging on a rack. You select two keys before going to the door. Find the probability that you can
open the door without returning for another key.
US Marital Status in 2006 (in Millions)
                   Men    Women   Totals
Married           63.6     64.1    127.7
Widowed            2.6     11.3     13.9
Divorced           9.7     13.1     22.8
Never married     30.3     25.0     55.3
Totals           106.2    113.5    219.7

Colors of Plain M&Ms
Blue      24 %
Orange    20 %
Green     16 %
Yellow    14 %
Brown     13 %
Red       13 %

Seat-Belt Use by College Students Driving (sample size: 4521)
Never               2.61 %
Rarely              5.51 %
Sometimes           7.63 %
Most of the time   15.84 %
Always             68.41 %
6. Discrete Probability Models

Updated 18 Jan 2017
Intro: In Chapter 5, you looked at the probabilities of specific events. In this chapter, you’ll take a more global view and look at the
probabilities of all possible outcomes of a given trial.
Contents:
6A. Random Variables
6B. Discrete Probability Distributions
    6B1. Mean and Standard Deviation of a DPD
    6B2. Comparing DPDs: Parking Choices
    6B3. Fair Price of a Game
6C. Bernoulli Trials
6D. The Geometric Model
    6D1. Computing Probabilities
    6D2. Mean and Standard Deviation of a Geometric Distribution
    6D3. Making a Decision
    6D4. Baseball
6E. The Binomial Model
    6E1. Computing Probabilities
    6E2. Baseball Again!
    6E3. Mean and Standard Deviation of a Binomial Distribution
    6E4. Surprised?
    6E5. A Life-or-Death Example
6A. Random Variables
The random variable is one of the main concepts of statistics, and we’ll be dealing with random variables from now till the end of the course.
Definitions: A variable is “the characteristic measured or observed when an experiment is carried out or an observation is made.”
—Upton and Cook (2008, 401) [see “Sources Used” at end of book]
If the results of that procedure depend on chance, completely or partly, you have a random variable. Each outcome of the procedure is
a value of the variable. We use a capital letter like X for a variable, and a lower-case letter like x for each value of the variable.
As you learned in Chapter 1 [URL: https://BrownMath.com/swt/pfswt.htm.htm#c01_Main], numeric variables can be discrete or
continuous. A discrete random variable can have only specific values, typically whole numbers. A continuous random variable can
have infinitely many values, either across all the real numbers or within some interval.
In this chapter, you’ll be concerned with discrete random variables. In the next chapter, you’ll look at one particular type of continuous random
variable, the normal distribution.
Example 1: You roll three dice. The number of sixes that appear is a random variable, and the total number of spots on the upper faces is another
random variable. These are both discrete.
Example 2: You randomly select a household and ask the family income for last year. This is a continuous random variable.
Example 3: You randomly select twelve TC3 students, measure their heights, and take the average. “Height of a student” is a continuous random
variable, and “average height in a 12-student sample” is another continuous random variable.
Example 4: You randomly select 40 families and ask the number of children in each. “Number of children in family” is a discrete random variable,
and “average number of children in a sample of 40 families” is a continuous random variable.
Definition: A discrete probability distribution or DPD (also known as a discrete probability model) lists all possible values of a discrete random
variable and gives their probabilities. The distribution can be shown in a table, a histogram, or a formula. Like any probabilities, the
probabilities in a DPD can be determined theoretically or experimentally [URL: https://BrownMath.com/swt/pfswt.htm.htm#c05_BasicsWhere].
Example 5: In March 2013, Royal Auto sent me one of those “Win big!” flyers with a fake car key taped to it. The various prizes, and chances of
winning, are shown below.
Prize          Declared Value, x    Chance of Winning, P(x)
Two Camaros    $100,000             1 in 5,000,000
Cash             10,000             1 in 1,000,000
Apple iPad        1,000             1 in 500,000
Various             500             1 in 250,000
Gift card             5             0.9999928
This is a discrete probability distribution. The discrete variable X is “prize value”, and the five possible values of X are $100,000 down to $5.
Remember the two interpretations of probability: probability of one = proportion of all. From the table, you can equally well say that any person’s
chance of winning a $500 prize is 1/250,000 = 0.000 004 = 0.0004%, or that in the long run 0.0004% of all the people who participate in the promotion
will win a $500 prize.
A discrete probability distribution must list all possible outcomes. The total probability for all possible outcomes in any situation is 1. Therefore,
for any discrete probability distribution, the probabilities must add up to 1 or 100%.
6B1. Mean and Standard Deviation of a DPD
Definitions: Suppose you do a probability experiment a lot of times. (For the Royal Auto example, suppose bazillions of people show up to claim
prizes.) Each outcome will be a discrete value. The mean of the discrete probability distribution, µ, is the mean of the outcomes from
an indefinitely large number of trials, and the standard deviation of the discrete probability distribution, σ, is the standard deviation
of the outcomes from an indefinitely large number of trials. The mean of any probability distribution is also called the expected value,
because it’s the expected average outcome in the long run.
How do you find the mean and SD of a discrete probability distribution? Well, one interpretation of probability is long-term relative frequency [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c05_BasicsWhat], so you can treat a discrete probability distribution as a relative frequency distribution.
(You can also think of the probabilities as weights, with the mean as the weighted average.) On the TI-83/84, that means good old 1-Var Stats, just
like in Chapter 3.
BTW: Textbooks all list the formulas, so if you want to know them here they are. But in fact everybody uses software except in the simplest cases.
µ = ∑ x·P(x) σ = √[ ∑ (x²·P(x)) − µ²]
For ∑, see ∑ Means Add ’em Up in Chapter 1.
Example 6: To find the mean and SD of the distribution of winnings in the Royal Auto sweepstakes, put the x’s in one list and the P(x)’s in another
list. Caution: When the probability is a fraction, enter the fraction, not an approximate decimal. The calculator will display an approximate decimal,
but it will do its calculations on a much more precise value.
After entering the x’s and p’s, press [STAT] [►] [1] and specify your two lists, such as 1-Var Stats L1,L2. (Yes, the order matters: the x list must
be first and the P(x) list second.) When you get your results, check n first. In a discrete probability distribution, n represents the total of the
probabilities, so it must be exactly 1. If it’s just approximately 1, you made a mistake in entering your probabilities.
The mean of the distribution is µ = $5.03 , and the standard deviation is σ = $45.85 .
Interpretation: In the long run, the dealership will have to pay out $5.03 per person in prizes. The SD is a little harder to get a grasp on, but
notice that it’s more than nine times the mean. This tells you that there is a lot of variability in outcome from one person to the next. In general, the
mean tells you the long-term average outcome, and the SD tells you the unpredictability of any particular trial. You can look at the SD as a
measure of risk.
A couple of notes about the calculator output: The calculator knows that a DPD is a population, so it gives you σ and not s for the SD. It should
give you µ for the mean, but instead it displays x̅, so you need to make the change. I’ve already mentioned that the sum of the probabilities (n) must
be exactly 1, not just approximately 1.
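If you have no TI-83/84 handy, the two formulas µ = ∑ x·P(x) and σ = √[∑ (x²·P(x)) − µ²] are easy to apply in a few lines of Python. This is a sketch, not the book's method: dpd_mean_sd is a name made up here, and exact fractions stand in for the calculator's advice to enter fractions rather than rounded decimals.

```python
from fractions import Fraction
from math import sqrt


def dpd_mean_sd(values, probabilities):
    """Return (mean, SD) of a discrete probability distribution.

    dpd_mean_sd is a hypothetical helper written for this sketch.
    """
    # Same check as looking at n on the calculator: the total must be exactly 1.
    if sum(probabilities) != 1:
        raise ValueError("probabilities must add up to exactly 1")
    mean = sum(x * p for x, p in zip(values, probabilities))
    mean_of_squares = sum(x * x * p for x, p in zip(values, probabilities))
    variance = mean_of_squares - mean ** 2
    return float(mean), sqrt(variance)


# Royal Auto sweepstakes from Example 5.
prize_values = [100_000, 10_000, 1_000, 500, 5]
chances = [
    Fraction(1, 5_000_000),
    Fraction(1, 1_000_000),
    Fraction(1, 500_000),
    Fraction(1, 250_000),
    Fraction("0.9999928"),
]

mu, sigma = dpd_mean_sd(prize_values, chances)
print(f"mean = ${mu:.2f}, SD = ${sigma:.2f}")
```

The results should agree with the calculator values quoted in Example 6.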
6B2. Comparing DPDs: Parking Choices
Example 7: When visiting the city, should you park in a lot or on the street? On a quarter of your visits (25%), you park for an hour or less, which
costs $10 in a lot; for parking more than an hour they charge a flat $14. If you park on the street, you might receive a simple $30 parking ticket
(p = 20%), or a $100 citation for obstruction of traffic (p = 5%), but of course you might get neither. Which should you do?
(Adapted from Paulos 2004 [see “Sources Used” at end of book].)
You have two probability models here, one for the outcomes of parking in a lot, and one for street parking. Begin by putting the two models into
tables:
Parking in lot        x       P(x)
≤ 1 hour              $10     0.25
> 1 hour              $14

Parking on street     x       P(x)
Parking ticket        $30     0.20
Obstruction ticket    $100    0.05
No ticket
The problem leaves out some things that you can figure for yourself. Remember that every probability model includes all outcomes, and the
probabilities add up to 1. If there’s a 25% chance of parking up to an hour, there must be a 100−25 = 75% chance of parking more than an hour. And
on the street, if you have a 20+5 = 25% chance of getting some kind of ticket, you have a 100−25 = 75% chance of getting neither. The cost of getting
neither ticket is zero.
Now you can fill in the empty cells in the tables.
Parking in lot        x       P(x)
≤ 1 hour              $10     0.25
> 1 hour              $14     0.75
Total                         1.00

Parking on street     x       P(x)
Parking ticket        $30     0.20
Obstruction ticket    $100    0.05
No ticket             $0      0.75
Total                         1.00
BTW: I showed the total probability to emphasize that it’s 1. Never compute the total of the outcomes (x’s), because that wouldn’t mean anything.
How do these tables help you make up your mind where to park? By themselves, they don’t. But they let you compute µ and σ, and that will help
you decide.
I placed the x’s and P(x)’s for the parking lot in L1 and L2, and did 1-Var Stats L1,L2. I placed the x’s and P(x)’s for street parking in L3 and
L4 and did 1-Var Stats L3,L4. Here are the results:
[Calculator screens: 1-Var Stats results for the lot (L1,L2) and for the street (L3,L4)]
As always, look first at n. If it’s not exactly 1, find your mistake in entering the probabilities.
Now you can interpret these results. Parking in the lot is a bit more expensive in the long run (µ = $13.00 per day versus µ = $11.00 per day). But
there are no nasty surprises (σ = $1.73, little variation from day to day). Parking on the street is much riskier (σ = $23.64), meaning that what happens
today can be wildly different from what happened yesterday.
So what should you do? Statistics can give you information, but part of your decision is always your own temperament. If you like stability and
predictability — if you are risk averse — you’ll opt for the parking lot. If it’s more important to you to save $2 a day on average, and you can accept
occasionally getting hit with a nasty fine, you’ll choose to park on the street.
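The parking comparison can be reproduced in Python as well. This sketch repeats a small helper (mean_and_sd, a name invented here) so that it runs on its own; the costs and probabilities are the ones from the completed tables in Example 7.

```python
from fractions import Fraction
from math import sqrt


def mean_and_sd(distribution):
    """distribution: list of (cost, probability) pairs. Returns (µ, σ)."""
    if sum(p for _, p in distribution) != 1:
        raise ValueError("probabilities must add up to exactly 1")
    mean = sum(x * p for x, p in distribution)
    variance = sum(x * x * p for x, p in distribution) - mean ** 2
    return float(mean), sqrt(variance)


lot = [(10, Fraction("0.25")), (14, Fraction("0.75"))]
street = [(30, Fraction("0.20")), (100, Fraction("0.05")), (0, Fraction("0.75"))]

lot_mean, lot_sd = mean_and_sd(lot)
street_mean, street_sd = mean_and_sd(street)

print(f"lot:    mean = ${lot_mean:.2f}, SD = ${lot_sd:.2f}")
print(f"street: mean = ${street_mean:.2f}, SD = ${street_sd:.2f}")
print(f"average saving on the street: ${lot_mean - street_mean:.2f} per day")
```

The code gives you the two numbers for each choice; weighing a lower mean against a higher SD is still up to you.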
6B3. Fair Price of a Game
Definitions: The fair price of a game is the price that would make all parties come out even in the long run. (We’re not just talking traditional games here. A
game is any activity where the participants stand to gain or lose money or something else of value. Usually chance contributes to the outcome,
but not necessarily.)
The fair price of a game is the price that would make the expected value or mean value of the probability distribution equal to
zero, the break-even point.
(“Fair price” is one of those math words that look like English but mean something different. You should expect to pay more than the fair price
because the operator of the game — the insurance company or casino or stockbroker — also has to cover selling and administrative expenses.)
There are two ways to compute the fair price:

Method 1: Ignore the actual price of the game, multiply each prize by its probability, and add up the products.
Method 2: If you already know the mean of the probability distribution from the player’s point of view, then fair price = actual price + µ.
Example 8: Take a really simple bar game: a stranger offers to pay you $60 if you roll a 6 with a standard six-sided die, but you have to pay him
$12 per roll. Find the fair price of this game.
Die shows     x                P(x)
1,2,3,4,5     −$12             5/6
6             $60−12 = $48     1/6
Total                          6/6 = 1
Method 1: The only prize is $60, and you have a 1/6 chance of winning it. $60×(1/6) = $10.
Method 2: Amounts in L1, probabilities in L2; 1-Var Stats L1,L2. Verify that n=1, and read off the mean of −$2. The actual price is $12, so the
fair price is $12 + (−2) = $10.
Naturally, the two methods always give the same answer. Method 2 is easier if you already know the mean of
the probability distribution; otherwise Method 1 is easier.
Example 9: A lottery has a $6,000,000 grand prize with probability of winning 1 in 3,000,000. It also has a $10 consolation prize with probability of
winning 1 in 1000. What is the fair price of your $5 lottery ticket?
Solution: You don’t need µ, so Method 1 is easier: multiply each prize by its probability and add up the products. $6,000,000×(1/3,000,000) +
$10×(1/1000) → fair price is $2.01 .
Why does a lottery ticket that is worth $2.01 actually cost $5.00? In effect, the lottery is paying out about 2.01/5.00 ≈ 40% of ticket sales in prizes.
Some of the 60% that the lottery commission keeps will cover the lottery’s own expenses, and the rest is paid to the state treasury. This is actually
fairly typical: most lotteries pay out in prizes less than half of what they take in. By contrast, the illegal “numbers game” pays out about 70%, or at
least it did in the 1980s in Cleveland. (Don’t ask me how I know that!)
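Both methods for the fair price fit in a few lines of Python. This is a sketch with invented function names; it reworks Examples 8 and 9 using exact fractions so that the two methods can be compared directly.

```python
from fractions import Fraction


def fair_price_from_prizes(prizes):
    """Method 1: ignore the actual price; add up prize x probability."""
    return sum(prize * probability for prize, probability in prizes)


def fair_price_from_mean(actual_price, player_mean):
    """Method 2: fair price = actual price + µ (player's point of view)."""
    return actual_price + player_mean


# Example 8: $60 for rolling a 6, at $12 per roll.
dice_method1 = fair_price_from_prizes([(60, Fraction(1, 6))])
player_outcomes = [(-12, Fraction(5, 6)), (60 - 12, Fraction(1, 6))]
player_mean = sum(x * p for x, p in player_outcomes)
dice_method2 = fair_price_from_mean(12, player_mean)

# Example 9: the $5 lottery ticket.
lottery_fair_price = fair_price_from_prizes(
    [(6_000_000, Fraction(1, 3_000_000)), (10, Fraction(1, 1000))]
)
payout_share = lottery_fair_price / 5   # fraction of ticket sales paid out

print(dice_method1, player_mean, dice_method2)
print(float(lottery_fair_price), float(payout_share))
```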
In the examples so far of probability models, I’ve had to give you a table of probabilities. But there are many subtypes of discrete probability
distribution where the probabilities can be calculated by a formula. The rest of this chapter will look at part of one family, discrete probability
distributions that come from Bernoulli trials.
Definition: Repeated trials of a process or an event are called Bernoulli trials if they have both of these characteristics:
1. Each trial has only two possible outcomes. We call those “success” and “failure”. However, “success” is not necessarily a
desirable outcome. Success simply means the outcome you’re interested in, and failure is the other outcome.
2. The probability of success, denoted p, is the same for every trial. This is another way of saying that the trials are independent.
(Even if they’re not independent, you can usually treat the trials as independent if the sample is a small part of the
population, not more than about 10%.)
If the probability of success on each trial is p, then the probability of failure on each trial is 1−p, or q for short.
BTW: Bernoulli trials are named after Jacob Bernoulli [URL: https://BrownMath.com/swt/pfswt.htm.htm#bign_Bernoulli], a Swiss mathematician. He developed the binomial
distribution, which you’ll meet later in this chapter.
Example 10: You randomly interview 30 people to find out which party they will vote for in the next election. These are not Bernoulli trials, because
there are more than two possible outcomes. (New York State ballots often have six or more parties listed, though some parties just
endorse the Republican or Democratic candidate.)
Example 11: On reflection, you realize that you don’t care which party a given voter will choose. All you care about is whether they are voting for
your candidate or not, so you randomly select 30 registered voters and ask, “Will you be voting for Abe Snake for President?” (Yes,
that’s a real thing; here’s a video [URL: https://www.youtube.com/watch?v=hB85DNp4dQY accessed 2017-01-18].) These are Bernoulli
trials, because there are only two answers, and the probability of voting for Abe Snake is the same for each randomly selected person.
(p equals the proportion of Abe Snake voters in the population. Remember, proportion of all = probability of one.)
BTW: Actually, this overlooks the undecided or “swing” voters. These become fewer as the election gets closer, but in real life they can’t be overlooked because
they may be a larger proportion than the leading candidate’s lead.
Example 12: You draw cards from a deck until you get a heart. These are not Bernoulli trials. Although there are only two outcomes, heart and
other suit, the probability changes with each draw because you have removed a card from the deck.
Variation: You replace each card and reshuffle the deck before drawing the next card. Then these become Bernoulli trials because the
probability of drawing a heart is 25% on every trial.
Variation: You have five decks shuffled together, instead of one 52-card pack. You don’t replace cards after drawing them. You can
treat these as Bernoulli trials even without replacement, because you won’t be drawing enough cards to alter the probabilities
significantly.
How do I know? Five packs is 260 cards, and 10% of 260 is 26. On the first card, P(heart) = 25%. It’s quite unlikely that you’d have
no hearts by the 26th card (0.04% chance), but if you did, the probability of a heart on the 27th card would be: 5×13/(5×52−26) ≈ 27.8%.
That’s not much different from the original 25%.
(You don’t have to take my word for these probabilities. Use the sequences method from Chapter 5 to compute them.)
Although this sample without replacement violates independence, it doesn’t violate it by very much, not enough to worry about.
This bears out what I said earlier: Trials without replacement can still be treated as independent when the sample is small relative to
the population.
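If you would rather check those two numbers by machine than by the sequences method, here is a minimal Python sketch. It assumes five standard 52-card packs shuffled together (65 hearts among 260 cards); the variable names are only for illustration.

```python
from math import comb

DECKS = 5
TOTAL_CARDS = 52 * DECKS      # 260 cards
HEARTS = 13 * DECKS           # 65 hearts
DRAWN = 26                    # 10% of the combined deck

# Chance that none of the first 26 cards is a heart:
# ways to pick all 26 from the 195 non-hearts, out of all ways to pick 26 cards.
p_no_hearts = comb(TOTAL_CARDS - HEARTS, DRAWN) / comb(TOTAL_CARDS, DRAWN)

# If that happened, all 65 hearts are still among the 234 remaining cards.
p_heart_next = HEARTS / (TOTAL_CARDS - DRAWN)

print(f"P(no hearts in first 26) = {p_no_hearts:.4%}")
print(f"P(heart on 27th card)    = {p_heart_next:.1%}")
```

Even in this extreme case the probability of a heart has drifted less than three percentage points from 25%, which is why treating the draws as independent does little harm.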
6D. The Geometric Model
Example 13: According to the AVMA (2014) [see “Sources Used” at end of book] 30.4% of US households own one or more cats. Suppose you
randomly select some households.
(a) How likely is it that the first time you find cat owners is in the fifth household?
(b) How likely is it that your first cat-owning household will be somewhere in the first five you survey?
Although you could compute these individual probabilities using techniques from Chapter 5, there’s a specific model called the geometric model that
makes it a lot easier to compute. Also, using the geometric model you can get an overview of the probabilities for various outcomes, which you’d
miss by computing probabilities of specific events using the previous chapter’s techniques. If trials are independent, and you want the probability of
a string of failures before your first success, you’re using a geometric model.
Definition: The geometric model, also known as the geometric probability distribution, is a kind of discrete probability distribution that applies
to Bernoulli trials when you try, and try, and try again until you get a success. P(x) is the probability that your first success will come
on your xth attempt, after x−1 failures.
Expanding on the definition of Bernoulli trials, you can say that a geometric model is one where
Each trial has only two possible outcomes, called success and failure.
There’s no fixed number of trials. You keep on till you have a success.
The probability of success, p, is the same on every trial (the trials are independent).
The random variable X is the number of trials, including the successful final trial, so x ≥ 1, with no upper limit.
The probability of success on any given trial, p, completely describes a geometric model.
Here’s a picture of part of the geometric model for cat-owning households, with
p = 0.304.
How do you read this? The horizontal axis is x, the number of the trial that gives your
first success, and the vertical axis is P(x), the probability of that outcome.
For example, there’s a hair over a 30% chance that you’ll find cat owners in your first
household, P(1) = 30.4%. There’s about a 21% chance that the first household won’t own
cats but the second household will, P(2) ≈ 21%. Skipping a bit to x = 6, there’s just about a
5% chance that the first five households won’t have cats but the sixth will, P(6) ≈ 5%. And
so forth.
x = 1 is always the most likely outcome, and larger x values are successively less
and less likely. This is true for every geometric distribution, not just this particular one
with p = 0.304.
The geometric model never actually ends. The probabilities eventually get too small
to show in the picture, but no matter what x you pick, the probability is still greater than
0.
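To see the shape described above without the picture, a rough text version can be printed with a few lines of Python. This is only a sketch: it multiplies the probabilities of x−1 failures and one success, the same way you would with the sequences method from Chapter 5, and the function name is made up for this illustration.

```python
p = 0.304          # probability that a household owns cats
q = 1 - p          # probability that it doesn't

def first_success_prob(x):
    """Probability that the first success comes on trial x (x-1 failures, then a success)."""
    return q ** (x - 1) * p

# Print the first ten outcomes with a crude bar, one '#' per percentage point.
for x in range(1, 11):
    prob = first_success_prob(x)
    bar = "#" * round(prob * 100)
    print(f"{x:2d}  {prob:6.4f}  {bar}")
```

Each bar is shorter than the one before it, and none of them ever reaches zero length in principle; the loop simply stops at x = 10.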
6D1. Computing Probabilities
Your TI-83/84 calculator has two menu selections for the geometric model:
geometpdf(p,x) answers the question “what’s the probability that my first success will come at trial number x?”
geometcdf(p,x) answers the question “what’s the probability that my first success will come at or before trial number x?” (The “c” stands for
cumulative, because the cdf functions accumulate the probabilities for a range of outcomes.)
They’re both in the [2nd VARS makes DISTR] menu.
(If you have a calculator in the TI-89 family, use the [F5] Distr menu. Select Geometric Pdf and Geometric Cdf.)
Let’s use the calculator to find the answers for Example 13. Here p, the probability of success in any given household, is 30.4% or 0.304.
Part (a) wants the probability of four failures followed by a success on the fifth try. For that you use geometpdf. Press [2nd VARS makes DISTR] [▲] [▲] to get to
geometpdf, and press [ENTER].
With the “wizard” interface: Enter p and x, then press [ENTER] twice.
With the classic interface: After entering p and x, press [)] [ENTER] to get the answer.
geometpdf(.304,5) = .0713362938 → 0.0713
There’s about a 7% chance you won’t find any cat owners in the first four households but you will in the fifth household.
(You could calculate this the long way. The probability of four failures followed by a success is $(1-0.304)^4 \times 0.304$. But the geometric model is easier.
That’s the point of a model: one general rule works well enough for all cases, so you don’t have to treat each situation as a special case with its own
unique methods.)
Part (b) wants the probability of a success occurring anywhere in the first five trials. This is a geometcdf problem. Press [2nd VARS makes DISTR] [▲] to get to
geometcdf, and press [ENTER].
With the “wizard” interface: Enter p and x.
With the classic interface: After entering p and x, press [)] [ENTER] to get the answer.
geometcdf(.304,5) = .8366774327 → 0.8367

There’s almost an 84% chance you will find at least one cat-owning household among the first five.
(Doing this the long way, you would use the complement. The complement of “at least one cat-owning household in the first five” is “no cat-owning households in the first five”.
The probability that a given household doesn’t own a cat is q = 1−.304 = 0.696, and the probability that five in a
row don’t own cats is $0.696^5$. Therefore the original probability you wanted is $1 - 0.696^5 = 0.8367$.)
BTW: You don’t actually need formulas for the geometric model, but if you’re curious about what your calculator is doing, here they are:
geometpdf(p,x) = $q^{x-1}\,p$ and geometcdf(p,x) = $1 - q^x$
where q = 1−p as usual. You can see that the two “long way” paragraphs above actually used those formulas.
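If you want to try those two formulas away from the calculator, they translate almost word for word into Python. The functions below borrow the calculator's names for convenience, but they are stand-ins written for this sketch, not TI code.

```python
def geometpdf(p, x):
    """P(first success on trial x): x-1 failures, then one success."""
    q = 1 - p
    return q ** (x - 1) * p

def geometcdf(p, x):
    """P(first success on or before trial x): complement of x straight failures."""
    q = 1 - p
    return 1 - q ** x

print(round(geometpdf(0.304, 5), 4))   # Example 13(a): 0.0713
print(round(geometcdf(0.304, 5), 4))   # Example 13(b): 0.8367
```

The cdf is also just the sum of the pdf values for trials 1 through x, which is an easy way to convince yourself that the two formulas agree.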
6D2. Mean and Standard Deviation of a Geometric Distribution
The geometric distribution is completely specified by p, so you can compute the mean and standard deviation quite easily:
µ = 1/p and σ = µ√q = (1/p)√(1−p)
Example 14: 30.4% of US households own cats. How many households do you expect you’ll need to visit to find a cat-owning household?
Solution: The expected value of a distribution is the mean. µ = 1/p = 1/.304 = 3.289473684, so µ ≈ 3.3. Interpretation: On average, you expect to have to
visit between 3 and 4 households to find the first cat owners.
Caution! The expected value (mean) is not the most likely value (mode). Take a look back at the histogram, and you’ll see that the most likely
value is 1: you’re more likely to get lucky on the first trial than on any other specific trial. But the distribution is highly skewed right, so the average
gets pulled toward the higher numbers.
To compute the SD, just multiply the mean by √q. A handy technique is called chaining calculations. After first calculating
the mean, press the [×] key, and the calculator knows you are multiplying the previous answer by something. Multiplying
3.289473684 by √.696 gives σ ≈ 2.7.
Interpreting σ is a bit harder. The geometric distribution is a type of discrete probability distribution, so you interpret
its standard deviation the same way as for any other DPD. In this particular example, σ is almost as large as µ, so you expect a lot of variability. If you
and a lot of co-workers go out independently looking for households with cats, the group average number of visits will be 3.3 households, but there
will be a lot of variability between different workers’ experience. You can’t use the Empirical Rule [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c03_Empirical] here because the geometric model is not a bell curve, but you can at least say you won’t
be surprised to find workers who get lucky on the first house (µ−σ ≈ 0.5), and workers who have to visit six houses or more (µ+σ ≈ 6.0).
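The same chained calculation can be sketched in Python. As a sanity check, the sketch also rebuilds the mean the slow way, by summing x·P(x) as for any discrete probability distribution; the cutoff of 500 trials is an arbitrary choice made here because the model never ends.

```python
from math import sqrt

p = 0.304
q = 1 - p

mu = 1 / p               # expected number of households visited
sigma = mu * sqrt(q)     # "chained" from the mean, as on the calculator

print(f"mu = {mu:.9f}  sigma = {sigma:.4f}")
print(f"mu - sigma = {mu - sigma:.1f},  mu + sigma = {mu + sigma:.1f}")

# Rebuild the mean directly from the distribution: sum of x * P(x).
# Terms beyond x = 500 are far too small to affect the result.
mu_from_sum = sum(x * q ** (x - 1) * p for x in range(1, 501))
print(f"mean from summing x*P(x): {mu_from_sum:.9f}")
```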
6D3. Making a Decision
Some people find it very hard to make choices because they feel they must consider all the pros and cons of every possibility. Others look at
possibilities one at a time and take the first one that’s acceptable. Studies such as The Tyranny of Choice (Roets, Schwartz, Guan 2012 [see “Sources
Used” at end of book]) show that the first group may make better choices objectively, but the second group is happier with the items they choose.
Example 15: You have to buy a new sofa. You’d be content with 55% of the sofas out there. Let’s assume that your Web search presents sofas in an
order that has nothing to do with your preferences. There are hundreds to choose from, so you decide to adopt the “first one that’s acceptable”
strategy. How likely is it that you’d order the third sofa you’d see?
Solution: This is a geometric model, with two failures followed by one success. p = 55%. geometpdf(.55,3) = .111375. There’s about an 11% chance
you’d order the third sofa.
6D4. Baseball
Example 16: Larry’s batting average is .260. During which time at bat would he expect to get his first hit of the game? How likely is he to get his first
hit within his first four times at bat?
Solution: This is a geometric model with p = 0.260. The mean or expected value is 1/p = 1/.26 = 3.85, about 4. On average, his first hit each game will
come on his fourth time at bat. For the second question, geometcdf(.26,4) = .70013424; there’s about a 70% chance he’ll get his first hit within his
first four times at bat.
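One way to build trust in those two answers is to simulate a large number of games and compare. The sketch below is a simulation, not the model itself, and it makes the same simplifying assumptions as the geometric model: every at-bat is an independent trial with p = 0.260, and Larry keeps batting until he gets a hit (real games don't allow that). The seed and the number of games are arbitrary choices.

```python
import random

random.seed(1)           # fixed seed so the run is repeatable
P_HIT = 0.260            # Larry's batting average
GAMES = 100_000          # number of simulated games (arbitrary)

def at_bats_until_first_hit():
    """Keep batting until the first hit; return which at-bat it was."""
    at_bat = 1
    while random.random() >= P_HIT:   # a miss, with probability 1 - P_HIT
        at_bat += 1
    return at_bat

results = [at_bats_until_first_hit() for _ in range(GAMES)]

average = sum(results) / GAMES
within_four = sum(1 for r in results if r <= 4) / GAMES

print(f"simulated mean: {average:.2f}   model: {1 / P_HIT:.2f}")
print(f"simulated P(x <= 4): {within_four:.3f}   model: {1 - (1 - P_HIT) ** 4:.3f}")
```

The simulated figures should land close to 3.85 and 70%, though not exactly on them, because a simulation is itself a random sample.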
6E. The Binomial Model
In the previous section, we looked at the geometric model, where you just keep trying until you get a success. In this section, we’ll look at the
binomial model, where you have a fixed number of trials and a varying number of successes.
Definition: The binomial model, also known as the binomial probability distribution or BPD, is a kind of discrete probability distribution that
applies to Bernoulli trials when you have a fixed number of trials, n.
Expanding on the definition of Bernoulli trials, you can say that a binomial model is one where
Each trial has only two possible outcomes, called success and failure.
You have a fixed number of trials, n.
The probability of success, p, is the same on every trial (the trials are independent). The probability of failure on any trial is q =
1−p.
The random variable X is the number of successes, so 0 ≤ x ≤ n, and P(x) is the probability of x successes.
Example 17: Cats again! 30.4% of US households own one or more cats. You visit five households, selected randomly.
(a) What’s the chance that no more than two have cats?
(b) What’s the chance that exactly two have cats?
(c) What’s the chance that at least two have cats?
(d) What’s the chance that two to four have cats?
This problem fits the binomial model: n = 5 trials, each household does or does not have
cats, and the probability p = 30.4% is the same for each household.
A picture of this binomial distribution is shown at right, and you can see some
differences from the picture of the geometric distribution:
The geometric distribution extends from one trial to ∞, but the binomial
distribution can have only 0 to n successes in n trials.
The binomial distribution is less strongly skewed than the geometric
distribution, and 1 is not necessarily the most likely value of X (though it
happens that way in this particular distribution).
How do you read the picture? There’s about a 16% probability that none of the five
households will have cats, about 36% that one of the five will have cats, and so on. (Why
36% and not 30.4%? Because there’s a greater chance of “winning” one out of five than
one out of one.)
BTW: In this book we’re more concerned with computing probabilities, but it can be nice to get an
overall picture of a distribution. I made this particular graph by using @RISK from Palisade Corporation, but you can also make histograms of binomial distributions by using
MATH200A Program part 1(5).
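If you have neither tool handy, a simulation gives a rough picture of the same distribution. This Python sketch is an illustration only: it "visits" five households many times over, assuming each household independently has a 30.4% chance of owning cats, and tallies how many cat-owning households turn up per sample. The seed and sample count are arbitrary.

```python
import random
from collections import Counter

random.seed(2)            # fixed seed so the run is repeatable
P_CATS = 0.304            # chance that any one household owns cats
N_HOUSEHOLDS = 5          # households per sample
SAMPLES = 100_000         # number of simulated samples (arbitrary)

def cat_households_in_sample():
    """Visit five random households; count how many own cats."""
    return sum(random.random() < P_CATS for _ in range(N_HOUSEHOLDS))

counts = Counter(cat_households_in_sample() for _ in range(SAMPLES))

# One row per possible number of successes, with a bar of one '#' per percentage point.
for x in range(N_HOUSEHOLDS + 1):
    share = counts[x] / SAMPLES
    print(f"{x}  {share:6.3f}  {'#' * round(share * 100)}")
```

The bars for 0 and 1 should come out near 16% and 36%, and the bar for 5 is so short that it barely shows.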
6E1. Computing Probabilities
Here you have a choice. Your TI-83/84 calculator comes with two menu selections for the binomial model, but the MATH200A program gives you a
simpler interface. Here’s a quick overview of both, before we start on computations:
With the MATH200A program (recommended):
MATH200A Program part 3 gives you one interface for all binomial probability calculations. The program might already be on your calculator from Chapter 3 boxplots, but if it’s not, see Getting the Program [URL: https://BrownMath.com/ti83/math200a.htm#Download] for instructions.
To find binomial probability with the program, press [PRGM]. If you see MATH200A in the list, press its menu number; otherwise, press [▼] or [▲] to get to MATH200A, and press [ENTER].
That puts the program name on your home screen. Press [ENTER] again to run the program, and yet again to dismiss the title screen. You’ll then see a menu. Press [3] for binomial probability.
If you’re not using the program:
These are both in the [2nd VARS makes DISTR] menu:
binomcdf(n,p,x) answers the question “what’s the probability of no more than x successes in n trials (0 to x successes)?” (The “cdf” stands for cumulative distribution function, because the cdf functions accumulate the probabilities for a range of outcomes.)
binompdf(n,p,x) answers the question “what’s the probability of exactly x successes in n trials?” (The “pdf” stands for probability distribution function, because the probability for any particular number of successes is a function of [determined by] that number.)
(Got a TI-89 family calculator? Use the [F5] Distr menu. Select Binomial Pdf or Binomial Cdf. The Cdf function can handle any range of successes, not just 0 to x. See Binomial Probability Distribution on TI-89 [URL: https://BrownMath.com/ti83/binsho89.htm] for full instructions.)
Now let’s use your TI-83/84 to answer the questions in Example 17. You have five trials, so n = 5. The
probability of success on any given household is 30.4%, so p = 0.304.
(a) What’s the probability that no more than two of the five randomly selected households have cats?
With the MATH200A program (recommended):
Press [PRGM], select MATH200A, and press [3] in the MATH200A menu. Enter n and p. “No more than two cats” is from 0 to 2 cats, so enter those values when prompted. The program echoes back your inputs and shows the computed probability. To show your work, write down the screen name, the inputs, and the result.
Conclusion: P(x ≤ 2) or P(0 ≤ x ≤ 2) = 0.8316.
If you’re not using the program:
The probability that no more than two of your five households have cats (in other words, the probability that 0 to 2 have cats) is binomcdf(5,.304,2). Press [2nd VARS makes DISTR] and scroll up to binomcdf.
If you don’t have the “wizard” interface, or you have it turned off, binomcdf( will appear on your screen. Enter n, p, and the desired maximum number of successes, in that order, then the closing paren and [ENTER].
If you have the “wizard” interface, you get a menu screen, but you enter the same information. Press [ENTER] once on Paste and then again when the command is pasted to your home screen.
Either way, write down the binomcdf command and the argument numbers to show your work.
Conclusion: P(x ≤ 2) or P(0 ≤ x ≤ 2) = 0.8316.
(b) What’s the probability that exactly two of five randomly selected households are cat owners?
With the MATH200A program (recommended):
You need a specific number of successes, instead of a range. It’s almost exactly the same deal: you just enter the same number for from and to. In this example, to get the probability of exactly two successes, enter number of successes from 2 to 2.
Conclusion: P(x = 2) or P(2) = 0.3116.
If you’re not using the program:
The probability of exactly two cat-owner households in five is binompdf(5,.304,2). Press [2nd VARS makes DISTR] and then press [▲] several times to get to binompdf. (Caution! pdf, not cdf.) Press [ENTER], type in the numbers, and press [)] [ENTER]. (The “wizard” interface screen is the same as it was for binomcdf.)
Conclusion: P(x = 2) or P(2) = 0.3116.
(c) What’s the probability that at least two of the five randomly selected households have cats?
With the MATH200A program (recommended):
“At least two”, in a sample of five, means from two to five successes. Enter those values in MATH200A part 3 and read the probability from the results screen.
Conclusion: P(x ≥ 2) or P(2 ≤ x ≤ 5) = 0.4800.
If you’re not using the program:
This one is a little trickier. You could find P(2), P(3), P(4), and P(5) and add them up by hand, but that’s tedious and error prone, and it can introduce rounding errors. Instead, you’ll make the calculator add them up for you.
First, get all the probabilities for 0 through n successes into a statistics list. To do this, use binompdf (not cdf) but with only the n and p arguments. (If you have the “wizard” interface, leave x value blank.) After the closing paren, don’t press [ENTER] just yet. Instead, press the [STO→] key and select a statistics list, such as [2nd 6 makes L6]. Then press [ENTER]. This puts the probabilities for 0 successes, 1 success, and so on to 5 successes into L6. (If you want, you could examine them with [►], or on the [STAT] edit screen.)
Now you need to sum the desired range of cells. You want 2 ≤ x ≤ 5. But the lowest possible x is 0, and the cells in statistics lists are numbered starting at 1. So to get x from 2 through 5, you need cells 3 through 6. When summing part of a list, add 1 to your desired x values.
Press [2nd STAT makes LIST] [◄] [5] to paste sum(, then [2nd 6 makes L6] [,] 3 [,] 6 [)] [ENTER].
Your answer: P(x ≥ 2) or P(2 ≤ x ≤ 5) = 0.4800.
Beware of off-by-one errors when you solve problems with phrases like at least and no more than. Always test the “edge conditions”. “Okay, I need at least 2, and that’s 2 through 5, not 3 through 5. Oh yeah, add 1 for the statistics list in the TI-83, so I’m summing cells 3 through 6, not 2 through 5.”
Alternative solution: Do you remember solving “at least” problems in Chapter 5? What was the
lesson there? With laborious probability problems, the complement is your friend. What’s the
complement of “at least two”? It’s “fewer than two”, which is the same as “no more than one”.
Shaky on the logic of complements? Use the enumeration method from Chapter 5: 0 1 2 3 4 5 or
0 1 | 2 3 4 5.
Find the probability of ≤1 household with cats, and subtract from 1:
P(x ≥ 2) = 1 − P(x ≤ 1)
P(x ≥ 2) = 1 − binomcdf(5, .304, 1)
P(x ≥ 2) = .4799959639 → 0.4800
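The list-and-sum technique and the complement shortcut can both be tried in Python, which makes the off-by-one issue easy to see. This is a sketch, not a translation of the calculator's internals: it fills a list using the standard binomial formula, and because Python lists are numbered from 0 rather than 1, position x in the list holds P(x) directly. The catch moves elsewhere: the end of a Python slice is not included.

```python
from math import comb

n, p = 5, 0.304
q = 1 - p

# One probability per possible number of successes, like binompdf(5,.304) stored in L6.
# Python lists start at index 0, so probs[x] is P(x) with no "add 1" adjustment.
probs = [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]

# "At least two" means x = 2 through 5. A slice's end is exclusive, so 2:6 covers 2..5.
at_least_two = sum(probs[2:6])

# Complement route: 1 minus "no more than one" (x = 0 and x = 1).
via_complement = 1 - sum(probs[0:2])

print(round(at_least_two, 4), round(via_complement, 4))
```

Both routes give 0.4800, so whichever one you use, the other makes a good check on your edge conditions.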
(d) What’s the chance for two to four cat-owning households in your random sample of five households?
With the MATH200A program (recommended):
Nothing new here: just use good old MATH200A part 3 and read the probability from the results screen.
P(2 ≤ x ≤ 4) = 0.4774.
If you’re not using the program:
You need x from 2 through 4, but remember you always add 1 when summing binomial probabilities from a statistics list, so you put 3 to 5 in your sum command. (You’re still using the same distribution, so there’s no need to repeat binompdf.)
P(2 ≤ x ≤ 4) = 0.4774.
Alternative solution: You can also do it without summing. If you think about it, the probability for x from 2 to 4 is the probability for x from 0 to 4, with x below 2 (x no more than 1) removed: 0 1 | 2 3 4. In symbols,
P(0 ≤ x ≤ 1) + P(2 ≤ x ≤ 4) = P(0 ≤ x ≤ 4)
and by subtracting that first term you get
P(2 ≤ x ≤ 4) = P(0 ≤ x ≤ 4) − P(0 ≤ x ≤ 1)
Your probability is the result of subtracting two cumulative probabilities, the cdf from 0 to 4 minus the cdf from 0 to 1: binomcdf(5,.304,4) − binomcdf(5,.304,1).
This is tricky, I admit. You have to set that x value correctly in the second binomcdf, so this method is not much better than the other one. About all it has going for it is that it avoids storing values in a list and then using sum.
BTW: You don’t actually need a formula for the binomial model, but if you’re curious about what your calculator is doing, here it is:
binompdf(n,p,x) = ${}_nC_x \cdot p^x q^{n-x}$
Why? $p^x$ is the probability of getting successes on all of the first x trials. q is the probability of failure on one trial, and therefore $q^{n-x}$ is the probability of failure on the
remaining trials, after the x out of n successes. But in a binomial probability model, you care how many successes and failures there are, not in what order they occur. To
account for the fact that order doesn’t matter, the formula has to multiply by ${}_nC_x$, “the number of ways to choose x objects out of n”. (If you want to know more about ${}_nC_x$,
search “combinations” at your favorite math site.)
BTW: Unlike the geometric case, there’s no simple formula for binomcdf. Your calculator just has to compute probabilities for x = 0, 1, and so on and add them up.
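For the curious, here is how that formula and the add-them-up approach might look in Python. The function names are borrowed from the calculator for readability; they are written for this sketch and are not the calculator's own code. Python's math.comb plays the role of nCx.

```python
from math import comb

def binompdf(n, p, x):
    """P(exactly x successes in n trials)."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

def binomcdf(n, p, x):
    """P(0 through x successes in n trials): add up the individual probabilities."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

n, p = 5, 0.304
print(round(binomcdf(n, p, 2), 4))                        # (a) no more than two
print(round(binompdf(n, p, 2), 4))                        # (b) exactly two
print(round(1 - binomcdf(n, p, 1), 4))                    # (c) at least two
print(round(binomcdf(n, p, 4) - binomcdf(n, p, 1), 4))    # (d) two to four
```

The four printed values should match the conclusions for Example 17: 0.8316, 0.3116, 0.4800, and 0.4774.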
6E2. Baseball Again!
Example 18: Larry’s batting average is .260. How likely is it that he’ll get more than one hit in four times at bat?
Solution: This is a binomial model with n = 4, p = 0.26, x = 2 to 4. You can use MATH200A part 3 or the binompdf-sum
technique to get .27870128. P(x > 1) = 0.2787 or about 28%. (The program is completely straightforward, so I’m showing
only the tricky binompdf-sum sequence here.)
Alternative solution: If you don’t have the program, can you see how to use the complement to solve this problem
more easily? Check your answer against mine to be sure that your method is correct.
6E3. Mean and Standard Deviation of a Binomial Distribution
The binomial distribution depends on the proportion in the population (p) and your sample size (n). You can compute the mean and SD quite easily:
µ = np and σ = √[npq]
What are the mean and SD of the number of cat-owning households in a random sample of five households?
µ = np = 5 × 0.304 = 1.52
σ = √[npq] = √[5 × .304 × (1−.304)] = 1.028552381
Conclusion: µ = 1.5 and σ = 1.0 .
Interpretation: in a sample of five households, the expected number of cat-owning households is 1.5. Or, if you take a whole lot of samples of five
households, on average you will find that 1.5 households per sample own cats. The SD is 1.0. You can’t use the Empirical Rule, but you can say that
you expect most of the samples of five to contain µ±2σ = 1.5±2×1.0 = 0 to 3 cat-owning households.
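The two shortcut formulas can be checked against the general method for any discrete probability distribution, where the mean is the sum of x·P(x) and the variance is the sum of (x−µ)²·P(x). The Python sketch below does both for the cat example; it assumes the standard binomial formula for the individual probabilities, and the variable names are illustrative.

```python
from math import comb, sqrt

n, p = 5, 0.304
q = 1 - p

# Shortcut formulas for a binomial distribution.
mu = n * p
sigma = sqrt(n * p * q)

# General discrete-distribution method, using every outcome and its probability.
probs = [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]
mu_dpd = sum(x * prob for x, prob in enumerate(probs))
sigma_dpd = sqrt(sum((x - mu_dpd) ** 2 * prob for x, prob in enumerate(probs)))

print(f"shortcut:    mu = {mu:.2f}, sigma = {sigma:.9f}")
print(f"long method: mu = {mu_dpd:.2f}, sigma = {sigma_dpd:.9f}")
```

Both lines should agree, which is the whole appeal of the shortcut: two multiplications and a square root instead of a table.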
6E4. Surprised?
Example 19: 30.4% of US households own one or more cats. You visit ten random households and seven of them own cats. Are you surprised at this
result?
Definition: A result is surprising or unusual or unexpected if it has low probability, given what you think you know about the population in
question. The threshold for “low probability” can vary in different problems, but a typical choice is 5%.
When we ask whether a result is surprising (unusual, unexpected), we are really talking about that result or one even further
from the expected value.
You think you know that 30.4% of US households own cats. A sample of ten doesn’t seem very large; how do you decide whether seven successes
seems reasonable or unreasonable?
First, what’s the expected value? That’s µ = np = 10×.304 = 3.04.
Next, what does “that result or one further from the expected value” mean? The expected value is 3.04, seven is greater than 3.04, so we’re
talking about seven or more successes, x = 7 to 10.
Find the probability of that result or one even further from the expected value. That’s
easiest with MATH200A part 3: set n=10, p=.304, x=7 to 10. You can also do it with binomcdf:
seven or more successes is the complement of zero to six successes (0 1 2 3 4 5 6 | 7 8 9 10), so compute 1 − binomcdf(10,.304,6). Either way,
the probability is 0.0115 or just over 1%.
Draw your conclusion. If 30.4% of US households own cats, finding seven or more cat
houses in a random sample of ten households is unusual (surprising, unexpected).
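The four steps above can be sketched as a small Python function. This is one possible way to organize the reasoning, not a standard routine: the function name and the 5% threshold constant are choices made for this illustration, and the probabilities come from the standard binomial formula.

```python
from math import comb

def binompdf(n, p, x):
    """P(exactly x successes in n trials)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def tail_probability(n, p, x):
    """P(that result or one even further from the expected value n*p)."""
    expected = n * p
    if x >= expected:
        outcomes = range(x, n + 1)     # x or more successes
    else:
        outcomes = range(0, x + 1)     # x or fewer successes
    return sum(binompdf(n, p, k) for k in outcomes)

THRESHOLD = 0.05                       # the "typical choice" of 5%

prob = tail_probability(10, 0.304, 7)
print(f"P(x >= 7) = {prob:.4f}; surprising? {prob < THRESHOLD}")
```

Comparing x with the expected value first is what decides which direction counts as "even further"; getting that direction wrong is the usual mistake in this kind of problem.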
That was a trivial example. But in real life, when a result is unexpected it can cast doubt on what you’ve been told. Here’s an example.
6E5. A Life-or-Death Example
Example 20: In Talladega County, Alabama, in 1962, an African American man named Robert Swain was accused of rape. 26% of eligible jurors in the
county were African American, but the 100-man jury panel for Swain’s trial included only 8 African Americans. (Through exemptions and
peremptory challenges, all were excluded from the final 12-man jury.) Swain was convicted and sentenced to death.
Swain’s lawyer appealed, on grounds of racial bias in jury selection. The Supreme Court ruled in 1965 that “The overall percentage disparity has
been small and reflects no studied attempt to include or exclude a specified number of blacks.”
—Adapted from Michailides [see “Sources Used” at end of book]
What do you think of that ruling? If 100 men in the county were randomly selected, is eight out of 100 in the jury pool unexpected (unusual,
surprising)?
Solution: This is a binomial model: every man in the county either is or is not African American, the sample size is a fixed 100, and in a random
sample there’s the same 26% chance that any given man is African American.
To determine whether eight in 100 is unexpected, ask what is expected. For binomial data, µ = np = 100×.26 = 26; in a sample of 100, you expect 26
African Americans.
Okay, 26 is expected, and 8 is less than 26, so “further away from expected” means fewer than 8. You therefore compute the probability for x = 0 to 8.
Use binomcdf(100,.26,8) or MATH200A part 3. Either way you get a probability of 4.734795002E-6, or about
0.000 005, five chances in a million. That is unexpected. It’s so unlikely that we have to question the county’s claim that the
selection was random.
Unfortunately, Mr. Swain’s lawyer didn’t consult a statistician.
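A statistician's check of the jury panel takes only a few lines. The Python sketch below adds up the binomial probabilities for 0 through 8 directly, under the same assumption as the solution above: a random panel of 100 where each man has a 26% chance of being African American. The variable names are illustrative.

```python
from math import comb

n, p = 100, 0.26          # panel size; share of eligible jurors who were African American
observed = 8

expected = n * p
# Probability of 0 through 8 African Americans on a randomly selected panel of 100.
prob = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(observed + 1))

print(f"expected: {expected:.0f}")
print(f"P(x <= {observed}) = {prob:.9e}")    # scientific notation, like the calculator's E-6
print(f"about {prob * 1_000_000:.1f} chances in a million")
```

The "e-06" in the printed value means the same thing as the calculator's E-6: move the decimal point six places to the left.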
What Have You Learned?
Concept of a random variable. (You don’t actually work problems about this.)
Discrete probability distribution (DPD or discrete PD), a/k/a probability model: list of possible outcomes with their probabilities.
µ and σ for a DPD: computing and interpreting.
Concept of Bernoulli trials: only two possible outcomes; p = probability of success and q = probability of failure.
Geometric model: definition, computing probabilities, computing and interpreting µ and σ. (This is a less important topic.)
Binomial probability distribution (BPD or binomial PD): definition. How do you know when you have a binomial model?
Computing probabilities in the binomial model. Use either binomcdf/binompdf or MATH200A, but MATH200A is less work.
µ and σ for a binomial PD: computing and interpreting. (You must use formulas to compute them.)
Determining whether an outcome is unusual or surprising.
Exercises for Chapter 6
1. You roll five dice and count the number of twos that appear.
(a) List the possible values of the discrete random variable, X = “number of twos in five dice”.
(b) What type of probability model is appropriate? Why?
2. A lottery has a 1 in 10 million chance of paying $10,000,000, a 1 in 125 chance of paying $100, and a 1 in 20 chance of paying $10. A ticket costs $5,
and you do not get that money back if you win a prize.
(a) Construct a discrete probability distribution.
(b) Is this a good deal or a bad deal for you? Explain.
3. Blood Types [see “Sources Used” at end of book] at the Stanford School of Medicine’s Web site lists the relative frequencies of blood types in the
US. (There’s also a nice chart of what blood types you can safely receive, based on your own blood type.) Only 6.6% of the US have O negative
blood.
Velma the Vampire will drink anything, but she prefers O negative. She doesn’t know a victim’s blood type until she tastes it.
(a) How many does she expect to drain before she gets some O negative?
(b) How likely is it that she’ll find her first O negative within her first ten victims?
(c) How likely is it that exactly two of her first ten victims will be O negative?
4. In January 2013, a CBS News story by Sarah Dutton and others [see “Sources Used” at end of book] reported poll results: 92% of American adults
favored universal background checks for gun buyers.
(a) If TC3 students are representative of American adults when the poll was taken, what’s the chance that you’ll have to ask three TC3 students before
the third one opposes universal background checks?
(b) How likely is it that you’d find a student opposing universal background checks somewhere in the first three you ask, not necessarily in third
position?
5. Suppose 80% of students who register for Elizabethan Sonnets complete the course successfully.
(a) Imagine taking many, many samples of seven people, with replacement. What would be the expected number and standard deviation of the
number of people that would finish successfully, per sample of seven?
(b) At the end of the semester, imagine a random group of seven students who originally registered for the course. Find the probability that four to
six of them completed it successfully.
(c) What’s the chance that, when you ask each person in turn, the third person you ask is the first one who successfully completed the course?
(d) What’s the chance that the first person that you find who successfully completed the course is one of the first two you ask?
6. In a June 2013 poll, the Pew Research Center (2013b) [see “Sources Used” at end of book] found that 49% of American adults approved of
President Obama’s job performance. In a random sample of 40 American adults, taken at the same time, would you be surprised if 13 approved
his performance? Why or why not?
7. According to the Social Security Administration (2010) [see “Sources Used” at end of book], 0.1304% of 22-year-old males are expected to die in
the next year.
(a) What is the fair price of a $100,000 one-year term life insurance policy on a 22-year-old male? (To keep things simple, assume that the company
will charge the same price to every 22-year-old male, without regard to lifestyle or health factors.)
(b) The company actually charges $180.00 for this policy, more than the fair price. Is this unfair? Explain.
8. A coin is weighted: the chance of heads is not 50%. On five flips of that coin, the probability of various numbers of heads is shown by this
model:
x       0       1       2       3       4       5
P(x)    0.0778  0.2591  0.3456  0.2305  0.0768  0.0102
(a) Find and interpret the mean and standard deviation of this probability model.
(b) For an extra challenge, can you use your answer from part (a) to construct a simpler probability model for five flips of this coin?
9. Long experience shows that a particular drug will help 70% of the people who take it.
(a) If you take a random sample of five people, what is the probability that the drug helps at least three?
(b) If you take many samples of 10 people, what’s the average number of people per sample that the drug will help?
(c) In a random sample of 10 people, would you be surprised if the drug helps only five? Why or why not?
10. In April 2013, the Pew Research Center [see “Sources Used” at end of book] released poll results for the question “Which of the following best
describes how you feel about doing your taxes?” Surprisingly (to me, anyway), 34% said they like or love doing their taxes.
(a) How many Americans would you expect to have to ask to find one who likes or loves doing her taxes?
(b) If you ask five random Americans, what’s the probability that none of them will say they like or love doing their taxes?
11. In a sentence or two, write down the difference between the geometric and binomial models. (Write it, don’t just think it. It’s easy to tell
yourself you understand something, but the rubber meets the road when you have to put your understanding into words on paper.)
12. In a sentence or two, write down the difference between pdf and cdf.
7. Normal Distributions
Updated 1 Aug 2019
Summary: The normal distribution (ND) is important for two reasons. First, many natural and artificial processes are ND. You’ll look at some of those
in this chapter. Second, any process can be treated as a ND through sampling. That will be the subject of Chapter 8, and it’s also the
foundation of the inferential statistics you’ll do in Chapters 9 through 11.
Contents: 7A. Continuous Random Variables
7A1. Density Curves
7A2. Probability and Continuous Distributions
· Area = Probability
· Two Interpretations of Probability
7B. The Normal Model
7B1. Properties of the Normal Distribution
7B2. From Boundaries, Find Probability
· Computing the Area
· Percentiles
7B3. From Probability, Find Boundaries
· Percentiles Again
7C. The Standard Normal Distribution
7C1. “Normal” and “Standard Normal”
7C2. Applying the Standard Normal Distribution
7C3. The z Function (Critical z)
7D. Checking for Normality
7D1. Checking Data Sets
7D2. Optional: How Normal Probability Plots Work
7A. Continuous Random Variables
You met random variables back in Chapter 6. Any random variable has a single numerical value, determined by chance, for each outcome of a
procedure. Discrete random variables are limited to specified values, usually whole numbers. But a continuous random variable can take any value
at all, within some interval or across all the real numbers.
Just as discrete probability models are used to model discrete variables, continuous probability models are used to model continuous variables. Of
course, because a continuous random variable has infinitely many possible values, you can’t make a table of values and probabilities as you could do
for a discrete distribution. Instead, either there’s an equation, or just a density curve (below).
A probability model is often called a distribution, so you can say that a variable “is normally distributed” (ND), that it “is a normal distribution”
(also ND), or that it “follows a normal probability model”.
There are lots of specialized continuous distributions, but the normal distribution is most important by a wide margin. Many, many real-life
processes follow the normal model, and the ND is also the key to most of our work in inferential statistics.
This section will give you some concepts that are common to all continuous distributions, and the rest of the chapter will talk about special properties
of the normal distribution and applications. In Chapter 8, you’ll apply the normal distribution to get a handle on the variation from one sample to the
next.
7A1. Density Curves
In Chapter 2, you learned to graph continuous data by grouping the data in classes and making a histogram, like the one below left. This is wait times
in a fast-food drive-through, with time in minutes — not whole minutes, which would make a discrete distribution, but minutes and fractional
minutes.
Any sample you might take has a finite number of data points, so you set up classes, place the data points in the classes, and then draw a
histogram. The height of each bar is proportional to the frequency or relative frequency of that class.
But when you come to consider all the possible values of a continuous variable, you have an infinite number of data points. If you tried to assign
them to classes, it would take you forever —literally! Instead, you draw a smooth curve, called a density curve, to show the possible values and how
likely they are to occur. An example is shown above right.
The density curve is a picture of a continuous probability model. It doesn’t just represent the data in a particular sample, but all possible data
for that variable — along with the probabilities of their occurrence, as you’ll see next.
7A2. Probability and Continuous Distributions
Up to now, the height of a bar in a histogram has been the number of data points in that class, or the
relative frequency of that class. But how do you interpret the height of a density curve?
Answer: you don’t! The height of the curve above any particular point on the x axis just doesn’t
lend itself to a simple interpretation. You might think it would be the probability of that value
occurring. But with infinitely many possible values, “what’s the likelihood of a wait time of exactly 4
minutes?” just isn’t a meaningful question, because what about 3.99997 minutes or 4.002 minutes?
Area = Probability
What is meaningful is the probability within an interval, which equals the area under the curve
within that interval. For example, in this illustration, the probability of a wait time of 6.4 to 9.5
minutes is 29.4%. In symbols,
P(6.4 ≤ x ≤ 9.5) = 29.4%
or
P(6.4 < x < 9.5) = 29.4%
That’s right — the probability is the same whether you include or exclude the endpoints of the interval.
BTW: Okay, I lied. The height of the curve is meaningful, but only if you’ve had some calculus. The curve is the graph of a probability density function or pdf. The integral
of that curve from a to b is the area between x=a and x=b and is the probability that the random variable will have a value between a and b.
This explains why the probability is the same whether you include or exclude either endpoint of the interval. The difference is the area of a “rectangle”
whose height is the height of the density curve and whose width is the distance from a to a — which is zero. Thus the area of the “rectangle” is zero, and the
probability of the random variable taking any particular value, exactly, is zero.
Since area equals probability, and total probability must be 1, total area must be 1. Every pdf — the height of every density curve — is scaled so that the
integral from −∞ to +∞ is 1.
You can also have the probability for an interval with one boundary, < or ≤ some value like the
picture at right, or > or ≥ some value. For example, 3.33 minutes is about 3 minutes and 20 seconds, so
the probability of waiting up to 3 minutes and 20 seconds is 20.6%: P(x ≤ 3.33) = 20.6%.
The total area under any density curve equals the probability that the random variable will take any
one of its possible values, which of course is 1, or 100%. So you can use the complement to say that
the probability of waiting 3 minutes and 20 seconds or more (or, more than 3 minutes and 20 seconds)
is 100% − 20.6% = 79.4%.
Two Interpretations of Probability
You remember from Interpreting Probability Statements in Chapter 5 that every probability can be
interpreted as a probability of one or a proportion of all. For example, P(x > 3.33) = 79.4% can equally well be interpreted in two ways:
Probability of one: “Any randomly selected person has a 79.4% chance of waiting more than 3 minutes and 20 seconds.”
Proportion of all: “79.4% of people will wait more than 3 minutes and 20 seconds.”
Which interpretation you use in a given situation depends on what seems simplest and most natural in the situation. Here, the “proportion of all”
interpretation seems simpler. But you’re always free to switch to the other interpretation if it helps you in thinking about a situation.
Area = Probability of One = Proportion of All
7B. The Normal Model
Why study the normal distribution?
First, it’s useful on its own. Lots and lots of real-life distributions match the normal model: body temperature or blood pressure of healthy people,
scores on most standardized tests, commute times on a given route, lifetimes of batteries or light bulbs, heights of men or women, weights of apples
of a particular variety, measurement errors (in many situations), and on and on.
Second, through sampling, even non-ND populations follow a normal model. You’ll use this model in inferential statistics to make statements
about a whole population based on just one sample. Look forward to learning this neat trick in Chapter 8.
BTW: Why is the ND so common? In real life, very few events have just one cause; most things are the result of many factors operating independently. It turns out that if you
take a lot of independent random variables and add them up, their sum is approximately ND. For example, your IQ score results from multiple genetic factors, countless occurrences in
your education and your family life, even transient factors like how well you slept the night before the test. Most of these are independent of each other, so the result of adding
them is a ND.
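If you'd like to watch this happen, here is a minimal Python sketch (Python isn't used elsewhere in this book). It assumes, purely for illustration, that each "factor" is a uniform random number between 0 and 1 and that there are 12 of them. It adds them up many times and checks how much of the result lies within one SD of the mean. A normal model would put about 68% there.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

def sum_of_factors(k=12):
    """Add up k independent uniform(0, 1) 'factors' (an illustrative assumption)."""
    return sum(random.random() for _ in range(k))

# Simulate 20,000 totals, each the sum of 12 independent factors.
totals = [sum_of_factors() for _ in range(20_000)]
m, s = mean(totals), stdev(totals)

# Share of simulated totals within one SD of their mean.
within_one_sd = sum(1 for t in totals if abs(t - m) <= s) / len(totals)
print(f"mean = {m:.2f}, SD = {s:.2f}, within 1 SD: {within_one_sd:.1%}")
```

No single uniform variable looks anything like a bell curve, yet the totals should come out close to one.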
7B1. Properties of the Normal Distribution
The normal distribution (ND) has the properties of other continuous distributions as listed
earlier. In particular, area = probability, and the total area under the density curve is the total
probability, which is 1. The ND also has these special properties:
A ND is completely described by its mean and SD. The mean locates the center of the
curve, but has no effect on the shape. For example, here are three normal curves with µ = 0, 2,
and 5 and σ = 4.
The standard deviation determines the shape of the curve, but has no effect on the location.
Smaller SD means the data stick closer to the mean, so the peak is higher and the tails are
shorter and fatter. Larger SD means the data vary more, so they spread out from the mean:
the peak is lower and the tails are longer and thinner. The second picture shows three
normal curves with µ = 2 and σ = 2, 4, and 6. (The vertical scale is different from the first
picture.)
The ND is symmetric — left and right sides are mirror images of each other. This implies
that the mean, median and mode are all equal.
In principle, the tails of the normal curve run out to ±∞. However, data points more than 3
standard deviations from the mean are rare. (This is part of the Empirical Rule from
Chapter 3.)
The books all say that inflection points are one SD above and below the mean. Inflection
points, if you haven’t had calculus, are where the curve transitions between concave up and
concave down. The books don’t tell you that those points are far from obvious visually. Just
do the best you can when making sketches.
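To put numbers on the "higher peak, lower peak" description, here is a small Python sketch using the standard library's `statistics.NormalDist`. It uses µ = 2 and σ = 2, 4, and 6, as in the second picture. The ±3 offset used to check symmetry is an arbitrary choice for illustration.

```python
from statistics import NormalDist

mu = 2
for sigma in (2, 4, 6):
    nd = NormalDist(mu, sigma)
    peak = nd.pdf(mu)                              # curve height at the mean
    left, right = nd.pdf(mu - 3), nd.pdf(mu + 3)   # equal heights, by symmetry
    print(f"sigma={sigma}: peak={peak:.4f}, f(mu-3)={left:.4f}, f(mu+3)={right:.4f}")

# Peak heights for the three curves, largest SD last.
peaks = [NormalDist(mu, s).pdf(mu) for s in (2, 4, 6)]
```

The peak height shrinks as σ grows, while the heights at equal distances left and right of the mean stay equal to each other.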
All of this is the theoretical normal distribution. In fact, nothing in real life is perfectly ND, because nothing in real life has an infinite number of
data points. When we say something is ND, we mean it’s a close match, not a perfect match. “Normally distributed” (or ND) is short for “using a
normal distribution to model this data set, the calculations will come out close enough to reality.”
This is a lot like what you did in Chapter 3, when you computed the statistics of a grouped distribution. The statistics were only approximate,
because of the simplification you introduced by grouping, but the approximation was good enough.
Now let’s get to some applications! There are two main categories: “forward” problems, where you have the boundaries and you have to find the
area or probability, and “backward” problems, where you have a probability or area and you have to find the boundaries.
BTW: Who invented the normal distribution? Abraham de Moivre (1667–1754, French) was probably first, in 1733, though several other mathematicians contributed.
Wikipedia [URL https://en.wikipedia.org/wiki/Normal_distribution#Development accessed 2019-08-01] has a decent short summary of the history. In
Jenny Kenkle’s talk [URL https://www.math.utah.edu/~kenkel/normaldistributiontalk.pdf accessed 2019-08-01] on the normal distribution, slide 18 shows de
Moivre’s approximation to the binomial distribution [URL: https://BrownMath.com/swt/pfswt.htm.htm#c06_BinomDist] for large n, and how to get from
that to the ND. And if you want an exhaustive treatment of the history, see Saul Stahl’s The Evolution of the Normal Distribution [URL
https://www.maa.org/sites/default/files/pdf/upload_library/22/Allendoerfer/stahl96.pdf accessed 2019-08-01], originally published in Mathematics
magazine, April 2006.
The name of Carl Friedrich Gauss [URL: https://BrownMath.com/swt/pfswt.htm.htm#bign_Gauss] is permanently coupled to the normal
distribution — literally. Although Sir Francis Galton coined the term normal distribution in 1889, Karl Pearson [URL:
https://BrownMath.com/swt/pfswt.htm.htm#bign_PearsonK] called it the Gaussian distribution in 1905, and that’s still a recognized synonym.
BTW: In case you’re interested, the pdf, the height of the density curve above a given x, is $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$.
The cdf, the area to the left of a given x, is the integral of that, just the same as finding the area under any curve to the left of a given x:
$F(x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(t-\mu)^2/(2\sigma^2)}\,dt$. This integral doesn’t have a “closed form”, a finite sequence of basic algebraic operations, so it must be found by
successive approximations. That’s what your calculator does with normalcdf and Excel does with NORM.DIST.
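As a rough illustration of "successive approximations", the Python sketch below adds up thin trapezoids under the normal pdf and compares the total with the library's own cdf. The trapezoid rule, the starting point 10 SD below the mean, and the number of steps are assumptions chosen for simplicity. This is not a claim about the algorithm your calculator or Excel actually uses. The mean and SD are the girls' heights from Example 1 in the next section.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Height of the normal density curve above x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def approx_cdf(x, mu, sigma, steps=10_000):
    """Trapezoid-rule area under the pdf from far out in the left tail up to x."""
    lo = mu - 10 * sigma          # stand-in for minus infinity (assumption)
    h = (x - lo) / steps          # width of each trapezoid
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(x, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h

mu, sigma = 38.72, 3.17
approx = approx_cdf(40, mu, sigma)
exact = NormalDist(mu, sigma).cdf(40)
print(f"trapezoid: {approx:.6f}   library: {exact:.6f}")
```

The two numbers should agree to several decimal places, which is the point: the area can be pinned down as closely as you like even without a closed-form formula.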
7B2. From Boundaries, Find Probability
Summary: Make a sketch, estimate the probability (area), then compute it.
TI-83/84/89: Use normalcdf(left bound, right bound, mean, SD). I’ll walk you through the TI-83/84 keystrokes in the first example below. If you have
a TI-89, press [CATALOG] [F3] [plain 6 makes N] [ENTER].
Excel: In Excel 2010 or later, use (deep breath here) =NORM.DIST(right bound, mean, SD, TRUE) − NORM.DIST(left bound, mean, SD, TRUE). In
Excel 2007 or earlier, it’s NORMDIST rather than NORM.DIST.
Example 1: Heights of human children of a given age and sex are ND. One study found that three-year-old girls’ heights have a mean of 38.72″ and
SD of 3.17″. What percentage of three-year-old girls are 35″ to 40″ tall?
Solution: Take the time to make a sketch. It doesn’t have to be beautiful, but you should make it as accurate as you reasonably can. It’s an important
safeguard against making boneheaded mistakes. Here’s what should be on your sketch:
1. Draw the axis line.
2. Label the axis, x or z as appropriate. x is the symbol for real-world data points, and z is
the symbol for z-scores in the standard normal distribution, below.
3. Draw a vertical line in the middle of the distribution and write the numerical value of
the mean below the axis where that central line meets it. (If necessary, offset it with a tick
mark, as I did.)
4. Draw a horizontal line at about the right spot and show the numerical value of the
standard deviation.
5. Draw a line and show the value for each boundary.
Important: When you marked the SD, you set the scale for the sketch. Now you
have to honor that and place your boundaries in proportion. For instance, in this
problem the mean is 38.72 and the left boundary is 35, which is 3.72 below the mean.
Your left boundary therefore needs to be a bit more than one SD (3.17) left of the mean. The right bound is 40, which is 1.28 above the mean,
so your line needs to be just over a third of a SD to the right of the mean.
(Students often put in more numbers and lines, like the values of 1, 2, and 3 SD above and below the mean. That’s not wrong, but it’s
usually not helpful, and it definitely clutters up the sketch.)
6. Shade the area you’re trying to find.
7. Look at your sketch and estimate the area before you pull out your calculator. That way, if you make a mistake that leads to a ridiculous
answer, you’ll recognize it as ridiculous and fix it.
From my sketch, I estimate an area of 50%–60%. If it’s 45% or 70% I won’t be terribly surprised, but if it’s 5% or 99% I’ll know something
is wrong.
8. Compute the area (below).
If you wish, add that number to your sketch — not below the axis, please. Write it within the shaded area, if there’s room, or as a callout
to the left or right of the diagram, the way I did here.
Computing the Area
On a TI-83 or TI-84, press [2nd VARS makes DISTR] [2] to select normalcdf. Enter the left boundary (35), right boundary (40), mean (38.72), and SD (3.17).
(If you have a TI-89 or you’re using Excel, see above.)
After entering the standard deviation, press [)] [ENTER] to get the answer.
You always need to show your work, so write down normalcdf(35,40,38.72,3.17) before you proceed to the answer. (There’s no need to write
down the keystrokes you used.)
In this book, I round probabilities to four decimal places, or two decimal places if expressed as a percentage. The probability is
P(35 ≤ x ≤ 40) = 0.5365
That number matches my estimate of 50%–60%.
But the problem asked for a percentage. (Always, always, always look back at the problem and make sure you’re answering the question that
was actually asked.) The answer: 53.65% of three-year-old girls are 35″ to 40″ tall.
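If you have Python rather than a TI or Excel, the same computation can be sketched with the standard library's `statistics.NormalDist`. The helper name `normalcdf` below is only an imitation of the calculator's argument order. It is a hypothetical convenience function, not something Python provides under that name.

```python
from statistics import NormalDist

def normalcdf(left, right, mean, sd):
    """Area under the normal curve between left and right (TI-style argument order)."""
    nd = NormalDist(mean, sd)
    return nd.cdf(right) - nd.cdf(left)   # area to left of right, minus area to left of left

p = normalcdf(35, 40, 38.72, 3.17)
print(f"P(35 <= x <= 40) = {p:.4f}")      # compare with the 0.5365 found above
```

The subtraction of two areas-to-left is the same idea as the Excel formula with two NORM.DIST calls.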
Example 2: A three-year-old girl is randomly chosen. Would it be unusual (unexpected, surprising) if she’s over 45″ tall?
In Chapter 5 you learned to call a low-probability event unusual (a/k/a surprising or
unexpected). The standard definition of unusual events is a probability below 0.05, so really
this problem is just asking you to find the probability and compare it to 0.05.
Solution: The sketch is at right, and obviously the probability should be small. The left
boundary is 45, but what’s the right boundary? The normal distribution never quite ends, so
the right boundary is ∞ (infinity). TI-89s have a key for ∞, but TI-83s and TI-84s don’t and Excel
doesn’t, so use 10^99 instead. (That’s 10 to the 99th power; the [^] key on your TI calculator is
between [CLEAR] and [÷].)
Show your work:

P(x > 45) = normalcdf(45,10^99,38.72,3.17) = 0.0238
That’s rounded from 0.0237914986, and it’s in line with my estimate of “small”. Now answer the question: There’s only a 2.38% chance that a
randomly selected three-year-old girl will be over 45″ tall, so that would be unusual.
Example 3: For the same population, find and interpret P(x < 33).
Solution: The sketch is at right, and again the expected probability is small. The right
boundary is 33, but what’s the left boundary? You might want to use 0, since no one can be
under 0″ tall, but you could make the same argument for 1″ or 5″, so that can’t be right.
To locate the left boundary, remember that you’re using a normal model to approximate
the data, and the normal distribution runs right out to ±∞. Therefore, the left boundary is
minus ∞ on a TI-89, or minus 10^99 on a TI-83/84. (Use the [(-)] key, not the [−] subtraction
key.)
P(x < 33) = normalcdf(-10^99,33,38.72,3.17) = 0.0356
The proportion of three-year-old girls under 33″ tall is 0.0356 or 3.56%; or, 3.56% of three-year-
old girls are under 33″ tall. The other interpretation is the chance that a randomly selected
three-year-old girl is under 33″ tall is 0.0356 or 3.56%.
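For comparison, here is how the two one-sided problems (Examples 2 and 3) might look in Python. Python has a real infinity, `math.inf`, so no 10^99 stand-in is needed. The variable names are made up for this sketch, and the 0.05 cutoff is the "unusual" threshold quoted above.

```python
import math
from statistics import NormalDist

girls = NormalDist(38.72, 3.17)   # heights of three-year-old girls, in inches

def normalcdf(left, right, nd):
    """Area under nd's density curve between left and right."""
    return nd.cdf(right) - nd.cdf(left)

p_over_45 = normalcdf(45, math.inf, girls)     # right tail: boundary up to +infinity
p_under_33 = normalcdf(-math.inf, 33, girls)   # left tail: -infinity up to boundary

print(f"P(x > 45) = {p_over_45:.4f}, unusual: {p_over_45 < 0.05}")
print(f"P(x < 33) = {p_under_33:.4f}, unusual: {p_under_33 < 0.05}")
```

Both probabilities should match the 0.0238 and 0.0356 computed above.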
Percentiles
Example 4: What’s the percentile rank of a three-year-old girl who is 33″ tall?
Solution: Long ago, in a galaxy called Numbers about Numbers, you learned the definition of
percentiles. The percentile rank of a data point is the percentage of the data set that is ≤ that data point.
So you need P(x ≤ 33). But that’s exactly what you computed in the previous example: 3.56%. So the
33″-tall girl is between the third and fourth percentiles for her age group.
“That was P(x < 33), and for a percentile I need P(x ≤ 33)!” I hear you yell. But those two are equal.
When we talked about density curves, near the beginning of this chapter, you learned that the area
and probability are the same whether you include or exclude the boundary.
And this is why it doesn’t make much difference whether you define a percentile rank in terms of < or ≤, because the probability in a continuous
distribution is the same either way.
7B3. From Probability, Find Boundaries
Summary: Make a sketch, estimate the value(s), then compute the value(s).
TI-83/84/89: Use invNorm(area to left, mean, SD). I’ll walk you through the TI-83/84 keystrokes in the first example below. If you have a TI-89, press
[CATALOG] [F3] [plain 9 makes I] [▼ 3 times] [ENTER].
Excel: In Excel 2010 or later, use =NORM.INV(area to left, mean, SD). In Excel 2007 or earlier, it’s NORMINV rather than NORM.INV.
Example 5: Blood pressure is stated as two numbers, systolic over diastolic. The World Health Organization’s MONICA Project (Kuulasmaa 1998 [see
“Sources Used” at end of book]) reported these parameters for the US:
Systolic: µ = 120, σ = 15
Diastolic: µ = 75, σ = 11
Blood pressure in the population is normally distributed. The lowest 5% is considered “hypotensive”, according to Kuzma and Bohnenblust (2005,
103) [see “Sources Used” at end of book]. What systolic blood pressure would be considered hypotensive?
Solution: Always make a sketch for these problems. Your sketch is similar to the ones you made for the first group of problems, except that you use
a symbol like x1 or “?” for the unknown boundary, and you write in the known area.
Always estimate your answer to guard against at least some errors. In the sketch, x1 looks like it’s not quite two SD left of the mean, so I’ll
estimate a pressure of 95 to 100. (Okay, I cheated by using my calculator to make my “sketch”. But even with a real pencil-and-paper sketch, you
ought to be in the right ballpark.)
Now you’re ready to calculate. TI-89 or Excel users, please see the instructions above. On your TI-83 or TI-84, press
[2nd VARS makes DISTR] [3] to select invNorm. Enter the area to the left of the point you’re interested in (.05), the mean
(120), and the SD (15).
After the standard deviation, press [)] [ENTER] to get the answer.
Show your work! Write down invNorm(.05,120,15) before you proceed to the answer. (There’s no need to write down the keystrokes you used.)
Answer: Systolic blood pressure (first number) under 95 would be considered hypotensive.
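The "backward" computation has a Python counterpart too: `NormalDist.inv_cdf` takes an area to the left and returns the boundary. The wrapper name `invnorm` is hypothetical, chosen only to mirror the calculator's argument order.

```python
from statistics import NormalDist

def invnorm(area_to_left, mean, sd):
    """Boundary that has the given area to its left (TI-style argument order)."""
    return NormalDist(mean, sd).inv_cdf(area_to_left)

x1 = invnorm(0.05, 120, 15)
print(f"x1 = {x1:.2f}")   # falls inside the 95-to-100 estimate made from the sketch
```

The result is a little over 95, which is why the answer above is stated as "under 95".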
Example 6: The same source considers the top 5% “hypertensive”. What is the minimum systolic blood pressure that is hypertensive?
Solution: My “sketch” is at right. It’s mostly straightforward — the x1 boundary is between the 5% tail and the rest
of the distribution.
But what’s up with the 1−0.05? The problem asks you about the upper 5%, which is the area to the right of the
unknown boundary. But invNorm on the calculator, and NORM.INV in Excel, need area to left of the desired
boundary. The area to the left is the probability of “not hypertensive”, and area is probability, so the area to left is 1
minus the area to right, in this case 1−0.05.
Could you just write down 0.95? Sure, that would be correct. But if the area to right was 0.1627 you’d probably
make the calculator compute 1 minus that for you, so why not be consistent?
x1 = invNorm(1−.05,120,15) = 144.6728044 → 145
(That’s actually a little liberal. Several sources that I’ve seen give 140 as the threshold.)
Example 7: Kuzma and Bohnenblust describe the middle 80% as “normal”. What is that range of systolic blood
pressure?
This problem wants you to find two boundaries, lower and upper. You have to convert the 80% middle into two
areas to left. Here’s how. If the middle is 80%, then the two tails combined must be 100% − 80% = 20%. But the curve is
symmetric, so each tail must be 20%/2 = 10%. Strictly speaking, I probably should have written that computation on
the diagram, instead of just a laconic “0.1”, but it would take up a lot of space and the computation was easy
enough. You’ll probably do the same; just be careful.
Once you have the areas squared away, the computation is simple enough:
x1 = invNorm(.1,120,15) = 100.7767265 → 101
x2 = invNorm(1−.1,120,15) = 139.2232735 → 139
Check: The boundaries of the middle 80% (or the middle any percent) should be equal distances from the mean. (100.7767265+139.2232735)/2 = 120, so
at least it’s consistent. Answer: Systolic b.p. of 101 to 139 is considered normal.
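The area bookkeeping in Examples 6 and 7 (turn an area to right, or a middle area, into areas to left) can be written out as two small Python helpers. The function names are invented for this sketch. The only library call is `NormalDist.inv_cdf`, which, like invNorm, wants an area to left.

```python
from statistics import NormalDist

systolic = NormalDist(120, 15)

def boundary_for_right_tail(area_to_right, nd):
    """Boundary with the given area to its right: convert to area to left first."""
    return nd.inv_cdf(1 - area_to_right)

def middle_interval(middle, nd):
    """Lower and upper boundaries of the central 'middle' share of the distribution."""
    tail = (1 - middle) / 2              # the two tails split what is left over
    return nd.inv_cdf(tail), nd.inv_cdf(1 - tail)

hyper = boundary_for_right_tail(0.05, systolic)
low, high = middle_interval(0.80, systolic)

print(f"top 5% starts at {hyper:.1f}")
print(f"middle 80%: {low:.1f} to {high:.1f}, midpoint {(low + high) / 2:.1f}")
```

The midpoint line repeats the symmetry check from Example 7: the two boundaries should average out to the mean.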
Percentiles Again
Example 8: What’s the 40th percentile for systolic blood pressure?
Sometimes the gods smile on us. The kth percentile is the value that is ≥ k% of the population, so k% is exactly the
area to left that you need.
P40 = invNorm(.4,120,15) = 116.1997935 → 116
7C. The Standard Normal Distribution
Definition: The standard normal distribution is a normal distribution with a mean of 0 and standard deviation of 1, sometimes written N(0,1).
The standard normal distribution is a picture of z-scores of any possible real-world ND — more about that later.
The standard normal distribution lets you make computations that apply to all normal models, not just a particular model. You’ll see some examples
shortly, but first —
7C1. “Normal” and “Standard Normal”
The main point about the standard normal distribution is that it’s a stand-in for every ND from real life. How does this work? Well, if you take any
real data set and subtract the mean from every data point, the mean of the new data set is 0. And if you then divide that data set by the standard
deviation (which doesn’t change when you subtract a constant from every data point), then the SD of the new-new data set is 1.
But all you did with those manipulations was replace the numbers with z-scores. Remember the formula: $z = (x - \mu)/\sigma$. The standard normal
distribution is what you get when you convert any normal model to z-scores.
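A quick way to convince yourself of this is to compute one probability both ways. The Python sketch below reuses the girls' heights from Example 1 (mean 38.72, SD 3.17). It finds the area between 35 and 40 directly, then again by converting the boundaries to z-scores and using N(0,1). The names are illustrative only.

```python
from statistics import NormalDist

mean, sd = 38.72, 3.17
heights = NormalDist(mean, sd)    # the real-world model
standard = NormalDist(0, 1)       # the standard normal distribution

def z_score(x):
    """Number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

p_raw = heights.cdf(40) - heights.cdf(35)
p_z = standard.cdf(z_score(40)) - standard.cdf(z_score(35))

print(f"z-scores: {z_score(35):.3f} and {z_score(40):.3f}")
print(f"raw scores: {p_raw:.4f}   z-scores: {p_z:.4f}")
```

The two probabilities agree, apart from negligible floating-point noise. That is what lets one table (or one function) serve every normal model.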
BTW: Long ago, when dinosaurs ruled the earth — okay, up through the early 1980s — a “computer” was a person who used a slide rule to make computations. (I swear I am
not making this up.) There were no statistical calculators and no Excel. The only way for most people to make computations on a normal model was to look up probabilities in
printed tables. But obviously a book couldn’t print tables for every normal model. So the printed tables were for the standard normal distribution. If you had boundaries and
wanted the probability of the interval, you converted your real-world numbers to z-scores, looked up the probabilities in the table, and subtracted them. If you had a probability
and needed a boundary, you looked up the z-score in the table and then converted it to a raw score using the mean and SD of your data set.
The need to do normal computations the hard way has gone the way of the dinosaurs, but I think this history is why many stats books still use tables to
do their computations. Inertia is a powerful force in textbooks!
BTW: The pdf and cdf functions for the standard normal distribution are what you get when you set µ=0 and σ=1 in the general equations for the ND:
$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$ and $F(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt$. Again, the integral must be found by successive approximations. That’s where the tables in books come
from, and it’s what your calculator does with normalcdf and Excel does with NORM.DIST.
7C2. Applying the Standard Normal Distribution
I said above that the standard normal distribution lets you make statements about all normal models. What sort of statements? Well, the Empirical
Rule for one.
Example 9: The Empirical Rule says that 68% of the population in a normal model lies within one SD of the mean. How good is the rule? In other
words, what’s the actual proportion?
Solution: As usual, you start with a sketch. This is the standard ND, so the axis is z, not x. There’s no need to mark
the mean or SD, because the z label identifies this as a standard normal distribution and therefore µ = 0 and σ = 1.
Just label the boundaries.
Compute the probability the same way you’ve already learned. (Both Excel and the TIs have special procedures
available for the standard normal distribution, but it’s not worth taking brain cells to learn them, when the regular
procedures for the ND work just fine with N(0,1).)
P(−1 ≤ z ≤ 1) = normalcdf(−1,1,0,1) = .6826894809 → 68.27%
The Empirical Rule says 68% of the data are within z = ±1. Actually it’s about 68¼%, close enough.
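The same check can be run for all three parts of the Empirical Rule at once. This Python sketch compares the rule's usual round figures (68%, 95%, 99.7%) with the areas that N(0,1) actually gives within 1, 2, and 3 SD. The dictionary names are made up for the example.

```python
from statistics import NormalDist

z = NormalDist(0, 1)
rule = {1: 0.68, 2: 0.95, 3: 0.997}                   # Empirical Rule figures
actual = {k: z.cdf(k) - z.cdf(-k) for k in rule}      # area within k SD of the mean

for k in rule:
    print(f"within {k} SD: rule says {rule[k]:.1%}, normal model gives {actual[k]:.2%}")
```

In each case the rule is a rounded version of the exact area, which is all it claims to be.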
Example 10: How many standard deviations must you go above and below the mean to take in the middle 50% of the data in a normal model?
Solution: This is similar to finding the middle 80% of blood pressures earlier, except now you’re making a
statement about all normal models, not just a particular one.
Shading the middle 50% leaves 100% − 50% = 50% in the two tails combined, so each tail is 50%/2 = 25%.
z1 = invNorm(.25,0,1) = −.6744897495 → −0.67
By symmetry, z2 must be numerically equal to z1 but have the opposite sign: z2 = 0.67.
50% of the data in any normal model are within about 2/3 of a SD of the mean. Since the bounds of the
middle 50% of the data are Q1 and Q3, the IQR of any normal distribution is twice that, about one and a third
standard deviations. More precisely, the IQR is 2×0.674 ≈ 1.35 times the SD.
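Here is the quartile computation as a short Python sketch, again with the standard library's `NormalDist`. It computes both z boundaries rather than relying on symmetry, so you can see that they come out as mirror images.

```python
from statistics import NormalDist

z = NormalDist(0, 1)
z1 = z.inv_cdf(0.25)      # Q1 of any normal model, in SD units
z2 = z.inv_cdf(0.75)      # Q3 of any normal model, in SD units
iqr_in_sds = z2 - z1      # IQR as a multiple of the SD

print(f"z1 = {z1:.4f}, z2 = {z2:.4f}, IQR = {iqr_in_sds:.3f} SD")
```

To turn this into real-world quartiles for a particular model, multiply each z by that model's SD and add its mean.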
7C3. The z Function (Critical z)
There’s one special notation you’ll use when you compute confidence intervals in Chapter 9.
Definition: $z_{\text{area}}$ or z(area), also known as critical z, is the z-score that divides the standard normal distribution such that the right-hand tail has
the indicated area.
This may seem a little weird, but really it’s just a recipe to specify a number. Compare with the square root of 48. That is the positive number such
that, if you multiply it by itself, you get 48. Or consider π: the number that you get when you divide the circumference of a perfect circle by its
diameter. Math is full of numbers that are specified as recipes. An example will make things clearer.
Example 11: Find $z_{0.025}$.
Solution: The problem is diagrammed at right. Caution! 0.025 is an area, not a z-score, so you don’t write 0.025 on
the number line (the z axis). $z_{0.025}$ is a z-score (though you don’t know its value yet), so it goes on the number line.
Once you have your sketch, the computation is straightforward. Have area (probability), compute boundary.
The area is 0.025, but it’s an area to right, and invNorm needs an area to left, so you subtract from 1 as usual:
$z_{0.025}$ = invNorm(1−.025, 0, 1) = 1.959963986 → 1.96
Caution! You’re computing a boundary for the right-hand tail. If you get a negative number, that can’t possibly be
right.
$z_{0.025}$ = 1.96 makes sense, if you think about it. If you also shaded in the left-hand tail with an area of 0.025, the two tails together would total 5%,
leaving 95% in the middle. The Empirical Rule says that 95% of data are within 2 SD above and below the mean, and 1.96 is approximately 2.
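Since critical z is just a recipe, it translates directly into a one-line function. The Python sketch below is a minimal version. The name `critical_z` is invented here, and the extra areas 0.05 and 0.005 are shown only as further illustrations.

```python
from statistics import NormalDist

def critical_z(area_to_right):
    """z-score that leaves the given area in the right-hand tail of N(0,1)."""
    return NormalDist(0, 1).inv_cdf(1 - area_to_right)

z_025 = critical_z(0.025)
print(f"z(0.025) = {z_025:.4f}")

for area in (0.05, 0.005):
    print(f"z({area}) = {critical_z(area):.4f}")
```

Each result is positive, as any boundary for a right-hand tail smaller than 50% must be.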
7D. Checking for Normality
How do you know whether a normal model is appropriate? How do you know whether your data are normally distributed? A histogram can rule
out skewed data, or data with more than one peak.
But what if your data are unimodal and not obviously skewed? Is that enough to justify a normal model? No, it’s not. You need to perform a test
called a normal probability plot. You’ll need this procedure in Chapters 8 through 11, whenever you have a small sample of numeric data.
Summary: To check whether a normal model can represent your sample, make a normal probability plot. This plots the actual data points, against the z-
scores you would expect for this number of points that are ND. If the plot is close to a straight line, a normal model is appropriate; if the plot is
far from a straight line, a normal model is not appropriate.
That’s the bare outline, and you’ll get a little bit more with the examples. For those who want the full theory, it’s marked optional
at the end of this section.
Technology: Testing for normality can be automated partly or completely, depending on what technology you have:
On a TI-83/84, you have two choices: Normality Check on TI-83/84 [URL: https://BrownMath.com/ti83/normchek.htm], or the
MATH200A program (shown below). I strongly recommend the program, not just because I wrote it ☺ but because it saves you a
lot of work. See Getting the Program [URL: https://BrownMath.com/ti83/math200a.htm#Download].
On a TI-89, you have to do the plot and the computations yourself. See the step-by-step procedure in Normality Check on TI-89 [URL:
https://BrownMath.com/ti83/nchk89.htm].
There’s an Excel workbook that does everything described here, and even includes a second test of normality. See Normality Check
and Finding Outliers in Excel [URL: https://BrownMath.com/stat/nchkxl.htm].
7D1. Checking Data Sets
Example 12: Consider these vehicle weights (in pounds):

2500, 3250, 4000, 3500, 2900, 4500, 3800, 3000, 5000, 2200
Do they fit a normal model?
Solution: Put the data in any statistics list, then press [PRGM], scroll down to MATH200A, and press [ENTER] twice.
Select Normality chk.
The program makes the plot, and you can look at the points to determine whether they seem to be pretty
much on a straight line. At least, that’s the theory. In practice, most data sets are a lot less clear cut than this one. It
can be hard to tell whether the points fit a line, particularly if you have only a few of them. The plot takes up the
whole screen, so deviations can look bigger than they really are.
Fortunately, there’s a test for whether points lie on a straight line. As you know from Chapter 4, the closer the
correlation coefficient r is to 1, the closer the points are to a straight line.
The program computes r for you, and it also computes a critical value★ to help you determine if the points
are close enough to a straight line. (For technical reasons, the critical value is different from the decision points of Chapter 4.) If r≥crit, it’s close
enough to 1, the points are close enough to a straight line, and you can use a normal model. If r<crit, it’s too far from 1, the points are too far from a
straight line, and you can’t use a normal model.
For this data set, r > crit, and therefore these vehicle weights fit the normal model.
★The “classic TI-83” (non-“Plus” model) doesn’t compute the critical value, so you have to do it yourself. See the formula in item 4 in the next
section.
Example 13: Here’s a random sample of the lengths (in seconds) of tunes in my iTunes library:
120 219 242 134 129 105 275 76 412 268
486 199 651 291 126 210 151 98 100 92
305 231 734 468 410 313 644 117 451 375
Do they fit a normal model?
Solution: I entered them in a statistics list and then ran MATH200A Program part 4. The result was the plot at the
right.
You can see that the plot is curved. This is reinforced by comparing r=0.9473 to crit=0.9639. r < crit. The
points diverge too far from a straight line, and therefore I cannot use a normal model for the lengths of my
iTunes songs.
7D2. Optional: How Normal Probability Plots Work
The basic idea isn’t too bad. You make an xy scatterplot where the x’s are the data points, sorted in ascending order, and the y’s are the expected
z-scores for a normal distribution.
Why would you expect that to be a straight line? Recall the formula for a z-score: z = (x−x̅)/s. Breaking the one fraction into two, you have
z = x/s−x̅/s. That’s just a linear equation, with slope 1/s and intercept −x̅/s. So an xz plot of any theoretical ND, plotting each data point’s z-score against
the actual data value, would be a straight line.
Further, if your actual data points are ND, then their actual z-scores will match their expected-for-a-normal-distribution z-scores, and therefore a
scatterplot of expected z-scores against actual data values will also be a straight line.
Now, in real life no data set is ever exactly a ND, so you won’t ever see a perfectly straight line. Instead, you say that the closer the points are to a
straight line, the closer the data set is to normal. If the data points are too far from a straight line — if their correlation coefficient r is lower than some
critical value — then you reject the idea that the data set is ND.
Okay, so you have to plot the data points against what their z-scores should be if this is a ND, and specifically for a sample of n points from a ND,
where n is your sample size. This must be built up in a sequence of steps:
1. Divide the normal curve (mentally) into n regions of equal probability and take one probability from each region. For technical reasons,
the probability number you use for region i is (i−.375)/(n+.25). This formula is in many textbooks, and also in Normal Probability Plots and
Tests for Normality (Ryan and Joiner 1976 [see “Sources Used” at end of book]).
2. Compute the expected z-scores for those probabilities. Working with the calculator, that’s just invNorm of (i−.375)/(n+.25).
3. Plot those expected z-scores against the data values. This xy plot (or xz plot) has a correlation coefficient r, computed just like any other
correlation coefficient.
4. Compare the r for your data set to the critical value for the size of your data set. Ryan and Joiner determined that the critical value for
sample size n, at the 0.05 significance level, is 1.0063 − .1288/√n − .6118/n + 1.3505/n². To make it a little easier on the calculator I rearranged it
as 1.0063 − .6118/n + 1.3505/n² − .1288/√n.
BTW: In the same paper, they gave formulas for critical values at other significance levels:
1.0071 − 0.1371/√n − 0.3682/n + 0.7780/n² at α=0.10
0.9963 − 0.0211/√n − 1.4106/n + 3.1791/n² at α=0.01
The closer the points are to a straight line, the closer the data set is to fitting a normal model. In other words, a larger r indicates a ND, and a smaller r
indicates a non-ND. You can draw one of two conclusions:
If r is less than the critical value, reject the hypothesis of normality at the 0.05 significance level and say that the data set is not ND.
(If you haven’t studied hypothesis testing yet, another way to say it is that you’re pretty sure the data set doesn’t fit the normal model
because there’s less than a 5% probability that it does.)
If r is greater than the critical value, fail to reject the hypothesis that the data set comes from a ND.
This doesn’t mean you are certain it does, merely that you can’t rule it out. Technically you don’t know either way, but practically it
doesn’t matter. Remember (or you will learn later) that inferential statistics procedures like t tests are robust, meaning that they still work
even if the data are moderately non-normal. But if your data were extremely non-normal, r would be less than the critical value. When r is
greater than the critical value, you don’t know whether the data set comes from normal data or moderately non-normal data, but either way
your inferential statistics procedures are okay.
So the bottom line is, if r > CRIT, treat the data as normal, and if r < CRIT, don’t.
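If you'd like to see the four steps outside the calculator, here is a minimal Python sketch of them. It is not the MATH200A program itself; the function name normality_check is made up for illustration, and NormalDist().inv_cdf stands in for invNorm. The data are the 30 tune lengths from Example 13.

```python
from math import sqrt
from statistics import NormalDist, mean

def normality_check(data):
    """Return (r, crit) for a normal probability plot of data (illustrative helper)."""
    n = len(data)
    xs = sorted(data)
    # Steps 1 and 2: expected z-score for the i-th smallest value, i = 1..n
    zs = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    # Step 3: ordinary correlation coefficient between data values and z-scores
    x_bar, z_bar = mean(xs), mean(zs)
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    szz = sum((z - z_bar) ** 2 for z in zs)
    r = sxz / sqrt(sxx * szz)
    # Step 4: Ryan-Joiner critical value at the 0.05 significance level
    crit = 1.0063 - 0.1288 / sqrt(n) - 0.6118 / n + 1.3505 / n ** 2
    return r, crit

tunes = [120, 219, 242, 134, 129, 105, 275, 76, 412, 268,
         486, 199, 651, 291, 126, 210, 151, 98, 100, 92,
         305, 231, 734, 468, 410, 313, 644, 117, 451, 375]

r, crit = normality_check(tunes)
print(f"r = {r:.4f}, crit = {crit:.4f}")
print("treat as normal" if r > crit else "don't treat as normal")
```

For the tune lengths this should land on the same side of the critical value as the calculator did in Example 13, with r below crit.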
BTW: The normal probability plot is just one of many possible ways to determine whether a data set fits the normal model. Another method, the D’Agostino-Pearson test, uses
numerical measures of the shape of a data set called skewness and kurtosis to test for normality. For details, see Assessing Normality [URL:
https://BrownMath.com/stat/shape.htm#Normal] in Measures of Shape: Skewness and Kurtosis.
Properties of the ND.
Sketching the ND.
Area = proportion of all = probability of one.
“Forward problem”: have boundary(ies), find area or probability or proportion. Sketch and use normalcdf.
“Backward problem”: have area, find boundaries (values of the ND). Sketch and use invNorm. That function needs area to left, so if
the problem gives area to right you have to use 1 minus that area.
For problems involving percentiles (also here), remember that k% of the area is to the left of the kth %ile.
Standard ND and the zarea notation. In this notation, area is area to right, so you need invNorm(1−area).
Determining whether a data set is ND. Use MATH200A Program part 4.
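For readers who also work on a computer, the forward and backward problems can be sketched in Python. This is only an illustration, not part of the calculator procedure: NormalDist from the standard library plays the role of normalcdf (through cdf) and invNorm (through inv_cdf), and the numbers use the standard normal distribution.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, SD 1

# Forward problem: boundaries known, find the area (the job normalcdf does).
area_within_1sd = z.cdf(1) - z.cdf(-1)

# Backward problem: area known, find the boundary (the job invNorm does).
# inv_cdf needs area to the LEFT, so an area to the right becomes 1 - area.
z_025 = z.inv_cdf(1 - 0.025)

print(round(area_within_1sd, 4))   # 0.6827
print(round(z_025, 2))             # 1.96
```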

You’ll need this information for several of the problems:

US men’s heights: ND with µ = 69.3″, σ = 2.92″
US women’s heights: ND with µ = 64.1″, σ = 2.75″
Source: “Is Human Height Bimodal?” (Schilling 2002 [see “Sources Used” at end of book]).
1. Suppose that variable X is Chantal’s commute time between home and school, in minutes. Give two interpretations of the statement
P(x < 17) = 0.0900.
2. A male co-worker is “six foot four and a half” (76.5″ tall). How unusual is that? (Give two interpretations of your number.)
3. What proportion of women are 64″ to 67″ tall?
4. What heights for men would be considered unusual (less than 5% likely)? Hint: Your answer will be in the form “under ____ inches or over ____
inches”.
5. To enter the Pennsyltuckey Police Academy, you have to be at or above the 15th percentile in height. How tall is that, for a man?
6. (a) Find the 25th and 75th %iles for women’s heights.
(b) Find the interquartile range.
(c) Example 10 found that, in a normal distribution, the interquartile range equals 1.35 standard deviations. Does your computed IQR match that
prediction?
7. Determine whether this sample of diastolic blood pressures fits the normal model:
78 66 98 90 74 70 70 76 72 86 62 84 66 70 68
8. Scores on the math SAT are ND with a mean of 500 and standard deviation of 100. What percentile is represented by a score of 735?
9. To join Mensa, you must be in the top 2% of the population on a recognized intelligence test. Mensa accepts the SAT as a qualifying test for
membership. The mean on the combined three parts is 1500 and the SD is 300. What’s the minimum combined score to qualify you for Mensa?
10. Find z0.01.
11. For men’s heights, find P(x < 60″) and write two interpretations.
12. Test scores are supposed to be ND, but this is questionable on small tests. Here are scores from a recent quiz; do they fit the normal model?
0.3 8.8 11.5 12 12.3 12.5 13 13.5 14.8
13. A small shop decided to stock formal wear for men and women in the middle 90% of height. How tall must men and women be to shop
there?
8. How Samples Vary

Updated 29 Oct 2020
Intro: Inferential statistics says, “I’ve got this sample. What does it tell me about the population it came from?” Eventually, you’ll estimate a
population mean or proportion from a sample and use a sample to test a claim about a population. In essence, you’re reasoning
backward from known sample to unknown population. But how? This chapter lays the groundwork.
First you have to reason forward. Way back in Chapter 1, you learned that samples vary because no one sample perfectly
represents its population. In this chapter, you’ll put some numbers on that variation. You’ll learn about sampling distributions, and
you’ll calculate the likelihood of getting a particular sample from a known population. That will be the basis for all your inferential
statistics, starting in Chapter 9.
Contents: 8A. Numeric Data / Means of Samples

8A1. One Sample and Its Mean
8A2. Meet the Sampling Distribution of x̅
8A3. Properties of the Sampling Distribution of x̅
· There’s an App for That
· Center of the Sampling Distribution of x̅
· Spread of the Sampling Distribution of x̅
· Shape of the Sampling Distribution of x̅
· Requirements, Assumptions, and Conditions
8A4. Applications
· How to Work Problems
· Example 1: Bank Deposits
· Example 2: Women’s Heights
· Example 3: Elevator Load Limit
8B1. Sampling Distribution of p̂
· Center of the Sampling Distribution of p̂
· Spread of the Sampling Distribution of p̂
· Shape of the Sampling Distribution of p̂
· Requirements, Assumptions, and Conditions
8B2. Applications
· How to Work Problems
· Example 5: Swain v. Alabama
Acknowledgements: The approach I take to this material was suggested by What Is a p-Value Anyway? (Vickers 2010, ch 10 [see “Sources Used” at end
of book]), though of course any problems with this chapter are my responsibility and not Vickers’.
The software used to prepare most of the graphs and all of the simulations for this chapter is @RISK from Palisade Corporation [URL
http://www.palisade.com/risk/ accessed 2014-10-05].
8A. Numeric Data / Means of Samples
8A1. One Sample and Its Mean
Having time on my hands, I was curious about the lengths of tunes in the Apple Store. Being lazy, I decided to look instead at the
lengths of tunes in my iTunes library. There are 10113 of them, and I’m going to assume that they are representative. (That’s my
story, and I’m sticking to it.)
I set Shuffle to Songs and then took the first 30, which gave me the times you see below for a random sample of size 30.

Lengths of 30 Tunes
mm:ss    seconds
2:00     120
3:39     219
4:02     242
2:14     134
2:09     129
1:45     105
4:35     275
1:16     76
6:52     412
4:28     268
8:06     486
3:19     199
10:51    651
4:51     291
2:06     126
3:30     210
2:31     151
1:38     98
1:40     100
1:32     92
5:05     305
3:51     231
12:14    734
7:48     468
6:50     410
5:13     313
10:44    644
1:57     117
7:31     451
6:15     375

Here is a histogram of the data. The tune times are moderately skewed right. That makes sense: most tunes run around two to
five minutes, but a few are longer.
The mean of this sample is 280.9 seconds, and the standard deviation is 181.7 seconds. But you know that there’s always sampling
error [URL: https://BrownMath.com/swt/pfswt.htm.htm#c01_ErrorsSampling]. No sample can represent the population perfectly, so
if you take another sample from the same population you’d expect to see a different mean, but not very different. This chapter is all
about what differences you should expect.
First, ask yourself: Why should you expect the mean of a second sample to be “different, but not very different” from the mean of
the first sample? The samples are independent, so why should they relate to each other at all?
Answer: because they come from the same population. In a given sample, you would naturally expect some data points below the population
mean µ, and others above µ. You’d expect that the points below µ and the points above µ would more or less cancel each other out, so that the mean
of a sample should be in the neighborhood of µ, the mean of the population.
And if you think a little further about it, you’ll probably imagine that this canceling effect works better for larger samples. If you have a sample
of four data points, you wouldn’t be much surprised if they’re all above µ or all below µ. If you have a sample of 100 data points, having them all on
one side of µ would surprise you as much as flipping a coin 100 times and getting 100 heads. So you expect that the means of large samples tend to
stick closer to µ than the means of small samples do. That’s absolutely true, as you’ll find out in this chapter.
To get a handle on “different, but not very different”, take a look at a second sample of 30 from the same population. This one has x̅ = 349.1, s = 204.2
seconds. From its histogram, you can see it’s a bit more strongly skewed than the first sample.
The two sample means differ by 349.14−280.93 ≈ 68.2 seconds. That might seem like a lot, but it’s only about a quarter of the first sample mean and
under a fifth of the second sample mean. Also, it’s a lot less than the standard deviations of the two samples, meaning that the difference between
samples is much less than the variability within samples.
There’s an element of hand waving in that paragraph. Sure, it seems plausible that the two sample means are “different, but not very different”;
but you could just as well construct an argument in words that the two means are different. Without numbers to go on, how much of a difference is
reasonable? In statistics, we like to use numbers to decide whether a thing is reasonable or not. How can we make a numerical argument about the
difference between samples? Well, put on your thinking cap, because I’m about to blow your mind.
8A2. Meet the Sampling Distribution of x̅
The key to sample variability is the sampling distribution.
Definition: Imagine you take a whole lot of samples, each sample with n data points, and you compute the sample mean x̅ of each of them. All those x̅’s form
a new data set, which can be called the distribution of sample means, or the sampling distribution of the mean, or the sampling
distribution of x̅, for sample size n.
Notice that n is the size of each sample, not the number of samples. There’s no symbol for the number of samples, because it’s
indefinitely large.
The sampling distribution is a new level of abstraction. It exists only in our minds: nobody ever takes a whole lot of samples of the same size from a
given population. You can think of the sampling distribution as a “what if?” — if you took a whole lot of samples of a given size from the same
population, and computed the means of all those samples, and then took those means as a new set of data for a histogram, what would that
distribution look like?
Why ask such an abstract question? Simply this: if you know how samples from a known population are distributed, you can work backward
from a single sample to make some estimates about an unknown population. In this chapter, I work from a population of tunes with known mean
and standard deviation, and I ask what distribution of sample means I can expect to get. In the following chapters, I’ll turn that around: looking at
one sample, we’ll ask what that says about the mean and standard deviation of the population that the sample came from.
What does a sampling distribution look like? Well, I used a computer simulation with @RISK from Palisade Corporation [URL
http://www.palisade.com/risk/ accessed 2014-10-05] to take a thousand samples of 30 tunes each (the same n as before), and this is what I got:
“Big whoop!” I hear you say. I agree, it’s not too impressive at first glance. But let’s compare this distribution of sample means to the population
those samples come from.
(In real life, you wouldn’t know what the population looks like. But in this chapter I work from a known population to explore what the
distribution of its samples looks like. Starting in the next chapter, I’ll turn that around and use one sample to explore what the population probably
looks like.)
Look at the two histograms below. The left-hand plot shows the individual lengths of all the tunes in the population — it’s a histogram of the original
population. The right-hand plot shows the means of a whole lot of samples, 30 tunes per sample — it’s a histogram of the sampling distribution of the
mean. That right-hand plot is the same as the plot I showed you a couple of paragraphs above, just rescaled to match the left-hand plot for easier
comparison.
Now, what can you see?
Shape: The original population is skewed strongly to the right, but the sampling distribution is nearly a bell curve. (The shape is easier to
see if you look at the first picture of the sampling distribution. Remember, the right-hand plot and the earlier plot are the same plot, just
drawn on different scales.)
Center: The mean of the sampling distribution is 296.9 seconds, the same as the mean of the population.
Spread: Individual tune lengths (original population, left graph) vary quite a lot, but means of 30-tune samples (sampling distribution of x̅,
right graph) vary much less. You can say that most individual tune lengths are a lot shorter or longer than the population average, but most
mean lengths in samples of 30 are very close to the population average. Compare these measures of spread from the two graphs:
(all values in seconds)   Population        Sampling
                          (indiv. tunes)    Distribution
Values                    50 to 1000s(*)    200 to 400
Middle 95% of values      98.0 to 696.3     244.6 to 359.1
Standard deviation        158.6             29.0
(*) I cut off the right tail of the population graph to save space.
At this point, you’re probably wondering if similar things are true for other numeric populations. The answer is a resounding YES.
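You can try the same kind of experiment yourself. The sketch below is not the @RISK simulation used for the graphs above, and it does not use my tune library: the population is a made-up right-skewed one (lognormal with arbitrarily chosen parameters), and the seed is fixed only so that the run is repeatable.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(1)   # fixed seed so the run is repeatable

# Hypothetical right-skewed "population" of 10,000 tune lengths, in seconds.
population = [random.lognormvariate(5.5, 0.5) for _ in range(10_000)]
mu, sigma = fmean(population), pstdev(population)

n = 30   # size of each sample, not the number of samples
sample_means = [fmean(random.sample(population, n)) for _ in range(1000)]

mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)

print(f"population:      mean {mu:.1f}, SD {sigma:.1f}")
print(f"sample means:    mean {mean_of_means:.1f}, SD {sd_of_means:.1f}")
print(f"sigma / sqrt(n): {sigma / sqrt(n):.1f}")
```

The mean of the sample means should come out close to the population mean, and their SD should be much smaller than the population SD, roughly the population SD divided by √30.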
8A3. Properties of the Sampling Distribution of x̅
When you describe a distribution of continuous data, you give the center, spread, and shape. Let’s look at those in some detail, because this will be
key to everything you do in inferential statistics.
There’s an App for That
Before I get into the properties of the sampling distribution, I’d like to tell you about two Web apps that let you play with sampling distributions in
real time. (I’m grateful to Benjamin Kirk for suggesting these.)
Sampling Distributions [URL http://onlinestatbook.com/stat_sim/sampling_dist/index.html accessed 2014-10-03], part of the Rice Virtual Lab
in Statistics. This app lets you sample from symmetric and skewed distributions, at various sample sizes, and see how the sampling
distribution builds up. The app plots the sampling distribution and calculates its mean and SD, so you can compare them to the original
population and also to the expected center, spread, and shape described below.
CentLimApplet [URL http://www.intuitor.com/statistics/CLAppClasses/CentLimApplet.htm accessed 2014-10-03]. This shows you why
“sample size at least 30 or so” is a good rule of thumb for numeric data. Try setting the number of samples to the maximum, then increase the
sample size one unit at a time, and you’ll see how the sampling distribution gets closer and closer to a ND.
If you possibly can, try out these apps, especially the second one. Sampling distributions are new and strange to you, and playing with them in real
time will really help you to understand the text that follows.
Center of the Sampling Distribution of x̅
Summary: The mean of the sampling distribution of x̅ equals the mean of the population: µx̅ = µ.
This is true regardless of the shape of the original population and regardless of sample size.
Why is this true? Well, you already know that when you take a sample, usually you have some data points that are higher than the population mean
and some that are lower. Usually the highs and lows come pretty close to canceling each other out, so the mean of each sample is close to µ, closer
than the individual data points, that is.
When you take a distribution of sample means, the same thing happens at the second level. Some of the sample means x̅ are above µ and some
are below. The highs and lows tend to cancel, so the average of the averages is pretty darn close to the population mean.
Spread of the Sampling Distribution of x̅
Summary: The standard deviation of the sampling distribution of x̅ has a special name: standard error of the mean or SEM; its symbol is σx̅. The
standard error of the mean for sample size n equals the standard deviation of the population divided by the square root of n: SEM
or σx̅ = σ/√n.
This is true regardless of the shape of the original population and regardless of sample size.
BTW: Why is this true? Each member of the sample is a random variable, all drawn from the same population with a SD of σ and therefore a variance of σ². If you combine
random variables — independent random variables — their variances add.
Okay, the sample is n random values drawn from a population with a variance of σ². The total of those n values in the sample is a random variable with a
variance of nσ², and therefore the standard deviation of the total is √(nσ²) = σ√n. Now divide the sample total by n to get the sample mean. x̅ is a random
variable with a standard deviation of (σ√n)/n = σ/√n. QED, which is Latin for “told ya so!”
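The same chain of reasoning can be followed with numbers. This small Python sketch (the variable names are mine, chosen for illustration) uses the tune population from earlier in this chapter, σ = 158.6 seconds and n = 30.

```python
from math import sqrt

sigma, n = 158.6, 30          # population SD (seconds) and size of each sample

var_total = n * sigma ** 2    # variances of independent values add
sd_total = sqrt(var_total)    # SD of the sample total = sigma * sqrt(n)
sem = sd_total / n            # dividing the total by n divides its SD by n

print(round(sem, 1))                # 29.0
print(round(sigma / sqrt(n), 1))    # 29.0, the same thing in one step
```

That agrees with the standard deviation of 29.0 shown for the sampling distribution in the comparison table earlier.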
Shape of the Sampling Distribution of x̅
Summary: If the original population is normally distributed (ND), the sampling distribution of the mean is ND. If the original population is not
ND, still the sampling distribution is nearly ND if sample size is ≥ 30 or so but not more than about 10% of population size.
You can probably see that if you take a bunch of samples from a ND population and compute their means, the sample means will be ND also. But
why should the means of samples from a skewed population be ND as well?
The answer should be called the Fundamental Theorem of Statistics, but instead it’s called the Central Limit Theorem. (The name was given by
Richard Martin Edler von Mises in a 1919 article, but the theorem itself is due to the Marquis de Laplace [URL:
https://BrownMath.com/swt/pfswt.htm.htm#bign_Laplace], in his Théorie analytique des probabilités [1812].) The CLT is the only theorem in this whole
course. There is a mathematical way to state and prove it, but we’ll go for just a conceptual understanding.
Central Limit Theorem: The sampling distribution of the mean approaches the normal distribution, and does so more closely at larger sample sizes.
An equivalent form of the theorem says that if you take a selection of independent random variables, and add up their values, the
more independent variables there are, the closer their sum will be to a ND.
The second form of the theorem explains why so many real-life distributions are bell curves: Most things don’t have a single cause, but many
independent causes.
Example: Lots of independent variables affect when you leave the house and your travel time every day. That means that any person’s commute
times are ND, and so are people’s arrival times at an event. The same sorts of variables affect when buses arrive, so wait times are ND. Most things in
nature have their growth rate affected by a lot of independent variables, so most things in nature are ND.
But it’s the first form of the theorem that we’ll use in this chapter. If samples are randomly chosen, or chosen by another valid sampling technique
[URL: https://BrownMath.com/swt/pfswt.htm.htm#c01_goodbad_root], then they will be independent and the Central Limit Theorem will apply.
The further the population is from a ND, the bigger the sample you need to take advantage of the CLT. Be careful! It’s the size of each sample that
matters, not the number of samples. The number of samples is always large but unspecified, since the sampling distribution is just a construct in our
heads. As a rule of thumb, n=30 is enough for most populations in real life. And if the population is close to normal (symmetric, with most data near
the middle), you can get away with smaller samples.
On the other hand, the sample can’t be too large. For samples drawn without replacement (which is most samples), the sample shouldn’t be
more than about 10% of the population. In symbols, n ≤ 0.1N. Suppose you don’t know the population size, N? Multiply left and right by 10 and
rewrite the requirement as 10n ≤ N. You always know the sample size, and if you can make a case that the population is at least ten times that size
then you’re good to go.
You’ll remember that the population of tune times was highly skewed, but the sampling distribution for n=30 was pretty nearly bell shaped. To show
how larger sample size moves the sampling distribution closer to normal, I ran some simulations of 1000 samples for some other sample sizes.
Remember that the sampling distribution is an indefinitely large number of samples; you’re still seeing some lumpiness because I ran only 1000
samples in each simulation.
The means of 3-tune samples are still fairly well skewed, though the range is less than the population range. Increasing sample size to 10, the skew is
already much less. 20-tune samples are pretty close to a bell curve except for the extreme right-hand tail. Finally, with a sample size of 100, we’re darn
close to a bell curve. Yes, there’s still some lumpiness, but that’s because the histogram contains only 1000 sample means.
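Here is a rough way to watch the same effect in numbers rather than pictures. This is only a sketch under stated assumptions: the population is a made-up exponential distribution with mean 300 (strongly skewed right, but not my tune data), values are drawn independently rather than from a finite library, and the helper names skewness and sample_mean_skewness are invented for illustration. Skewness near 0 means a symmetric shape.

```python
import random
from statistics import fmean, pstdev

random.seed(2)   # fixed seed so the run is repeatable

def skewness(values):
    """Average cubed z-score: about 0 for a symmetric data set."""
    m, s = fmean(values), pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])

def sample_mean_skewness(n, num_samples=1000):
    """Skewness of the means of num_samples samples, each of size n."""
    means = [fmean([random.expovariate(1 / 300) for _ in range(n)])
             for _ in range(num_samples)]
    return skewness(means)

for n in (3, 10, 20, 100):
    print(n, round(sample_mean_skewness(n), 2))
```

The printed skewness should shrink toward 0 as the sample size grows, with some lumpiness because each line is based on only 1000 sample means.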
Requirements, Assumptions, and Conditions
The requirements mentioned in this chapter will be your “ticket of admission” to everything you do in the rest of the course. If you don’t check the
requirements, the calculator will happily calculate numbers for you, they’ll be completely bogus, and your conclusions will be wrong but you won’t
know it. Always check the requirements for any type of inference before you perform the calculations.
I talk about “requirements”. By now you’ve probably noticed that I think very highly of DeVeaux, Velleman, and Bock’s Intro Stats (2009) [see
“Sources Used” at end of book]. They test the same things in practice, but they talk about “assumptions” and “conditions”. Assumptions are things
that must be true for inference to work, and conditions are ways that you test those assumptions in practice.
You might like their approach better. It’s the same content, just a different way of looking at it. And sampling distributions are so weird and
abstract that the more ways you can look at them the better! Following DeVeaux pages 591–593, here’s another way to think about the requirements.
Independence Assumption: Always look at the overall situation and try to see if there’s any way that different members of the sample can affect each
other. If they seem to be independent, you’ll then test these conditions:
Randomization Condition: Was the sample randomly selected? (A proper systematic sample counts as random.) Later, when you do
inference on two samples in Chapter 11, you’ll ask instead whether the participants were randomly assigned to treatments.
10% Condition: If the population is small, a decent-sized sample may be too large. Remember, back in Chapter 5, you learned that sampling
without replacement changes the mix of what’s left? In practice, if the sample is less than about 10% of the population, the effect is not
serious enough to worry about.
These conditions must always be met, but they’re a supplement to the Independence Assumption, not a substitute for it. If you can see any way in
which individuals are not independent, it’s game over regardless of the conditions.
Normal Population Assumption: For numeric data, the sampling distribution must be ND or you’re dead in the water. There are two conditions to
check this:
Nearly Normal Condition: If the sample is small, check for normality as you learned in Chapter 7. This matters because skewed data and
outliers can distort the sample mean and SD.
Large Sample Condition: But if the sample is larger, more than about 30, outliers and skew have less effect on the mean and SD, and you
don’t have to worry about the Nearly Normal Condition.
The Normal Population Assumption and the Nearly Normal Condition or Large Sample Condition are for numeric data and only numeric data. We’ll
have a separate set of requirements, assumptions, and conditions for binomial data later in this chapter.
See also: Is That an Assumption or a Condition? is a very nice summary by Bock [see “Sources Used” at end of book] of all assumptions and
conditions. It puts all of our requirements for all procedures into context. (Just ignore the language directed at instructors.)
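As a memory aid, the conditions for numeric data can be written as a short checklist in Python. This is only a sketch: the function name and its parameters are hypothetical, it treats an unknown population size as "large enough", and it cannot judge the Independence Assumption for you. That part still takes thought about how the sample was gathered.

```python
def can_use_normal_model(n, random_sample, nearly_normal=False,
                         population_size=None):
    """Rough check of the conditions for the sampling distribution of x-bar."""
    if not random_sample:
        return False          # Randomization Condition fails
    if population_size is not None and 10 * n > population_size:
        return False          # 10% Condition fails
    # Large Sample Condition, or else the Nearly Normal Condition
    return n >= 30 or nearly_normal

print(can_use_normal_model(50, random_sample=True))                       # True
print(can_use_normal_model(10, random_sample=True))                       # False
print(can_use_normal_model(10, random_sample=True, nearly_normal=True))   # True
print(can_use_normal_model(50, random_sample=True, population_size=400))  # False
```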
8A4. Applications
Ultimately, you’ll use sampling distributions to estimate the population mean or proportion from one
sample, or to test claims about a population. That’s the next four chapters, covering confidence
intervals and hypothesis tests. But before that, you can still do some useful computations.
How to Work Problems
For all problems involving sampling distributions and probability of samples, follow these steps:
1. Determine center, spread, and shape of the sampling distribution, even if you’re not explicitly asked to describe the distribution.
2. If you can’t show that the sampling distribution is ND, stop!
3. Sketch the curve, and estimate the answer. (See examples below.)
4. Compute the probability (area) using normalcdf. Caution! Don’t use rounded numbers in this calculation.
Example 1: Bank Deposits
You are auditing a bank. The bank managers have told you that the average cash deposit is $200.00, with standard deviation $45.00. You plan to take
a random sample of 50 cash deposits. (a) Describe the distribution of sample means for n = 50. (b) Assuming the given parameters are correct, how
likely is a sample mean of $189.56 or below?
Solution (a): Recall that describing a distribution means giving its center, its spread, and its shape.
Center: The mean of the sampling distribution equals the mean of the original population: µx̅ = µ, so µx̅ = $200.00. This does not depend on
whether the sampling distribution is normal.
Spread: The standard deviation of the sampling distribution of the mean, better known as the standard error of the mean, is σx̅ = σ/√n = 45/√50, so
σx̅ ≈ $6.36. This does not depend on whether the sampling distribution is normal.
Shape: The sample was random, and 10n = 10×50 = 500 is obviously less than the number of cash deposits at any bank. Sample size 50 is ≥30, so the
sampling distribution of the mean is near enough to a normal model. (If n were much under 30, you would be unable to say anything about the
shape and you would be unable to solve part (b).)
Solution (b): Please refer to How to Work Problems, above. You’ve already described the distribution, so the next step is to make the sketch. You may
be tempted to skip this step, but it’s an important reality check on the numerical answer you get from your calculator.
The sketch for this problem is shown at right. Please observe these key points when sketching sampling distribution problems:
1. Draw the axis line.
2. Label the axis, x̅ or p̂ as appropriate.
3. Draw a vertical line in the middle of the distribution and show the numerical value of the mean.
Caution! This is the mean of the sampling distribution, equal to the population mean, not the sample
mean.
4. Draw a horizontal line at about the right spot and show the numerical value of the SEM, not σ of the
original population. (For Binomial Data, below, you’ll use the SEP instead of the SEM.)
5. Draw a line and show the value for each boundary.
6. Shade the area you’re trying to find, and estimate it. (From the sketch for this problem, I estimated a
few percent, definitely under 10%.)
7. (optional) After you find the area, show its value.
Next, compute the probability on your calculator.

Press [2nd VARS makes DISTR] [2] to select normalcdf. Fill in the arguments, either on the wizard interface or in the function itself. Either way, you
need four arguments, in this order:
Left boundary. In this case, there is no left boundary because the problem specifies ≤$189.56. Conceptually, the boundary is −∞, but your calculator
doesn’t have an infinity key, so use (−)10^99 instead. (Don’t use 0. Yes, 0 is the lower limit for a deposit, but you’re using the normal model for the
sampling distribution, so the tails go on forever.)
Right boundary. For this problem, 189.56 is the right boundary.
Mean. 200 in this problem.
Standard error. You computed it earlier as $6.36, but that’s an approximate number. Never use rounded numbers in further calculations.
normalcdf calculations are particularly sensitive to rounding errors, especially when one or both boundaries are out in the tails, so use the exact
value: 45/√50.
Wizard interface: The wizard prompts you for a standard deviation σ. Don’t enter the SD of the population. Do enter the SD of the sampling distribution, which is the standard error. After entering the standard error, press [ENTER] twice and your screen will look like the one at right.
Classic interface: After entering the standard error, press [)] [ENTER]. You’ll have two closing parentheses, one for the square root and one for normalcdf.
Always show your work. There’s no need to write down all your keystrokes, but do write down the function and its arguments:
normalcdf(−10^99, 189.56, 200, 45/√50)
Answer: P(x̅ ≤ 189.56) = 0.0505
Comment: Here you see the power of sampling. With a standard deviation of $45.00, an individual deposit of $189.56 or lower can be expected almost
41% of the time. But a sample mean under $189.56 with n=50 is much less likely, only a little over 5%.
BTW: This is one reason you should take the trouble to make your sketch reasonably close to scale. If you enter the standard deviation, 45, instead of the standard error, 45/
√50 — a common mistake — you’ll get 0.4083. A glance at your sketch will tell you that can’t possibly be right, so you then know to find and fix your mistake.
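If you'd like to cross-check the calculator, the same area can be computed in a few lines of Python. This is only a sketch: the helper name normalcdf is my own choice to mirror the calculator's argument order, and it leans on NormalDist from Python's standard statistics module (Python 3.8 or later), which this book does not otherwise use.

```python
from math import sqrt, inf
from statistics import NormalDist

def normalcdf(left, right, mean, sd):
    """Area under a normal curve between left and right.
    Hypothetical helper that mirrors the TI-83/84 argument order."""
    dist = NormalDist(mean, sd)
    return dist.cdf(right) - dist.cdf(left)

# Example 1: sample mean of 50 deposits, mu = 200, sigma = 45.
sem = 45 / sqrt(50)                              # unrounded standard error
p_correct = normalcdf(-inf, 189.56, 200, sem)
p_mistake = normalcdf(-inf, 189.56, 200, 45)     # population SD: the common mistake

print(f"{p_correct:.4f}")   # about 0.0505
print(f"{p_mistake:.4f}")   # about 0.4083
```

Python has a true infinity, so there is no need for the 10^99 stand-in here.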
Example 2: Women’s Heights
US women’s heights are normally distributed (ND), with mean 65.5″ and standard deviation 2.5″. You visit a small club on a Thursday evening, and
25 women are there. (Let’s assume they are a representative sample.) Your pickup line is that you’re a statistics student and you need to measure their
heights for class. Amazingly, this works, and you get all 25 heights. How likely is it that the average height is between 65″ and 66″?
Solution: First, get the characteristics of the sampling distribution:
Center: The mean of the sampling distribution is 65.5″, the same as the mean of the original population.
Spread: The standard deviation of the sampling distribution (standard error of the mean or SEM) is σ/√n = 2.5/√25 = 0.5″.
Shape: The sample is representative of all women (we assume), and 10n = 10×25 = 250 is less than the total number of women. The sample size is
under 30, but the original population is a ND and therefore the sampling distribution is also ND.
If the SEM is 0.5″, then 65″ and 66″ equal the mean ± one standard error. The Empirical Rule (68–95–99.7
Rule) tells you that about 68% of the data fall between those bounds. In this problem, the sketch is a really
good guide to the answer you expect.
BTW: This is the distribution of sample means, so you expect 68% of them to fall between those bounds. But do the
computation anyway, because the Empirical Rule is approximate and now you’re able to be precise. Also, the SEM of 0.5″
is an exact number, but still I put the whole computation into the calculator just to be consistent.
The chance that the sample mean is between 65″ and 66″ is
P(65 ≤ x̅ ≤ 66) = 0.6827
BTW: Remember the difference between the distribution of sample means and the distribution of individual heights. From the computation
at the right, you expect to see under 16% of women’s heights between 65″ and 66″, versus over 68% of sample mean heights (for n=25)
between 65″ and 66″. That’s the whole point of this chapter: sample means stick much closer to the population mean.
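The contrast between sample means and individual heights can be checked with a short Python sketch. It assumes Python's standard statistics.NormalDist as a stand-in for the calculator's normalcdf; the variable names are mine.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 65.5, 2.5, 25
sem = sigma / sqrt(n)                      # standard error of the mean, 0.5 inch

means = NormalDist(mu, sem)                # distribution of sample means (n = 25)
individuals = NormalDist(mu, sigma)        # distribution of individual heights

p_mean = means.cdf(66) - means.cdf(65)
p_individual = individuals.cdf(66) - individuals.cdf(65)

print(f"{p_mean:.4f}")         # about 0.6827
print(f"{p_individual:.4f}")   # about 0.1585, under 16%
```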
Example 3: Elevator Load Limit
Suppose hotel guests who take elevators weigh on average 150 pounds with standard deviation of 35 pounds. An engineer is designing a large
elevator, to lift 50 people. If she designs it to lift 4 tons (8000 pounds), what is the chance a random group of 50 people will overload it?
Need a hint? This is a problem in sample total. You haven’t studied that kind of problem, but you have studied problems in sample means. In
math, when you have an unfamiliar type of problem, it’s always good to ask: Can I change this into some type of problem I do know how to solve?
In this case, how do you change a problem about the total number of pounds in a sample (∑x) into a problem about the average number of pounds
per person (x̅)?
Please stop and think about that before you read further.
Solution: To convert a problem in sums into a problem in averages, divide by the sample size. If the total weight of a sample of 50 people is 8000 lb,
then the average weight of the 50 people in the sample is 8000/50 = 160 lb. So the desired probability is P(x̅ > 160):
P(∑x > 8000 for n = 50) = P(x̅ > 160)
And you know how to find the second one.
What does the sampling distribution of the mean look like for µ = 150, σ = 35, n = 50? The mean is µx̅ = 150 lb,
and the standard error is 35/√50 ≈ 4.9 lb. That’s all you need to draw the sketch at right. Samples are random,
10×50 = 500 is less than the number of people (or potential hotel guests) in the world, and n = 50 ≥ 30, so the
sampling distribution follows a normal model.
Now make your calculation. This time the left boundary is a definite number and the right boundary is pseudo
infinity, 10^99. And again, you want the standard error, not the SD of the original population.
Wizard interface: After entering the standard error, press [ENTER] twice, and your screen will look like the one at right.
Classic interface: After entering the standard error, press [)] [ENTER].
Show your work: normalcdf(160, 10^99, 150, 35/√50).

There’s a 0.0217 chance that any given load of 50 people will overload the elevator. That’s not 2% of all loads, but 2% of loads of 50 people. Still, it’s
an unacceptable level of risk.
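Here is a hedged Python sketch of the elevator calculation. The first route is the one used above (turn the total into a mean). The second route is my addition, not something from this chapter: it models the total directly, assuming independent weights, so that the total has mean n·µ and standard deviation σ·√n. Both should give the same probability.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 150, 35, 50
limit = 8000                                   # design load in pounds

# Route 1 (the book's): convert the total to a mean, then use the SEM.
sem = sigma / sqrt(n)
p_via_mean = 1 - NormalDist(mu, sem).cdf(limit / n)

# Route 2 (assumption, not in the text): model the sample total directly.
p_via_total = 1 - NormalDist(n * mu, sigma * sqrt(n)).cdf(limit)

print(f"{p_via_mean:.4f}")    # about 0.0217
print(f"{p_via_total:.4f}")   # same value as route 1
```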
Is there an inconsistency here? Back in Chapter 5, I said that an unusual event was one that had a low probability of occurring, typically under 5%.
Since 2% is less than 5%, doesn’t that mean that an overloaded elevator is an unusual event, and therefore it can be ignored?
Yes, it’s unusual. But no, fifty people plunging to a terrible death can’t be ignored. The issue is acceptable risk. Yes, there’s some risk any time
you step in an elevator that it will be your last journey. But it’s a small risk, and it’s one you’re willing to accept. (The risk is much greater every time
you get into a car.) Without knowing exact figures, you can be sure it’s much, much less than 2%; otherwise every big city would see many elevator
deaths every day.
In Chapter 10, you’ll meet the significance level, which is essentially the risk of being wrong that you can live with. The worse the consequences
of being wrong, the lower the acceptable risk. With an elevator, 5% is much too risky — you want crashes to be a lot more unusual than that.
8B. Binomial Data / Proportions of Samples
Binomial data are yes/no or success/failure data. Each sample yields a count of successes. (A reminder: “success” isn’t necessarily good; it’s just the
name for the condition or response that you’re counting, and the other one is called “failure”.)
Need a refresher on the binomial model? Please refer back to Chapter 6 [URL: https://BrownMath.com/swt/pfswt.htm.htm#c06_BinomDist].
The summary statistic or parameter is a proportion, rather than a mean. In fact, the proportion of success (p) is all there is to know about a
binomial population.
In Chapter 6 you computed probabilities of specific numbers of successes. Now you’ll look more at the proportions of success in all possible samples
from a binomial population, using the normal distribution (ND) as an approximation.
Here’s a reminder of the symbols used with binomial data:
p: The proportion in the population. Example: If 83% of US households have at least one cell phone, then p = 0.83.
Remember “proportion of all equals probability of one”, so p is also the probability that any randomly selected response from the population will be a success.
q: q = 1−p is therefore the proportion of failure or the chance that any given response will be a failure.
n: The sample size.
x: The number of successes in the sample. Example: if 45 households in your sample have at least one cell phone, then x = 45.
p̂: “p-hat”, the proportion in the sample, equal to x/n. Example: If you survey 60 households and 45 of them have at least one cell phone, then p̂ = 45/60 = 0.75 or 75%.
8B1. Sampling Distribution of p̂
The sampling distribution of the proportion is the same idea as the sampling distribution of the mean, and there are a lot of parallels between the
two. (A table at the end of this chapter summarizes them.)
Definition: Imagine you take a whole lot of samples from the same population. Each sample has n success/failure data points, and you compute the sample
proportion p̂ of each of them. All those p̂’s form a new data set, which can be called the distribution of sample proportions, or the sampling
distribution of the proportion, or the sampling distribution of p̂, for sample size n.
As before, n is the size of each sample, not the number of samples. There’s no symbol for the number of samples, because it’s
indefinitely large.
One change from the sampling distribution of x̅ is that the sampling distribution of p̂ is a different data type from the population. The original data
are non-numeric (yeses and noes), but the distribution of p̂ is numeric because the p̂’s are numbers. Each p̂ says “so many percent of this sample were
successes.”
Center of the Sampling Distribution of p̂
Summary: The mean of the sampling distribution of p̂ equals the proportion of the population: µp̂ = p (“mu sub p-hat equals p”).
This is true regardless of the proportion in the original population and regardless of sample size.
Why is this true? The reasons are similar to the reasons in Center of the Sampling Distribution of x̅. p̂ for a given sample may be higher or lower than
p of the population, but if you take a whole lot of samples then the high and low p̂’s will tend to cancel each other out, more or less.
Spread of the Sampling Distribution of p̂
Summary: The standard deviation of the sampling distribution of p̂ has a special name: standard error of the proportion or SEP; its symbol is σp̂
(“sigma-sub-p-hat”). The standard error of the proportion for sample size n equals the square root of the population proportion,
times 1 minus the population proportion, divided by the sample size: SEP or σp̂ = √(pq/n).
This is true regardless of the proportion in the original population and regardless of sample size.
BTW: Why is this true? For a binomial distribution with sample size n, the standard deviation is √npq. That is the SD of the random variable x, the number of successes in a
sample of size n. The sample proportion, random variable p̂, is x divided by n, and therefore the SD of p̂ is the SD of random variable x, also divided by n. In symbols, σp̂ = √(npq) / n = √(npq/n²) = √(pq/n).
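You can see both summaries (center and spread) emerge from a simulation. The sketch below is my own illustration in Python, not the @RISK setup used for the graphs in this chapter: it draws 10,000 samples of size 100 from a population with p = 0.1 and compares the mean and SD of the resulting p̂'s with p and √(pq/n). The seed, sample count, and variable names are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(8)                 # arbitrary seed so the run is repeatable
p, n = 0.1, 100                # population proportion and size of each sample
num_samples = 10_000           # stand-in for "indefinitely many" samples

p_hats = []
for _ in range(num_samples):
    # One sample: n yes/no responses, each a success with probability p.
    successes = sum(random.random() < p for _ in range(n))
    p_hats.append(successes / n)

sep = sqrt(p * (1 - p) / n)    # theoretical standard error of the proportion

print("mean of p-hats:", round(mean(p_hats), 4), "vs p =", p)
print("SD of p-hats:  ", round(pstdev(p_hats), 4), "vs SEP =", round(sep, 4))
```

The simulated values will not match the formulas exactly, because 10,000 samples is large but not infinite.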
Shape of the Sampling Distribution of p̂
Summary: If np and nq are both ≥ about 10, and 10n ≤ N, the normal model is good enough for the sampling distribution.
Let’s look at some sampling distributions of p̂. First I’ll show you the effect of the population’s
proportion of success p, and then the effect of the sample size n.
Using @RISK from Palisade Corporation [URL: http://www.palisade.com/risk/ accessed 2014-10-05], I simulated all of the sampling distributions shown here. The mathematical sampling distribution
has an indefinitely large number of samples, but I stopped at 10,000.
These first three graphs show the sampling distributions for samples of size n = 4 from three
populations with different proportions of successes.
Reminder: these are not graphs of the population — they’re not responses from individuals. They
are graphs of the sampling distributions, showing the proportion of successes (p̂) found in a lot of
samples.
How do you read these? For example, look at the first graph. This shows the sampling distribution of
the proportion for a whole lot of samples, each of size 4, where the probability of success on any one
individual is 0.1. You can see that about 67% of all samples have p̂ = 0 (no successes out of four), about
29% have p̂ = .25 (one success out of four), about 4% have p̂ = .50 (two successes out of four), and so
on.
Why the large gaps between the bars? With n = 4, each sample can have only 0, 1, 2, 3, or 4
successes, so the only possible proportions for those samples are 0, 25%, 50%, 75%, and 100%.
But let’s not obsess over the details of these graphs. I’m more interested in the shapes of the sampling
distributions.
What do you see? If you take many samples of size 4 from a population with p = 0.1 (10%
successes and 90% failures), the sampling distribution of the proportion is highly skewed. Now look
at the second graph. When p = .25 (25% successes and 75% failures in the population), again with n = 4
individuals in each sample, the sample proportions are still skewed, but less so. And in the third
graph, where the population has p = 0.5 (success and failure equally likely), then the sampling
distribution is symmetric even with these small samples.
For a given sample size n, it looks like the closer the population p is to 0.5, the closer the sampling
distribution is to symmetric. And in fact that’s true. That’s your take-away from these three graphs.
Now let’s look at sampling distributions using different sample sizes from the same population. I’ll
use a population with 10% probability of success for each individual (p = 0.1).
You’ve already seen the graph of the sampling distribution when n = 4. The three graphs here show
the sampling distribution of p̂ for progressively larger samples. (Remember always that n is the
number of individuals in one sample. The number of samples is indefinitely large, though in fact I took
10,000 samples for each graph.)
What do you see here? The distribution of p̂’s from samples of 50 individuals is still noticeably
skewed, though a lot less than the graph for n = 4. If I take samples of size 100, the graph is starting to
look nearly symmetric, though still slightly skewed. And if I take samples of 500 individuals, the
distribution of p̂ looks like a bell curve.
What do you conclude from these graphs? First, even if p is far from 0.5 (if the population is quite
unbalanced), with large enough samples, the sampling distribution of p̂ is a normal distribution.
Second, you need big samples for binomial data. Remember that 30 is usually good enough for
numeric data. For binomial data, it looks like you need bigger samples.
Okay, let’s put it together. If the size of each sample is large enough, the sampling distribution is close enough to normal. How large a sample is
large enough? It depends on how skewed the original population is, which means it depends on the proportion of successes in the population. The
further p is from 0.5, the more unbalanced the population and the larger n must be.
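To put numbers on "skewed" versus "close to symmetric", here is a small Python sketch. It is my own illustration, not part of the original simulations: instead of simulating, it computes the skewness of the sampling distribution of p̂ from the exact binomial probabilities. Skewness is a measure this chapter doesn't use; 0 means perfectly symmetric, and larger positive values mean a longer right tail.

```python
from math import comb, sqrt

def p_hat_skewness(n, p):
    """Skewness of the sampling distribution of p-hat for sample size n,
    computed from exact binomial probabilities (0 = perfectly symmetric)."""
    q = 1 - p
    probs = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
    values = [x / n for x in range(n + 1)]           # the possible p-hats
    mu = sum(v * pr for v, pr in zip(values, probs))
    sd = sqrt(sum((v - mu) ** 2 * pr for v, pr in zip(values, probs)))
    return sum(((v - mu) / sd) ** 3 * pr for v, pr in zip(values, probs))

# Same combinations of n and p as the six graphs discussed above.
for n, p in [(4, 0.1), (4, 0.25), (4, 0.5), (50, 0.1), (100, 0.1), (500, 0.1)]:
    print(f"n = {n:3d}, p = {p:.2f}: skewness = {p_hat_skewness(n, p):.2f}")
```

With p = 0.1 the skewness drops from roughly 1.3 at n = 4 to roughly 0.1 at n = 500, and with p = 0.5 it is essentially zero even at n = 4, which agrees with what the graphs show.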
How big a sample is big enough? Here’s what some authors say:
DeVeaux, Velleman, Bock (2009, 462) [see “Sources Used” at end of book]: np ≥ 10 and nq ≥ 10.
Johnson and Kuby (2004, 432) [see “Sources Used” at end of book]: n > 20, np > 5, and nq > 5.
Bulmer (1979, 120) [see “Sources Used” at end of book]: npq ≥ 2.
The Math Forum (2003) [see “Sources Used” at end of book]: np > 5 and nq > 5.
Sullivan (2011, 437) [see “Sources Used” at end of book]: npq ≥ 10.
Why the disagreements? They can’t all be right, can they?
Actually, they can. The question is, what’s close enough to a ND? That’s a judgment call, and different statisticians are a little bit more or less strict
about what they consider close enough. Fortunately, with samples bigger than a hundred or so, which are customary, all the conditions are usually
met with room to spare.
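To see how the five rules of thumb quoted above compare on the same sample, here is a rough Python sketch. The function name and the dictionary layout are mine; the thresholds are copied from the list above.

```python
def rule_checks(n, p):
    """Apply each quoted rule of thumb to sample size n and population
    proportion p. Returns a dict of {source: True or False}."""
    q = 1 - p
    # Round to dodge floating-point noise such as 9.999999999999998.
    np_, nq, npq = round(n * p, 9), round(n * q, 9), round(n * p * q, 9)
    return {
        "DeVeaux, Velleman, Bock": np_ >= 10 and nq >= 10,
        "Johnson and Kuby": n > 20 and np_ > 5 and nq > 5,
        "Bulmer": npq >= 2,
        "The Math Forum": np_ > 5 and nq > 5,
        "Sullivan": npq >= 10,
    }

for n in (30, 100, 400):
    print(n, rule_checks(n, 0.1))
```

With p = 0.1, a sample of 100 passes every rule except Sullivan's (npq = 9), and a sample of 400 passes all five. The rules differ only near the borderline.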
We’ll follow DeVeaux and his buddies and use np ≥ 10 and nq ≥ 10. This is easy to remember: at least ten “yes” and at least ten “no” expected in a
sample. (You can compute the expected number of noes as nq = n(1−p) or simply n−np, sample size minus the expected number of yeses.)
How does this work out in practice? Look at the next-to-last graph, with n=100 and p=0.1. It’s close to a bell curve, but has just a little bit of skew.
(It’s easier to see the skew if you cover the top part of the graph.)
Check the requirements: np = 100×.1 = 10, and nq = 100−10 = 90. In a sample of 100, 10 successes and 90 failures are expected, on average. This just
meets requirements. And that matches the graph: you can see that it’s not a perfect bell curve, but close; if it were a little more skewed, the
normal model wouldn’t be a good enough fit.
BTW: De Veaux and friends (page 440) give a nice justification for choosing ≥ 10 yeses and noes. Briefly, the ND has tails that go out to ±infinity, but proportions are between
0 and 1. They chose their “success/failure condition”, at least ten of each, so that the mismatch between the binomial model and the normal model is only in the rare cases.
But there’s an additional condition: the individuals in the sample must be independent. This translates to a requirement that the sample can’t be too
large, or drawing without replacement would break the binomial model. Big surprise (not!): Authors disagree about this too. For example, De Veaux
and Johnson & Kuby say the sample can’t be bigger than 10% of the population (n ≤ 0.1N); Sullivan says 5%.
We’ll use n ≤ 0.1N, just like with numeric data. And just as before, you can think of that as 10n ≤ N when you don’t have the exact size of the
population.
Example 4: You asked 300 randomly selected adult residents of Ithaca a yes-or-no question. Is the sample too large to assume independence? You
may not know the population of Ithaca, but you can compute 10×300 = 3000 and be confident that there are more than 3000 adult Ithacans. Therefore
your sample is not too large.
Don’t just claim 10n ≤ N. Show the computation, and identify the population you’re referring to, like this: “10n = 10×300 = 3000 ≤ number of adult
Ithacans.”
Remember to check your conditions: np ≥ about 10, nq ≥ about 10, and 10n ≤ N. And of course your sample must be random.
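If you like, the three numeric conditions can be bundled into a tiny checker. This Python sketch is an illustration under my own naming. Because you often don't know N exactly, it takes a lower bound you can defend for the population size instead. Randomness of the sample can't be checked by arithmetic, so it is left out.

```python
def check_binomial_conditions(n, p, population_at_least):
    """Return the three numeric checks for using the normal model with
    binomial data. population_at_least is a defensible lower bound for N."""
    expected_yes = n * p                  # expected successes, np
    expected_no = n - expected_yes        # expected failures, nq = n - np
    return {
        "np >= 10": expected_yes >= 10,
        "nq >= 10": expected_no >= 10,
        "10n <= N": 10 * n <= population_at_least,
    }

# Example 4 gives n = 300 but no p. The value p = 0.5 below is made up for
# illustration; 3000 is the lower bound on adult Ithacans used in the text.
print(check_binomial_conditions(300, 0.5, 3000))
```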
Requirements, Assumptions, and Conditions
Just like with numeric data, you might find it helpful to name the requirements for binomial data. These are the same requirements that I just gave
you, but presented differently. I’m following DeVeaux, Velleman, Bock (2009, 493) [see “Sources Used” at end of book].
Independence Condition, Randomization Condition, 10% Condition: These are the same for every sampling distribution and every procedure in
inferential stats. I’ve already talked about them under numeric data earlier in the chapter [URL: https://BrownMath.com/swt/pfswt.htm.htm#c08_IndependenceAssump]. In practice, the 10% Condition comes into play more often for binomial
data than numeric data, because binomial samples are usually much larger.
Sample Size Assumption: For binomial data, the sample is like Goldilocks and porridge — it can’t be too big and it can’t be too small. (Maybe it was
beds or chairs and not porridge? And what the heck is porridge?) “Too big” is checked by the 10% Condition; “too small” is checked by the
Success/Failure Condition: The more lopsided the population is — the further the population proportion is from 50% — the larger sample
you need for the sampling distribution to be close enough to a normal model. Our rule of thumb is that your sample needs to be big enough
that you expect ≥ 10 successes and ≥ 10 failures based on the population proportion p.
See also: Is That an Assumption or a Condition? (Bock [see “Sources Used” at end of book]). Again, these are the same requirements you see in this
textbook, just presented differently.
8B2. Applications
How to Work Problems
Working with the sampling distribution of p̂, the technique is exactly the same as for problems involving the sampling distribution of x̅. Follow these
steps:
1. Determine center, spread, and shape of the sampling distribution, even if you’re not explicitly asked to describe the distribution.
2. If you can’t show that the sampling distribution is ND, stop!
3. Sketch the curve, and estimate the answer. (See example below.)
4. Compute the probability (area) using normalcdf. Caution! Don’t use rounded numbers in this calculation.
BTW: The ND is continuous and goes out to ±infinity, but the binomial distribution is discrete and bounded by 0 and n. If the requirements are met (at least 10 successes and
10 failures expected), the normal model is a good fit near the middle of the distribution. The fit is usually good enough in the tails, but not as good as it is in the middle.
Because of this, some authors apply a continuity correction to make the normal model a better fit for the binomial. This means extending the range by
half a unit in each direction. For example, if n = 100 and p = 0.20, and you’re finding the probability of 10 to 15 successes, MATH200A part 3 gives a
probability of 0.1262. The normal model with standard error √(.20×(1−.20)/100) = 0.04 gives
normalcdf(10/100, 15/100, .2, .04) = 0.0994
With the continuity correction, you compute the probability for 9½ to 15½ successes. Then the normal model gives a probability of
normalcdf(9.5/100, 15.5/100, .2, .04) = 0.1260
This is a better match to the exact binomial probability. Why use the normal model at all, then? Why not just compute the exact binomial probability?
Because there’s only a noticeable discrepancy far from the center, and only when the sample is on the small side. (100 is a small sample for binomial data, as
you’ll see in the next two chapters.) You can apply the continuity correction if you want, but many authors don’t because it usually doesn’t make enough
difference to matter.
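The three numbers in this comparison can be reproduced with a short Python sketch. It assumes math.comb for the exact binomial sum (standing in for MATH200A part 3) and statistics.NormalDist for normalcdf; neither is used in this book.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.20
q = 1 - p

# Exact binomial probability of 10 through 15 successes.
exact = sum(comb(n, x) * p**x * q**(n - x) for x in range(10, 16))

# Normal model for p-hat, without and with the continuity correction.
model = NormalDist(p, sqrt(p * q / n))
plain = model.cdf(15 / n) - model.cdf(10 / n)
corrected = model.cdf(15.5 / n) - model.cdf(9.5 / n)

print(f"exact binomial:        {exact:.4f}")       # about 0.1262
print(f"normal, no correction: {plain:.4f}")       # about 0.0994
print(f"normal, corrected:     {corrected:.4f}")   # about 0.1260
```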
Example 5: Swain v. Alabama
1965. Talladega County, Alabama. An African American man named Robert Swain is accused of rape. The 100-man jury panel includes 8 African
Americans, but through exemptions and peremptory challenges none are on the final jury. Swain is convicted and sentenced to death. (Juries are all
male. 26% of men in the county are African American.)
(a) In a county that is 26% African American, is it unexpected to get a 100-man jury panel with only eight African Americans?
Solution: “Unexpected”, “unusual”, or “surprising” describes an event with low probability, typically under 5%. This problem is asking you to find
the probability of getting that sample and compare that probability to 0.05.
This is binomial data: each member of the sample either is or is not African American. Putting the problem into the language of the sampling
distribution, your population proportion is p = 0.26. You’re asked to find the probability of getting 8 or fewer successes in a sample of 100, so n = 100
and your sample proportion must be p̂ ≤ 8/100 or p̂ ≤ 0.08.
Why “8 or fewer”? p̂ = 8% for the questionable sample, and you think it’s too low since it’s below the expected 26%. Therefore, in determining
how likely or unlikely such a sample is, you’ll compute the probability of p̂ ≤ 0.08.
First, describe the sampling distribution of the proportion:

Center: µp̂ = p = 0.26
Spread: σp̂ = √(pq/n) = √(.26(1−.26)/100) ≈ 0.044
Shape: For the sampling distribution to be normal, you have four requirements to check:
Random sample? That’s the county’s claim. Check.
Sample not too large? 10n = 10×100 = 1000. We don’t know how many men are in the county, but it must be more than a thousand. Check.
Expected number of successes? np = 100×.26 = 26 ≥ 10. Check. (Use 0.26, not 0.08. The sampling distribution is based on the population
proportion p, not on any particular sample proportion.)
Expected number of failures? nq = 100−26 = 74 ≥ 10. Check.
Therefore the sampling distribution of the proportion is a ND.
Next, make the sketch and estimate the answer. I’ve numbered the key points in the sketch at right, but if you
need a refresher please refer back to the sketch under Example 1.
From this sketch, you’d expect the probability to be very small, and indeed it is.
Compute the probability using normalcdf as before. Be careful with the last argument, which is the
standard deviation of the sampling distribution. Don’t use a rounded number for the standard error, because it
can make a large difference in the probability.
Wizard interface: The standard error expression, √(.26*(1−.26)/100), scrolls off the screen as you type it in, so be extra careful! Press [ENTER] twice, and your screen will look like the one at right.
Classic interface: Press [)] [ENTER] after entering the standard error. You’ll have two closing parentheses, one for the square root and one for normalcdf.
Always show your work — not keystrokes but the function and its arguments:
normalcdf(−10^99, .08, .26, √(.26*(1−.26)/100))
BTW: The SEP is a nasty expression, and you have to enter it twice in every problem. You might like to save some keystrokes by computing
it once and then storing it in a variable, as I did at the right. When you’re drawing the sketch and need the standard error, compute it as
usual but before pressing [ENTER] press [STO→] [x,T,θ,n]. Then when you need the standard error in normalcdf, in the wizard or the
classic interface, just press the [x,T,θ,n] key instead of re-entering the whole SEP expression. The probability is naturally the same whether
you use the shortcut or not.
P(p̂ ≤ 0.08) = 2.0×10^−5, or P(x ≤ 8) = 0.000 020. There are only 20 chances in a million of getting a 100-man jury pool with so few African Americans by
random selection from that county’s population. This is highly unexpected — so unlikely that it raises the gravest doubts about the county’s claim
that jury pools were selected without racial bias.
BTW: You might remember that in Chapter 6 you computed this binomial probability as 0.000 005, five chances in a million. If the ND is a good approximation, why does it
give a probability that’s four times the correct probability? Answer: The normal approximation gets a little dicier as you move further out the tails, and this sample is pretty far
out (z = −4.10). But is the approximation really that bad? Sure, the relative error is large, but the absolute error is only 0.000 015, 15 chances in a million. Either way, the
message is “This is extremely unlikely to be the product of random chance.”
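Here is a hedged Python sketch of part (a) that puts the normal approximation next to the exact binomial probability. As before, statistics.NormalDist and math.comb are my stand-ins for the calculator functions.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.26
q = 1 - p
sep = sqrt(p * q / n)                       # unrounded standard error

z = (8 / n - p) / sep                       # how far out in the tail
normal_approx = NormalDist(p, sep).cdf(8 / n)                      # P(p-hat <= 0.08)
exact = sum(comb(n, x) * p**x * q**(n - x) for x in range(0, 9))   # P(x <= 8)

print(f"z = {z:.2f}")                          # about -4.10
print(f"normal model:   {normal_approx:.1e}")  # about 2.0e-05
print(f"exact binomial: {exact:.1e}")          # about 5e-06
```

Both numbers are a few chances in a million, so the conclusion is the same either way.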
(b) From 1950 to 1965, as cited in the Supreme Court’s decision, every 100-man jury pool in the county had 15 or
fewer African Americans. How likely is that, if they were randomly selected?
Solution: 15 out of 100 is 15%. You know how to compute the probability that one jury pool would be ≤15%
African American, so start with that. You’ve already described the sampling distribution, so all you have to do is
make the sketch and then the calculation. Everything’s the same, except your right boundary is 0.15 instead of
0.08.
[Two calculator screens: “If you use my little shortcut” and “Otherwise”]
Either way, P(p̂ ≤ 0.15) = 0.0061. The Talladega County jury panels are multiple samples with n = 100 in each, so the “proportion of all” interpretation
makes sense: In the long run, you expect 0.61% of jury panels to have 15% or fewer African Americans, if they’re randomly selected.
But actually 100% of those jury panels had 15% or fewer African Americans. How unlikely is that? Well, we don’t know how many juries there
were in the county in those 16 years, but surely it must have been at least one a year, or a total of 16 or more. The probability that 16 independent jury
pools would all have 15% or fewer African Americans, just by chance, is 0.0060745591^16 ≈ 3E-36, effectively zip. And if there was more than one jury a
year, as there probably was, the probability would be even lower. Something is definitely fishy.
BTW: The binomial probability is 0.0061 also. This is still pretty far out in the left-hand tail (z = −2.51), but the normal approximation is excellent. The message here is that
the normal approximation is pretty darn close except where the probabilities are so small that exactness isn’t needed anyway.
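Part (b) can be sketched the same way in Python. The count of 16 panels is the minimum assumed in the solution above (one per year), and raising the one-panel probability to the 16th power assumes the panels are independent. NormalDist and comb are again my stand-ins for the calculator.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.26
q = 1 - p
sep = sqrt(p * q / n)                        # unrounded standard error

one_panel = NormalDist(p, sep).cdf(0.15)     # P(p-hat <= 0.15), normal model
exact = sum(comb(n, x) * p**x * q**(n - x) for x in range(0, 16))  # P(x <= 15)
sixteen_panels = one_panel ** 16             # 16 independent panels in a row

print(f"one panel (normal): {one_panel:.4f}")        # about 0.0061
print(f"one panel (exact):  {exact:.4f}")
print(f"16 panels in a row: {sixteen_panels:.0e}")   # about 3e-36
```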
Here’s a side-by-side summary of sampling distributions of the mean (numeric data) and sampling distributions of the proportion (binomial data).
Always check requirements for the type of data you actually have!
 | Numeric Data | Binomial Data
Each individual in sample provides | a number. | a success or failure, and you count successes.
Statistic of one sample | mean x̅ = ∑x/n | proportion p̂ = x/n
Parameter of population | mean µ | proportion p
Sampling distribution of the ... | Sampling distribution of the mean (sampling distribution of x̅) | Sampling distribution of the proportion (sampling distribution of p̂)
Mean of sampling distribution | µx̅ = µ | µp̂ = p
Standard deviation of sampling distribution | SEM = standard error of the mean, σx̅ = σ/√n | SEP = standard error of the proportion, σp̂ = √(pq/n)
Sampling distribution is close enough to normal if ... | Random sample; 10n ≤ N; population is ND or n ≥ about 30 | Random sample; 10n ≤ N; np ≥ about 10 and nq ≥ about 10
NOTE: n is number of individuals per sample. Number of samples is indefinitely large and has no symbol.
Key ideas:
The sampling distribution of the mean (sampling distribution of x̅) and the sampling distribution of the proportion (sampling
distribution of p̂) are concepts, not something you ever construct in reality.
x̅ or p̂ is a random variable, varying with each sample.
The size of each sample is n. The distribution contains an indefinitely large number of samples of size n. (There’s no symbol for the
number of samples in the distribution.)
Treat all those sample means x̅ or sample proportions p̂ as a new data set and mentally draw a histogram. This is a picture of the
sampling distribution.
The Central Limit Theorem: the closer to normal the original population, and the larger the sample, the closer the sampling distribution will
be to a ND. For numeric data, n ≥ 30 is almost always good enough. For binomial data, it’s more complicated.
Describing a sampling distribution means giving its center, spread, and shape.
For any data type, the sampling distribution describes random samples that aren’t too large, not more than 10% of population.
For numeric data, the sampling distribution of x̅ (sampling distribution of the mean) has these properties:
The center of the sampling distribution (mu sub x-bar, µx̅) always equals the mean of the population (µ).
The standard deviation of the sampling distribution (sigma sub x-bar, σx̅, also known as the standard error of the mean or SEM)
always equals the standard deviation of the population divided by the square root of the sample size, σ/√n.
If n ≥ about 30, or population is ND, then the shape of the sampling distribution is close enough to normal. If requirements are not met,
you generally can’t say anything useful about the shape, and you can’t use the normal model.
For binomial data, the sampling distribution of p̂ (sampling distribution of the proportion) has these properties:
The center of the sampling distribution (mu sub p-hat, µp̂) always equals the proportion in the population (p).
The standard deviation of the sampling distribution (sigma sub p-hat, σp̂, also known as the standard error of the proportion or SEP)
always equals √pq/n.
If there are at least 10 successes and 10 failures expected per sample — if np ≥ 10 and nq ≥ 10 — then the shape of the sampling
distribution is close enough to normal. If requirements are not met, you generally can’t say anything useful about the shape, and you
can’t use the normal model.
Given µ and σ of a numeric population, or p of a binomial population, find the probability of a specified sample. To do this,
1. Check requirements. If they’re not met, stop.
2. Compute the standard error and make a sketch to scale.
3. Use normalcdf to compute probability. In normalcdf, the fourth argument is the unrounded standard error, not the population
standard deviation.
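The three steps can be sketched as a pair of Python helpers. The function names and the error message are hypothetical choices of mine, and the helpers check only the numeric shape requirements. You still have to confirm by hand that the sample is random and that 10n ≤ N.

```python
from math import sqrt, inf
from statistics import NormalDist

def prob_sample_mean(lo, hi, mu, sigma, n, population_is_normal=False):
    """P(lo <= x-bar <= hi). Stops if n < 30 and the population is not ND."""
    if not (population_is_normal or n >= 30):
        raise ValueError("requirements not met: can't use the normal model")
    model = NormalDist(mu, sigma / sqrt(n))        # unrounded SEM
    return model.cdf(hi) - model.cdf(lo)

def prob_sample_proportion(lo, hi, p, n):
    """P(lo <= p-hat <= hi). Stops unless np >= 10 and nq >= 10."""
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("requirements not met: can't use the normal model")
    model = NormalDist(p, sqrt(p * (1 - p) / n))   # unrounded SEP
    return model.cdf(hi) - model.cdf(lo)

elevator = prob_sample_mean(160, inf, 150, 35, 50)         # Example 3
jury = prob_sample_proportion(-inf, 0.15, 0.26, 100)       # Example 5(b)
print(f"{elevator:.4f}")   # about 0.0217
print(f"{jury:.4f}")       # about 0.0061
```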

Household incomes in the country Freedonia are a skewed distribution with mean $48,000 and standard deviation (SD) $2,000. You take a
1 random sample of size 64 and compute the mean of the sample. That x̅ is one sample mean out of the distribution of all possible sample means.
Describe the sampling distribution of the mean, including all symbols and formulas.
A manufacturer of light bulbs claims a mean life of 800 hours with SD 50 hours. You take a random sample of 100 bulbs and find a sample mean
2 of 780 hours.
(a) If the manufacturer’s claim is true, is a sample mean of 780 hours surprising? (Hint: Think about whether you need the probability of x̅ ≤ 780
or x̅ ≥ 780.)
(b) Would you accept the manufacturer’s claim?
3. Suppose 72% of Americans believe in angels, and you take a simple random sample of 500 Americans.
(a) Describe the sampling distribution of the proportion who believe in angels in samples of 500 Americans.
(b) Use the normal approximation to compute the probability of finding that 350 to 370 in a sample of 500 believe in angels. Reminder: You can’t
use the sample counts directly; you have to convert them to sample proportions.
4. In a town with 100,000 households, the last census showed a mean income of $32,400 with SD $19,000. The city manager believes that average
income has fallen since the census. Students at the local community college randomly survey 1000 households and find a sample mean income of
$31,000. What’s the chance of getting a sample mean ≤$31,000 if the true mean and SD are still what the census found?
5. Roulette is a popular casino game. The croupier spins the wheel in one direction and spins a white ball in the other direction along the rim, and
the ball drops into one of the slots. In the US, roulette wheels have 38 slots: 18 red, 18 black, and 2 green. (In Monte Carlo, the wheels have 37
slots because there’s only one green.)
(a) One way beginners play is to bet on red or black. If the ball comes up that color, they double their money; if it comes up any other color, they
lose their money. Construct a probability model for the outcome of a $10 bet on red from the player’s point of view.
(b) Find the mean and SD for the outcome of $10 bets on red, and write a sentence interpreting the mean.
(c) Now take the casino’s point of view. A large casino can have hundreds of thousands of bets placed in a day. Obviously they won’t all be
the same, but it doesn’t take many days to see a whole lot of any given bet. Describe the sampling distribution of the mean for a sample of 10,000 $10 bets
on red.
(d) How much does the casino expect to earn on 10,000 $10 bets on red?
(e) What’s the chance that the casino will lose money on those 10,000 $10 bets on red?
(f) What’s the casino’s chance of making at least $2000 on those 10,000 $10 bets?
6. A sugar company packages sugar in 5-pound bags. The amount of sugar per bag varies according to a normal distribution. A random sample of
15 bags is selected from the day’s production. If the total weight of the sample is more than 75.6 pounds, the machine is packing too much per
bag and must be adjusted.
What is the probability of this happening, if the day’s mean is 5.00 pounds and SD 0.05 pounds?
7. The weights of cabbages in a shipment are normally distributed, with a mean of 38.0 ounces and SD of 5.1 ounces.
(a) If you randomly pick one cabbage, what is the probability that its weight is more than 43.0 ounces?
(b) If you randomly pick 14 cabbages, what is the probability that their average weight is more than 43.0 ounces?
8. Suppose the average household consumes 12.5 KW of electric power at peak time, with SD 3.5 KW. A particular substation in a typical
neighborhood serves 1000 households and has a capacity of 12,778 KW at peak time. (That’s 12 thousand and some, not 12 point something.)
Find the probability that the substation will fail to supply enough power.
9. In the Physicians’ Health Study, about 22,000 male doctors were randomly assigned to take aspirin daily or a placebo daily. (Of course the
study was double blind.) In the placebo group, 1.71% of doctors had heart attacks over the course of the study. Let’s take 1.71% as the rate of
heart attacks in an adult male population that doesn’t take aspirin.

            Heart Attack    No Attack    Total    p̂
Placebo     189             10845        11034    1.71%
Aspirin     104             10933        11037    0.94%

The heart attack rate among aspirin takers was 0.94%, which looks like an impressive difference.
Is there any chance that aspirin makes no difference, and this was just the result of random selection?
In other words, how likely is it for that second sample to have p̂ = 0.94% if the true proportion of heart
attacks in adult male aspirin takers is actually 1.71%, no different from adult males who don’t take aspirin?
10. Men’s heights are normally distributed with mean 69.3″ and SD 2.92″. If a random sample of 16 men is taken, what values of the sample mean
would be surprising? In other words, what values of x̅ are in the 5% of the sampling distribution furthest away from the population mean?
(Hint: The 5% is the tails, the part of the sampling distribution that is not in the middle 95%.)
11. In June 2013, the Pew Research Center found that 45% of Americans had an unfavorable view of the Tea Party. In the second week of October
2013, according to Tea Party’s Image Turns More Negative (Pew Research Center 2013e [see “Sources Used” at end of book]), 737 adults in a
random sample of 1504 had an unfavorable view of the Tea Party.
In a population where 45% have an unfavorable view of the Tea Party, how likely is a sample of 1504 where 737 or more have an unfavorable
view? Can you draw any conclusions from that probability?
9. Estimating Population Parameters

Updated 24 Dec 2017
Summary: In Chapter 8, you learned what sort of samples to expect from a known population. In the rest of the course, you’ll learn how to use a
sample to make statements about an unknown population. This is inferential statistics.
In inferential statistics, there are two types of things you want to do: test whether some claim is true, and estimate the size of some
effect. In this chapter you’ll construct confidence intervals that estimate population means and proportions; Chapter 10 starts you on
testing claims.
Contents: 9A. Estimating Population Proportion p

9A1. Confidence Interval for p (Binomial Data)
· Computing a Confidence Interval
· Interpreting a Confidence Interval
· Easy CIs with TI-83/84
9A2. How Big a Sample for Binomial Data?
9B. Estimating Population Mean µ When You Know σ
9B1. Confidence Interval
9B2. How Big a Sample Do You Need?
9C. Estimating Population Mean µ When You Don’t Know σ
9C1. Student’s t Distribution
9C2. Confidence Interval for µ (Numeric Data)
9C3. The Trouble with Outliers
9C4. How Big a Sample for Numeric Data?
9A. Estimating Population Proportion p
In the Physicians’ Health Study [see “Sources Used” at end of book], about 22,000 male doctors were randomly assigned to take aspirin or a placebo
every night. Of 11,037 in the treatment group, 104 had heart attacks and 10,933 did not. Can you say how likely it is for people in general (or, at least,
male doctors in general) to have a heart attack if they take aspirin nightly?
As always, probability of one equals proportion of all. [URL: https://BrownMath.com/swt/pfswt.htm.htm#c05_BasicsInterp] So you could just as
well ask, what proportion of people who take aspirin would be expected to have heart attacks?
Before statistics class, you would divide 104/11037 = 0.0094 and say that 0.94% of people taking nightly aspirin would be expected to have heart
attacks. This is known as a point estimate.
But you are in statistics class. You know that a sample can’t perfectly represent the population [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c01_ErrorsSampling], and therefore all you can say is that the true proportion of heart attacks in the
population of aspirin takers is around 0.94%. Can you be more specific?
Yes, you can. You can compute a confidence interval for the proportion of heart attacks to be expected among aspirin takers, based on your
sample, and that’s the subject of this chapter. We’ll get back to the doctors and their aspirin later, but first, let’s do an example with M&Ms.
9A1. Confidence Interval for p (Binomial Data)
Example 1: You take a random sample of 605 plain M&Ms, and 87 of them are red. What can you say about the proportion of reds in all plain M&Ms?
Definition: A point estimate of a population parameter is the single best available number, and in fact it’s nothing more than the corresponding
sample statistic.
In this example, your point estimate for population proportion is sample proportion, 87/605 = 14.4%, and you conclude
“Somewhere around 14.4% of all plain M&Ms are red.”
The sample proportion is a point estimate of the proportion in the population, the sample mean is a point estimate of the mean of
the population, the sample standard deviation is a point estimate of the standard deviation of the population, and so on.
Definition: A confidence interval estimate of a population parameter is a statement of bounds on that parameter and includes your level of
confidence that the parameter actually falls within those bounds.
For instance, you could say “I’m 95% confident that 11.6% to 17.2% of plain M&Ms are red.” 95% is your confidence level
(symbol: 1−α, “one minus alpha”), and 11.6% and 17.2% are the boundaries of your estimate or the endpoints of the interval.
As an alternative to endpoint form, you could write a confidence interval as a point estimate and a margin of error, like this: “I’m
95% confident that the proportion of red in plain M&Ms is 14.4% ± 2.8%.” 14.4% is your point estimate, and 2.8% is your margin of
error (symbol: E), also known as the maximum error of the estimate. Since the confidence interval extends one margin of error below
the point estimate and one margin of error above the point estimate, the margin of error is half the width of the confidence interval.
BTW: For all the cases you’ll study in this course, the point estimate — the mean or proportion of your sample — is at the middle of the confidence interval. But
that’s not true for some other cases, such as estimating the standard deviation of a population [URL: https://BrownMath.com/stat/stdev1.htm]. For those cases, computing the
margin of error is uglier.
Computing a Confidence Interval
As you might expect, your TI-83/84 and lots of statistical packages can compute confidence intervals for you. But before doing it the easy way, let’s
take a minute to understand what’s behind computing a confidence interval.
You can compute an interval to any level of confidence you desire, but 95% is most common by far, so let’s start there. How do you use those 87
reds in a sample of 605 M&Ms to estimate the proportion of reds in the population, and have 95% confidence in your answer?
In Chapter 8, you learned how to find the sampling distribution of p̂ [URL: https://BrownMath.com/swt/pfswt.htm.htm#c08_SampDistPropHello].
Given the true proportion p in the population, you could then determine how likely it is to get a sample proportion p̂ within various intervals. To find
a confidence interval, you simply run that backward.
You don’t know the proportion of reds in all plain M&Ms, so call it p. You know that, if the sample size is large enough [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c08_PropShapeSummary], sample proportions are ND and there’s a 95% chance that any given sample
proportion will be within 2 standard errors on either side of p, whatever p is.
The standard error of the proportion is σp̂ = √(pq/n). You don’t know p; that’s what you’re trying to find. Are you stuck? No, you have an estimate for
p. Your point estimate for the population proportion p is the sample proportion p̂ = 87/605. You can estimate the standard error of the proportion (the
SEP) by using the statistics of your sample:
σp̂ ≈ √[(87/605)(1−87/605)/605] = 0.0142656 or about 1.4%
Two standard errors is 0.0285312 → 0.029 or 2.9%.
BTW: How good is this estimate? For decent-sized samples, it’s quite good. For example, suppose the true population proportion p is 50% or 0.5. For a sample of n = 625, the
SEP is √[.5(1−.5)/625] = 0.0200 or 2.00%. Your sample proportion is very, very, very unlikely to be as far away as 40% or 0.4, but even if it is then you would estimate the SEP
as √[.4(1−.4)/625] = 0.0196 or 1.96%, which is extremely close.
BTW: Different authors use the term “standard error” slightly differently. Some use it only for the standard deviation of the sampling distribution, which you never know
exactly because you never know the population parameters exactly. Others use it only for the estimate based on sample statistics, which I computed just above. Still others use
it for either computation. In practice it doesn’t make a lot of difference. I don’t see much point to getting too fussy about the terminology, given that only one of them can be
computed anyway.
Any given sample proportion is 95% likely to be within two standard errors or 2.9% of the population proportion:
p−0.029 ≤ p̂ ≤ p+0.029 (probability = 95%)
Now the magic reverso: Given a sample proportion, you’re 95% confident that the population proportion is within 2.9% of that sample proportion:
p̂−0.029 ≤ p ≤ p̂+0.029 (95% confidence)
In this case, your sample proportion is 87/605 ≈ 0.144:
0.144−0.029 ≤ p ≤ 0.144+0.029 (95% confidence)
0.115 ≤ p ≤ 0.173 (95% confidence)
So your 95% confidence interval is 0.115 to 0.173, or 11.5% to 17.3%.
BTW: If the magic reverso seems like cheating, it’s not. Suppose you’re 95% sure that Cortland is within 12 miles of Dryden; aren’t you equally sure that Dryden is within 12
miles of Cortland? But you can also prove it with algebra. Here was our starting point:
p−0.029 ≤ p̂ ≤ p+0.029
Multiply by −1. When you multiply by a negative, you have to reverse the inequality signs.
−p+0.029 ≥ −p̂ ≥ −p−0.029
Rewrite in conventional order, from smallest to largest.
−p−0.029 ≤ −p̂ ≤ −p+0.029
Now add p+p̂ to all three “sides”.
p̂−0.029 ≤ p ≤ p̂+0.029
You might have noticed that I changed from 95% probability to 95% confidence. What’s up with that? Well, the sample proportion is a random
variable [URL: https://BrownMath.com/swt/pfswt.htm.htm#c07_ContVar]: different samples will have different sample proportions p̂, and you can
compute the probability of getting p̂ in any particular range.
But the population proportion p is not a random variable. It has one definite value, even though you don’t know what that definite value is.
Probability statements about a definite number make about as much sense as discussing the probability of precipitation for yesterday. The population
proportion is what it is, and you have some level of confidence that your estimated range includes that true value.
What does “95% confident” mean, then? Simply this: In the long run, when you do everything right, 95% of your 95% intervals will actually
include the population proportion, and the other 5% won’t. 5% is 5/100 = 1/20, so in the long run about one in 20 of your 95% confidence intervals
will be wrong, just because of sample variability.
Probability of one = proportion of all, so there’s one chance in twenty that this interval is wrong, meaning that it doesn’t contain the true
population proportion, even if you did everything right. If that makes you too nervous, you can use a higher confidence level, but you can never
reach 100% confidence.
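The long-run meaning of “95% confident” can be illustrated with a small simulation. This is a sketch under assumptions, not part of the original text: it pretends the true proportion is known (a made-up p = 0.15), draws many random samples of 605, builds the interval p̂ ± 2 standard errors for each, and counts how often the interval contains p.

```python
import random
from math import sqrt

def coverage(p_true, n, trials, z=2, seed=1):
    """Fraction of simulated intervals (p_hat +/- z standard errors) containing p_true."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # One random sample: count the successes among n independent draws.
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)   # z estimated standard errors
        if p_hat - margin <= p_true <= p_hat + margin:
            hits += 1
    return hits / trials

rate = coverage(p_true=0.15, n=605, trials=2000)
print(rate)   # close to 0.95; the exact value depends on the seed
```

Each individual interval either contains p or it doesn’t; the 95% describes how the procedure performs over many samples.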
There’s one more wrinkle. That margin of error of 0.029 was 2σp̂, two standard errors. The figure of 2 standard errors for the middle 95% of a ND
comes from the Empirical Rule [URL: https://BrownMath.com/swt/pfswt.htm.htm#c03_Empirical] or 68–95–99.7 Rule, so it’s only approximately
right.
But you can be a little more precise. In Chapter 7 you learned to find the boundaries of the middle area for any percentage [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c07_NormStandardApply], and that lets you generalize to any confidence level:
                                  This Example                          General Case

Confidence level                  95%                                   1−α
(middle area of the ND)
Area in the two tails combined    100%−95% = 5% or 0.05                 1−(1−α) = α
Area in each tail                 0.05/2 = 0.025                        α/2
The boundaries are                ±z0.025, where                        ±zα/2
                                  z0.025 = invNorm(1−0.025) = 1.9600
The margin of error is            E = 1.96σp̂                            E = zα/2 σp̂
And you compute it as             E = 1.96 · √(p̂(1−p̂)/n)                E = zα/2 · √(p̂(1−p̂)/n)
The margin of error on a 1−α confidence interval is zα/2 standard errors. (This will be important when you determine necessary sample size, below.)
The margin of error on a 95% confidence interval is close to 2σp̂, but more accurately it’s 1.96σp̂. For the proportion of red M&Ms, where the SEP
was σp̂ = 0.0142656, the margin of error is 1.96σp̂ = 0.0279606 → 0.028 or 2.8%. Since the point estimate was 14.4%, you’re 95% confident that the
proportion of reds in plain M&Ms is within 14.4%±2.8%, or 11.6% to 17.2%.
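The same calculation can be sketched in Python as a cross-check. The function name is made up for this example, and the standard library's `NormalDist().inv_cdf` is assumed as a stand-in for invNorm.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, conf_level=0.95):
    """Confidence interval for a population proportion (normal approximation)."""
    p_hat = successes / n                          # point estimate
    alpha = 1 - conf_level
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # same job as invNorm(1 - alpha/2)
    se = sqrt(p_hat * (1 - p_hat) / n)             # estimated SEP
    margin = z_crit * se                           # margin of error E
    return p_hat - margin, p_hat + margin, margin

low, high, margin = proportion_ci(87, 605)
print(f"{low:.5f} to {high:.5f}, E = {margin:.5f}")
# prints: 0.11584 to 0.17176, E = 0.02796
```

Rounded to three significant figures, that is the 11.6% to 17.2% interval with a 2.8% margin of error.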
Interpreting a Confidence Interval
You’ve seen that there are two ways to state a confidence interval: from ____ to ____ with ____% confidence, or ____ ± ____ with _____% confidence.
Mathematically these are equivalent, but psychologically they’re very different. The first form is better than the second.
What’s wrong with the ____ ± ____ form? It’s easy to misinterpret.
If you say “I’m 95% confident that the proportion of reds in plain M&Ms is within 14.4%±2.8%”, some people will read 14.4% and stop — they’ll
think that the population proportion is 14.4%. And even people who get past that will probably think that there’s something special about 14.4%, that
somehow it’s more likely to be the true proportion of reds among all plain M&Ms. But 14.4% is just a value of a random variable, namely the
proportion of reds in this sample. Another sample would almost certainly have a different p̂ and therefore a different midpoint for the interval.
It’s much better to use the endpoint form, because the endpoint form is harder to misinterpret. When you say “I’m 95% confident that the
proportion of reds in plain M&Ms is 11.6% to 17.2%”, you lead the reader, even the non-technical reader, to understand that the proportion could be
anything in that range, and even that there’s a slight chance that it’s outside that range.
Requirements check (RC): This is an essential step — do it before you compute the confidence interval. Computing the CI assumes that the sampling
distribution of p̂ is a ND, but “assumes” in statistics means you don’t assume, you check it.
The requirements [URL: https://BrownMath.com/swt/pfswt.htm.htm#c08_PropShapeSummary] are stated in Chapter 8 as simple random sample
(or equivalent), np and nq both ≥ about 10, 10n ≤ N. You don’t know p, but for binomial data it’s okay to use p̂ as an estimate. But np̂ is just the number
of yeses or successes in your sample, and nq̂ is just the number of noes or failures in your sample, so you really don’t need to do any multiplications.
Here’s how you check the requirements:

Random sample: stated at start of section, OK.
Successes in sample: 87; failures in sample: 605−87 = 518; both ≥ 10, OK.
10n = 10×605 = 6050. We don’t know how many plain M&Ms there are in the world, but surely M&M Mars makes far more than that every
second, so this is also OK.
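The two numeric parts of that check can be written as a short Python helper. This is a hypothetical sketch (the function name and its return format are not from the text). Whether the sample is random can’t be verified by code, and when the population size N is unknown the 10n ≤ N condition still has to be argued in words, as above.

```python
def check_proportion_requirements(successes, n, population_size=None, minimum=10):
    """Return (counts_ok, size_ok) for a one-proportion confidence interval."""
    failures = n - successes
    # At least about 10 successes and 10 failures in the sample.
    counts_ok = successes >= minimum and failures >= minimum
    # 10n <= N; None means N is unknown and must be judged by reasoning.
    size_ok = None if population_size is None else 10 * n <= population_size
    return counts_ok, size_ok

print(check_proportion_requirements(87, 605))   # (True, None)
```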
Easy CIs with TI-83/84
Your TI-83 or TI-84 can easily compute confidence intervals for a population proportion. With binomial data, this is Case 2 in Inferential Statistics:
Basic Cases. (Excel can do it too, but it’s significantly harder in Excel.)
Example 2: Let’s do the red M&Ms, since you already know the answer. See the requirements check above. Press [STAT] [◄] to get to the STAT TESTS
menu, and scroll up or down to find 1-PropZInt. (Caution: you don’t want 1-PropZTest. That’s reserved for Chapter 10.) Enter the number of
successes in the sample, the sample size, and the confidence level — easy-peasy! Write down the screen name and your inputs, then proceed to the
output screen and write down just the new stuff:
Here’s how you show your work:

1-PropZInt 87, 605, .95 (not PropZInt, please!)
(.11584, .17176), p̂ = .1438016529
There’s no need to write n=605 because you already wrote it down from the input screen.
Interpretation: I’m 95% confident that 11.6% to 17.2% of plain M&Ms are red.
You can vary that in several ways. For instance, some people like to put the confidence level last: 11.6% to 17.2% of plain M&Ms are red (95%
confidence). Or they may choose more formal language: We’re 95% confident that the true proportion of reds in plain M&Ms is 11.6% to 17.2%.
I’ve already pooh-poohed the margin-of-error form, but sometimes you have to write it that way, for instance if your boss or your thesis advisor
demands it. You can get it easily from the TI-83/84 output screen.
The center of the interval, the point estimate, is given: 14.38%. To find the margin of error, subtract that from the upper bound of the interval, or
subtract the lower bound from it: .17176−.1438 = .02796, or .1438−.11584 = .02796. Either way it’s 2.8%. You can then express the CI as 14.4%±2.8% with
95% confidence.
Example 3: What about the male doctors who started this section? 104 out of 11037 of the doctors taking nightly aspirin had heart attacks. Assuming
that male doctors are representative of adults in general, in terms of heart-attack risk, what can you say about the chance of heart attack for anyone
who takes aspirin nightly? Use confidence level 1−α = 95%.
Solution: Requirements check (RC):

Random sample stated, OK.
104 successes in sample, 11037−104 = 10933 failures, OK.
10n = 10×11037 = 110370. Without knowing the number of adults in the US or the world, we know it’s a lot more than that; OK.
1-PropZInt 104, 11037, .95
(.00762, .011223), p̂ = .0094228504
Conclusion: People who take nightly aspirin have a 0.76% to 1.12% chance of heart attack (95% confidence).
BTW: An interesting special case occurs when you have no successes. Although you can’t do the regular calculation, because 0 successes doesn’t meet the requirement, you can
use an approximate procedure called the Rule of Three. In Confidence Intervals with Zero Events, Steve Simon (2010) [see “Sources Used” at end of book] explains, “zero
to 3/n is an approximate 95% confidence interval for a data set where we observed 0 events in n patients.”
Example: Suppose that your sample was only 50 doctors, and none of them had heart attacks. 3/50 = 6%, so you would be 95% confident that people who
take nightly aspirin have a zero to 6% chance of heart attack.
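One common justification for the Rule of Three, which the text above doesn’t give, is this: if the true proportion is p, the chance of seeing zero events in n independent trials is (1−p)^n. Setting that equal to 0.05 and solving gives p = 1 − 0.05^(1/n), which is close to 3/n because −ln 0.05 is about 3. The Python sketch below compares the two; the function names are made up for the illustration.

```python
def rule_of_three_upper(n):
    """Approximate upper end of a 95% interval after 0 events in n trials."""
    return 3 / n

def exact_upper(n, alpha=0.05):
    """Largest p for which zero events in n trials still has probability alpha."""
    return 1 - alpha ** (1 / n)

n = 50
print(rule_of_three_upper(n))      # 0.06
print(round(exact_upper(n), 4))    # 0.0582
```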
9A2. How Big a Sample for Binomial Data?
The equation for margin of error is packed with information: E = zα/2 · √(p̂(1−p̂)/n)
You can see that a larger sample size n means a narrower confidence interval, but the sample size is inside the square-root sign so you don’t get as
much benefit as you might hope for. If you take a sample four times as big, the square root of 4 is 2 and so your interval is half as wide, not ¼ as
wide.
You can see also that you get a narrower interval if you’re willing to live with a lower confidence level. The lower your confidence level, the
smaller zα/2 will be, and therefore the narrower your confidence interval.
The bottom line is that there’s a three-way tension among sample size, confidence level, and margin of error. You can choose any two of those,
but then you have to live with the third. (p̂ doesn’t come into it. Although p̂ does contribute to the standard error and therefore to the margin of error,
you can’t choose what p̂ you’re going to get in a sample.)
If you want to get a confidence interval at your preferred confidence level with (no more than) a specified margin of error, how big a sample do you
need? MATH200A Program part 5 will compute this for you, but let’s look at the formula first.
(See Getting the Program [URL: https://BrownMath.com/ti83/math200a.htm#Download] for instructions on getting the MATH200A program into
your calculator.)
The equation at the start of this section shows the margin of error you get for a given sample size and confidence level. You can solve for the
sample size n, like this:
E = zα/2 · √(p̂(1−p̂)/n) ⇒ n = p̂(1−p̂) · (zα/2 / E)²
In the formula, p̂ is your prior estimate if you have one. This can be the result of a past study, or a reasonable estimate if it has some logical basis. If
you don’t have a prior estimate, use 0.5.
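For readers who want the intermediate algebra, solving the margin-of-error equation for n takes two steps: square both sides, then isolate n. The symbols are the ones already used in this section.

```latex
\begin{align*}
E   &= z_{\alpha/2}\sqrt{\frac{\hat p\,(1-\hat p)}{n}} \\
E^2 &= z_{\alpha/2}^2\,\frac{\hat p\,(1-\hat p)}{n} \\
n   &= \hat p\,(1-\hat p)\left(\frac{z_{\alpha/2}}{E}\right)^2
\end{align*}
```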
BTW: .5 or 50% is the conservative choice. It gives the largest possible sample size for a given E and C-Level. Why is that? Because the formula contains a multiplication by
p̂(1−p̂), and that product takes on its largest value when p̂ = 0.5.
Using .5 as your prior estimate, you’re guaranteed that your sample won’t be too small, though it may be larger than necessary. Why not just use .5 all
the time? Because taking samples always costs time and usually costs money, so you don’t want a larger sample than necessary.
Example 4: In a sample of 605 plain M&Ms, 87 were red. The 95% confidence interval had a 2.8% margin of error. How big a sample would you need
to reduce the margin of error to 2%?

With the MATH200A program (recommended):

Press [PRGM], select MATH200A, and press [ENTER] twice. Dismiss the title screen and you’ll see a menu. Press [5] for sample size.
Then select your data type, binomial.
The next screen wants your estimated p, your desired margin of error, and your desired confidence level. Your prior estimate is 87/605, from your
earlier study. Your margin of error is 2% = .02 (not .2 !), and your confidence level is 95% = .95.
The output screen echoes back your inputs, in case you forgot to write down the input screen, and then tells you that the sample size must be at
least 1183 M&Ms. Notice the inequalities: for a margin of error of .02 (2%) or less, you need a sample size 1183 or more.
(z Crit is critical z or zα/2, the number of standard errors associated with your chosen confidence level.)

If you’re not using the program:

Marshal your data: prior estimate p̂ = 87/605, desired margin of error E = 0.02, and confidence level 1−α = 0.95.
You need zα/2. Get α/2 from the confidence level:
1−α = 0.95 ⇒ α = 0.05 ⇒ α/2 = 0.025
zα/2 is the z-score such that the area to the right is α/2. In this problem, α/2 = 0.025, so you’re computing z0.025. You’ll use invNorm, but
invNorm wants area to left and 0.025 is an area to right, so you compute invNorm(1−.025).
Now, to avoid re-entering that z value, chain your calculations. The formula says you need to divide by E, so simply press [/] and type .02,
the desired margin of error, then press [ENTER]. Notice how the calculator displays Ans as soon as you press the [/] key, to confirm that
you’re continuing the previous calculation.
To square the fraction, press [x²] [ENTER].
Finally, multiply by p̂ and (1−p̂). You get 1182.4…, and therefore your required sample size is 1183.
Caution! Your answer is 1183, not 1182. You don’t round the result of a sample-size calculation. If it comes out to a whole number (unusual),
that’s your answer. Otherwise, you round up to the next whole number. Why? Smaller sample size makes larger margin of error. n = 1182.4…
corresponds to E = 0.02 exactly. A sample of 1182 would be just slightly under 1182.4…, and your margin of error would be just slightly over
0.02. But 0.02 was the maximum acceptable margin of error, so 1182.4… is the minimum acceptable sample size. You can’t take a fraction of an
M&M in your sample, so you have to go up to the next whole number.
There’s no requirements check in sample-size problems. These are planning how to take your sample; requirements apply to your sample once you
have it.
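As a cross-check on Example 4, the sample-size formula can be sketched in Python. The function name is made up for this illustration, and `NormalDist().inv_cdf` from the standard library is assumed as a stand-in for invNorm. The default prior estimate of 0.5 is the conservative choice for when you have no prior estimate.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, conf_level, p_prior=0.5):
    """Smallest n that keeps the margin of error at or below `margin`."""
    alpha = 1 - conf_level
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # critical z for the confidence level
    n_exact = p_prior * (1 - p_prior) * (z_crit / margin) ** 2
    return ceil(n_exact)                              # always round up, never to nearest

print(sample_size_for_proportion(0.02, 0.95, p_prior=87/605))   # 1183
```

Using `ceil` rather than `round` encodes the caution above: 1182.4… must go up to 1183.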
Example 5: You’re taking the first political poll of the season, and you’d like to know what fraction of adults favor your candidate. You decide you can
live with a 90% confidence level and a 3% margin of error. How many adults do you need in your random sample?
Solution: Since you have no prior estimate for p, make p̂ = .5.
With the MATH200A program:

MATH200A/sample size/binomial
p̂=.5, E=.03, C-Level=.9, n ≥ 752

If you’re not using the program:

1−α = .9, E = .03, p̂ = .5
1−α = .9 ⇒ α = 0.1 ⇒ α/2 = 0.05
z0.05 = invNorm(1−.05)
Divide by E, which is .03.
Square the result.
Multiply by p̂ times (1−p̂).
n = 751.5… → 752
9B. Estimating Population Mean μ When You Know σ
Numeric data are pretty much the same deal as binomial data, though there are a couple of wrinkles:
The requirements are different. For numeric data you need a sample bigger than about 30. If your sample is smaller, then it must be ND
with no outliers. (You need a random sample for every procedure.)
The standard error is σx̅ = σ/√n, so the margin of error is E = zα/2·σ/√n.
The second one is a problem, because you almost never know the standard deviation of the population. Therefore, we won’t be working any
problems for this case. Instead, I’ll give you a little more theory to lay the groundwork for the next section, which explains how we get around this
knowledge gap.
9B1. Confidence Interval
If you know the standard deviation of the population — and you hardly ever do — then your confidence interval is
x̅ − zα/2 · σ/√n ≤ µ ≤ x̅ + zα/2 · σ/√n
If you’re ever in this situation, you can compute a confidence interval on your TI-83/84 by choosing ZInterval in the STAT TESTS menu.
9B2. How Big a Sample Do You Need?
The margin of error is E = zα/2 · σ/√n, so the required sample size for a margin of error E with confidence level 1−α is
n = [ zα/2 · σ / E]²
(You can also use MATH200A part 5.)
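The two formulas in 9B1 and 9B2 can be sketched in Python as follows. Since the text works no problems for this case, the numbers below (x̅ = 100, σ = 15, n = 36, E = 2) are purely hypothetical, and the function names are invented for the example.

```python
from math import ceil, sqrt
from statistics import NormalDist

def critical_z(conf_level):
    """z such that the middle `conf_level` of the standard normal lies within +/- z."""
    return NormalDist().inv_cdf(1 - (1 - conf_level) / 2)

def z_interval(x_bar, sigma, n, conf_level=0.95):
    """Confidence interval for the mean when the population SD sigma is known."""
    margin = critical_z(conf_level) * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

def sample_size_for_mean(margin, sigma, conf_level=0.95):
    """Smallest n that keeps the margin of error at or below `margin`."""
    return ceil((critical_z(conf_level) * sigma / margin) ** 2)

low, high = z_interval(x_bar=100, sigma=15, n=36)
print(round(low, 2), round(high, 2))               # 95.1 104.9
print(sample_size_for_mean(margin=2, sigma=15))    # 217
```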
9C. Estimating Population Mean μ When You Don’t Know σ
“Houston, we have a problem!” A confidence interval is founded on the sampling distribution of the mean or proportion. Everything in Chapter 8 on
the sampling distribution of the mean [URL: https://BrownMath.com/swt/pfswt.htm.htm#c08_SampDistMeanProperties] was based on knowing the
standard deviation of the population. But you almost never know the standard deviation of the population. How to resolve this?
The solution comes from William Gosset [URL: https://BrownMath.com/swt/pfswt.htm.htm#bign_Gosset], who worked for Guinness in Dublin as a
brewer. (I swear I am not making this up.) In 1908 he published a paper called The Probable Error of a Mean [see “Sources Used” at end of book]. For
competitive reasons, the Guinness company wouldn’t let him use his own name, and he chose the pen-name “Student”. The t distribution that he
described in his paper has been known as Student’s t ever since.
BTW: While looking for Gosset’s original paper, I stumbled on Probable Error of a Mean, The (“Student”) (Moulton [see “Sources Used” at end of book]). It’s a fascinating
look at what Gosset did and didn’t accomplish, and how this classic paper was virtually ignored for years. Things didn’t start to happen till Gosset sent a copy of his tables to
R. A. Fisher [URL: https://BrownMath.com/swt/pfswt.htm.htm#bign_Fisher] with the remark that Fisher was the only one who would ever use them! It was Fisher who really
got the whole world using Student’s t distribution.
9C1. Student’s t Distribution
Gosset knew that the standard error of the mean is σ/√n, but he didn’t know σ. He wondered what
would happen if he estimated the standard error as s/√n, and did some experiments to answer that
question. Since s varies from one sample to the next, this new t distribution spreads out more than the
ND. Its peak is shallower, and its tails are fatter.
Actually, there’s no such thing as “the” t distribution. There’s a different t for each sample size.
The larger the sample, the closer that t distribution is to a normal distribution, but it’s never quite
normal.
For technical reasons, t distributions aren’t identified by sample size, but rather by degrees of freedom (symbol df or Greek ν, “nu”). df = n−1.
Here are two t distributions:
[Two graphs, each comparing the standard normal distribution (solid) with Student’s t (line): left, df = 4, n = 5; right, df = 29, n = 30.]
What do you see? Student’s t for 4 degrees of freedom is quite a bit more spread out than the ND: 12.2% of sample means are more than two standard
errors from the mean, versus only 5% for the ND.
At this scale, Student’s t for 29 degrees of freedom looks identical to the ND, but it’s not quite the same. You can see that 6% of sample means are
more than two standard errors from the mean, versus 5% for the ND.
You don’t really need a list of properties of Student’s t, because your calculator is going to do the work for you. It’s enough to know this:
There’s a t distribution for each sample size n. They’re identified by degrees of freedom, ν or df = n−1.
t distributions show more variability than z (the standard normal curve). As df or n increases, the t distributions get closer and closer to
normal. Beyond about df = 30, the difference is slight.
High t numbers occur more often than high z numbers, but only for small samples is the difference very noticeable.
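A simulation in the spirit of Gosset’s experiments shows the extra spread. This is a rough Python sketch under assumptions, not his actual method: it draws samples from a standard normal population, computes t = (x̅ − µ)/(s/√n) for each, and counts how often |t| exceeds 1.96, the cutoff that leaves 5% in the tails of the ND. The function name, trial count, and seed are arbitrary choices.

```python
import random
from math import sqrt

def share_beyond(n, cutoff=1.96, trials=20000, seed=1):
    """Share of simulated samples of size n whose |t| exceeds `cutoff`."""
    rng = random.Random(seed)                  # fixed seed so the run is repeatable
    count = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]     # population mean 0, SD 1
        mean = sum(sample) / n
        s = sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))   # sample SD
        t = mean / (s / sqrt(n))               # estimated standard error in the denominator
        if abs(t) > cutoff:
            count += 1
    return count / trials

small = share_beyond(5)     # df = 4
large = share_beyond(30)    # df = 29
print(small, large)         # roughly 0.12 and 0.06, versus 0.05 for the ND
```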
9C2. Confidence Interval for μ (Numeric Data)
The logic of confidence intervals for numeric data is the same whether you know the standard deviation of the population or not. Even the
requirements are the same. The only difference is between using a z and a t.
BTW: The confidence interval formula for numeric data with unknown σ looks a lot like the one for known σ. You just replace σ by s and z by t:
x̅ − tα/2 · s/√n ≤ µ ≤ x̅ + tα/2 · s/√n (1-α confidence)
(It’s understood that you have to use the right number of degrees of freedom, df = n−1, in finding critical t.)
Example 6: You’re auditing a bank. You take a random sample of 50 cash deposits and find a mean of $189.56 and standard deviation of $42.17.
(a) Estimate the mean of all cash deposits, with 95% confidence.
(b) The bank’s accounting department tells you that the average cash deposit is over $210.00. Is that believable?
Solution: You want to compute a confidence interval about the mean of all deposits. You have numeric data, and you don’t know the standard
deviation of the population, σ. This is Case 1 in Inferential Statistics: Basic Cases. In your sample, n = 50, x̅ = 189.56, and s = 42.17.
First, check the requirements (RC):

Random sample given, OK.
Sample size 50 is > 30, OK.
10n = 10×50 = 500, and surely the bank has more deposits than that.
(Since the sample is large enough, there’s no need to verify normality or check for outliers.)
Now calculate the interval.

On your TI-83/84, in the STAT TESTS menu, select 8:TInterval. The difference between Data and Stats is whether you have all the data points,
or just summary statistics. In this case you have only the stats, so cursor onto Stats and press [ENTER]. (The lower part of the screen may change.)
Enter your sample statistics and your desired confidence level. Write down your inputs before you select Calculate:
TInterval 189.56, 42.17, 50, .95
Proceed to the output screen, and write down everything new. There isn’t much:
(177.58, 201.54)
Finally, write your interpretation. I’m 95% confident that the average of all cash deposits is between $177.58 and $201.54.
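If you'd like to check the TInterval result away from the calculator, the sketch below applies the formula x̅ ± tα/2·s/√n to the summary statistics from Example 6. It uses only the Python standard library, so critical t is found by numerical integration and bisection; that is a stand-in technique chosen for self-containment, not a description of the calculator's internals. All function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=400):
    """P(T <= x), by Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(p, df):
    """Value t with P(T <= t) = p, for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval_from_stats(mean, sd, n, conf):
    """Confidence interval for the population mean, from summary statistics."""
    t_crit = t_quantile(1 - (1 - conf) / 2, n - 1)  # df = n - 1
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin


# Example 6: x-bar = 189.56, s = 42.17, n = 50, 95% confidence
lo, hi = t_interval_from_stats(189.56, 42.17, 50, 0.95)
print(f"({lo:.2f}, {hi:.2f})")
```

The endpoints should agree with the calculator's (177.58, 201.54) to within rounding.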
Caution! Don’t say anything like “95% of deposits are between $177.58 and $201.54.” Your confidence interval is an estimate of the true average of all
deposits, and it’s not about the individual deposits. With a standard deviation of $42 and change, you would predict that 95% of deposits are within
2×42.17 = $84.34 either side of the mean, which is a much wider interval.
Now turn to part (b). Management claims that the average of all cash deposits is > $210.00. Is that believable? Well, it’s not impossible, but it’s
unlikely. You’re 95% confident that the average of all deposits is between $177.58 and $201.54, which means you’re 95% confident that it’s not
< $177.58 or > $201.54. But they’re claiming $210, which is outside your confidence interval. Again, they’re unlikely to be correct — there’s less than a
5% likelihood (100%−95% = 5%).
Example 7: In a random sample from the 237 vehicles on a used-car lot, the following weights in pounds were found:
2500 3250 4000 3500 2900 4500 3800 3000 5000 2200
Estimate the average weight of vehicles on the lot, with 90% confidence.
Solution: Check the requirements first. You have a small sample (n < 30), so you have to verify that the data are ND and there are no outliers. Here
are the results of normality check and box-whisker plot in MATH200A:
There’s not much you can write for the box-whisker plot, but you can show the normality test numerically:
Random sample, OK.
Box-whisker: no outliers, OK.
Normality: r(.9936) > crit(.9179). OK.
10% of population size is 23.7, and this sample of 10 is smaller than that.
Now proceed to your TInterval. This time you have the actual data, so you choose Data on the screen. Specify your data list. Freq (frequency)
should already be set to 1; if not, first press the [ALPHA] key once, and then [1] [ENTER]. Enter your confidence level, and write down your inputs:
TInterval L1, 1, .90
When you have raw data, everything on the output screen is new:
(2956, 3974)
x̅ = 3465, s = 878.1, n = 10
You’re 90% confident that the average weight of all vehicles on the lot is between 2956 and 3974 pounds.
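As a check on the calculator, here is the same interval worked from the formula. Critical t for 90% confidence with df = 9 is about 1.833; that value is not shown on the TInterval output screen and is supplied here from a standard t table.

```latex
\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}
  = 3465 \pm 1.833 \times \frac{878.1}{\sqrt{10}}
  \approx 3465 \pm 509
\quad\Longrightarrow\quad
2956 \le \mu \le 3974
```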
Again, this is an estimate of the average weight of the population (the 237 cars on the lot). In your interpretation, you can’t say anything about the
weights of individual vehicles, because you don’t know anything about the weights of individual vehicles, apart from your sample.
9C3. The Trouble with Outliers
Why do you have to check for outliers? If your sample passes the normality check, isn’t that enough? No! If a sample passes the normality check, it
still might have outliers.
BTW: How can this be? How can a sample that contains outliers still pass the normality check? Well, back in Chapter 7 I said that if r > crit you can use the normal model and
if r < crit you can’t. But that simple rule hides a more complicated truth.
No sample is perfectly normal, so you’re not actually deciding “is it normal or not?” Instead, you’re finding the strength of evidence against
normality. The smaller r is, the stronger the evidence against a ND. If r < crit, the evidence is so strong that you say the data are non-normal. But if r > crit,
you can’t say that the data are definitely normal, only that you can’t rule out a ND based on this test. But outliers make the evidence against the normal
model too strong, so if outliers are present then you can’t treat the data as normal.
This “fail to prove” is similar to what you saw in Chapter 4 with decision points: you could prove that the correlation was non-zero, but you couldn’t
prove that it was zero. Starting in Chapter 10, you’ll see that this is how inferential statistics works whenever you’re testing some proposition.
Why are outliers a problem? Well, your confidence interval depends on the mean and standard deviation of your sample. But x̅ and s are sensitive to
outliers. (That sensitivity goes down as sample size goes up, so you don’t have to worry with samples bigger than about 30.)
To make this clearer, let’s look at an example. I drew these 15 points from a moderately skewed population:
157 171 182 189 201 208 217 219

229 242 247 252 265 279 375
The normality test shows r > crit. So far so good. But the box plot shows a big honkin’ outlier:
How big a difference does it make? Quite a lot, unfortunately. Here are the 95% confidence intervals for the original sample, and the sample with the
outlier removed. The means are different, the standard deviations are really different, and the high ends of the confidence intervals are pretty
different too. (The screens don’t show the margins of error, but they too are quite different: (258.45−199.28)/2 = 29.6 and (239.36−197.5)/2 = 20.9.)
[Calculator screens: 95% CI from full sample; 95% CI excluding outlier]
Do you say that the outlier increased the mean by almost 5% and the SD by almost 50%, moved the confidence interval and made it wider? That’s not
really fair — the sample is what it is (assuming you’ve ruled out a mistake in data entry). If you start throwing out points, you no longer have a
random sample. On the other hand, that one point does seem to carry an awful lot of weight, and it doesn’t seem right to have results depend so
heavily on one point.
So what do you do? If you can, you take another sample, preferably a larger one. Larger samples are less likely to have outliers in the first place,
and outliers that do occur have less influence on the results.
But taking a new sample may not be practical. An alternative — not really great, but better than nothing — is to do the analysis both ways, once
with the full sample and once with the outlier(s) excluded. That will at least give a sense of how much the outliers affect the results.
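The "both ways" analysis can be sketched in Python as below, using the 15 data points from this section. Only the standard library is used, so critical t comes from numerical integration and bisection rather than a library routine; the helper names are invented for illustration.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=400):
    """P(T <= x), by Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(p, df):
    """Value t with P(T <= t) = p, for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(data, conf):
    """Confidence interval for the population mean, from raw data."""
    n = len(data)
    t_crit = t_quantile(1 - (1 - conf) / 2, n - 1)
    margin = t_crit * stdev(data) / math.sqrt(n)
    return mean(data) - margin, mean(data) + margin


sample = [157, 171, 182, 189, 201, 208, 217, 219,
          229, 242, 247, 252, 265, 279, 375]
trimmed = [x for x in sample if x != 375]  # same sample without the outlier

full_ci = t_interval(sample, 0.95)
trimmed_ci = t_interval(trimmed, 0.95)
for label, data, ci in (("full", sample, full_ci), ("trimmed", trimmed, trimmed_ci)):
    print(f"{label}: mean={mean(data):.1f} sd={stdev(data):.1f} "
          f"CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

Reporting both intervals side by side lets a reader judge how much the single point at 375 is driving the result.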
9C4. How Big a Sample for Numeric Data?
Example 8: For the vehicle weights, your margin of error in a 90% CI was 3974−3465 = 509 pounds. How many vehicles would you need in your
sample to get a 95% confidence interval with a margin of error of 500 pounds?
Solution: In MATH200A part 5, select 2:Num unknown σ since you don’t know the standard deviation of the population. You’re first prompted
for the estimated standard deviation s, which is based on your sample. Enter that, then the desired margin of error E and the desired
confidence level.
When you enter the last piece of information, you’ll notice that the calculator takes several seconds to come up with an answer; this is
normal because it has to do an iterative calculation (fancy words for trial and error).
Critical t for a 95% CI with 14 degrees of freedom (n = 15) is 2.14, larger than critical z of 1.96 because the t distribution is more spread
out. But of course what you really care about is the bottom line: to keep margin of error no greater than 500 pounds in a 95% CI, you need
to sample at least 15 vehicles.
BTW: How is this computed? Start with the margin of error and solve for sample size:
E = tα/2·s/√n ⇒ n = [tα/2·s/E]²
The problem here is that tα/2 depends on df, which depends on n, so you haven’t really isolated sample size on the left side. The only way
to solve this equation precisely is by a process of trial and error, and that’s what MATH200A does.
What if you don’t have the program? Since t is not super different from the normal distribution, you can alter the above formula and use
z in place of t: n = [zα/2·s/E]². But the t distribution is more spread out than the normal (z) distribution, so your answer may be smaller
than the actual necessary sample size. If you do that and you get > about 30, it’s probably nearly right for the t distribution. If your
answer is small, you should increase it so that the TInterval doesn’t come out with too large a margin of error.
You calculate zα/2 exactly as you did in the sample-size formula for a confidence interval about a proportion. For example, with a 95% CI,
1−α = 0.95, α = 0.05, and α/2 = 0.025. zα/2 = z0.025 = invNorm(1−.025) = 1.9600. So using z for t you compute sample size
[1.96·878.1/500]² = 11.8… → 12. That’s well under 30, so you want to bump it up a bit.
BTW: I’m deliberately glossing over this, because the program is a lot easier. But if you want more, check out Case 1 in How Big a Sample
Do I Need? [URL: https://BrownMath.com/stat/sampsiz.htm#Case1] That page gives you all the details of the method, with worked-out examples.
At first glance, this procedure is less precise than the successive approximations done by MATH200A. But in fairness, there’s one more
source of un-preciseness that neither method can avoid. Unlike binomial data, where small variations in the prior estimate p̂ made little
difference to the computed sample size, for numeric data variations in the standard deviation do make a difference in computed sample
size. Since s is squared in the formula, it can be a big difference. This can swamp any pettifogging details about t versus z.
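Here is one way the trial and error could go, as a Python sketch with only the standard library. It is not MATH200A's actual code, which this book doesn't show. It starts from the z-based sample size (which can't be too large, because critical t is always bigger than critical z) and raises n one at a time until the t-based margin of error fits. Function names are hypothetical, and critical t is found by numerical integration and bisection.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=400):
    """P(T <= x), by Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(p, df):
    """Value t with P(T <= t) = p, for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def z_sample_size(sd, margin, conf):
    """Shortcut: n = [z * s / E]^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z * sd / margin) ** 2)


def t_sample_size(sd, margin, conf):
    """Smallest n whose t-based margin of error is no greater than `margin`."""
    n = max(2, z_sample_size(sd, margin, conf))  # z-based n is a lower bound
    while t_quantile(1 - (1 - conf) / 2, n - 1) * sd / math.sqrt(n) > margin:
        n += 1
    return n


# Example 8: s = 878.1, E = 500, 95% confidence
z_based = z_sample_size(878.1, 500, 0.95)
needed = t_sample_size(878.1, 500, 0.95)
print(z_based, needed)
```

For Example 8 the shortcut gives 12 and the search stops at 15, matching the figures in the text.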
Key ideas: (The online book has live links.)

The point estimate for a population parameter is the sample statistic — sample mean estimates population mean, sample proportion
estimates population proportion, and so on. But x̅ and p̂ vary from one sample to the next, so your estimate for µ or p must be a
range.
A confidence interval has three numbers: either confidence level, lower bound, upper bound, or confidence level, point estimate,
margin of error.
For binomial data, use 1-PropZInt to compute a CI estimate of the population proportion p. See Inferential Statistics: Basic Cases
for the requirements.
Find necessary sample size to estimate a population proportion p to within a desired margin of error. Use MATH200A Program
part 5 or the formula.
For numeric data, use TInterval to compute a CI estimate of the population mean µ. See Inferential Statistics: Basic Cases for
requirements.
To reduce margin of error by a given factor, sample size must increase by the square of that factor.
Always write an interpretation of your CI. With binomial data, your words should make it clear that you’re talking about the
proportion in the population, not just your sample. With numeric data, make it clear that you’re talking about an average, not
individual data points, and that it’s an average of a whole population, not just your sample.
Study aids: Inferential Statistics: Basic Cases
Interactive: Triage: Which Inferential Stats Case Should I Use?
Statistics Symbol Sheet
do it after all.
1. For a confidence interval, people sometimes say, “There’s a 95% chance that the mean height of all trees in the forest is …” Why is that not
correct — why must you say “I’m 95% confident” and not “there’s a 95% chance”?
2. Simple Simon took a random sample of 40 TC3 students and computed a 90% confidence interval of (45.20,60.14) for weekly food expense per
student. (He checked all requirements, and his computations were correct.) He wrote, “I’m 90% confident that TC3 students spend between
$45.20 and $60.14 a week for food.” There’s a huge mistake in that conclusion. Identify it, and write a correct conclusion.
3. Silly Sally took a random sample of 150 TC3 students and found that 50 of them “usually” or “always” prepare their own food instead of buying
from the cafeteria. She computed a 90% confidence interval of (.27002, .39664) and reported “With 90% confidence, on average 27% to 40% of TC3
students usually or always prepare their own food instead of buying from the cafeteria.” What is the biggest mistake in her conclusion?
4. The Neveready Company tested 40 randomly selected A-cell batteries to see how long they would operate a wireless mouse. They found a mean
of 1756 minutes (29 hours, 16 minutes) and standard deviation (SD) 142 minutes. With 95% confidence, what’s the average life of all Neveready
A cells in wireless mice?
5. In World War II, a prisoner of war flipped a coin 10,000 times and recorded 5,067 heads.
(a) Find the point estimate for proportion of heads.
(b) What is the sample size? What is the population size?
6. You’re planning to conduct a poll about people’s attitudes toward a hot political issue, and you have absolutely no idea what proportion will be
in favor and what proportion will be opposed. If you want a margin of error no more than 3.5% at 95% confidence, how large must your sample
be?
7. The Department of Veterans Affairs is under fire for slow processing of veterans’ claims. An investigator for the Nightly Show randomly selected
100 claims (out of 68,917 at one office) and found that 40 of them had been open for more than a year. Find a 90% confidence interval for the
proportion of all claims that have been open for more than a year.
8. For her statistics project, Sandra kept track of her commute times for 40 consecutive mornings (8 weeks). Treat this as a random sample. Her
mean commute time was 17.7 minutes and her SD was 1.8 minutes. Find a 95% confidence interval for her average time on all commutes, not just
this sample of 40.
9. Fifteen women in their 20s were randomly selected for health screening. As part of this, their heights in inches were recorded:
62.5 63 67 63.5 62 63 65 64.5 66.5 64.5 62.5 62 61.5 64.5 67.5
Construct a 95% confidence interval for the average height of women aged 20–29.
10. For his statistics project, Fred measured the body temperature of 18 randomly selected healthy male students. Here are his figures in °F:
98.3 97.7 98.6 98.5 97.5 98.6 98.2 96.9 97.9
96.9 97.8 99.3 98.6 99.2 96.9 97.8 97.9 98.3
(a) Write a 90% confidence interval for the average body temperature of healthy male students.
(b) What does this say about the famous “normal” temperature of 98.6°?
(c) What is his margin of error?
(d) To get an answer to within 0.1° with 95% confidence, how many students would he have to sample?
11. The Colorectal Cancer Screening Guidelines (CDC 2014 [see “Sources Used” at end of book]) recommend a colonoscopy every ten years for
adults aged 50 to 75. A public-health researcher interviews a simple random sample of 500 adults aged 50–75 in Metropolis (pop. 6.4 million)
and finds that 219 of them have had a colonoscopy in the past ten years.
(a) What proportion of all Metropolis adults in that age range have had a colonoscopy in the past ten years, at the 90% level of confidence?
(b) Still at the 90% confidence level, what sample size would be required to get an estimate within a margin of error of 2%, if she uses her sample
proportion as a prior estimate?
12. The next year, you go back to audit the bank again. This time, you take a random sample of 20 cash deposits. Here are the amounts:
192.68 188.24 152.37 211.73 201.57 167.79 177.19 191.15 209.22 178.49 185.90 226.31 192.38 190.23 156.13
224.07 191.78 203.45 186.40 160.83
Construct a 95% confidence interval for the average of all cash deposits at the bank.
13. Not wanting to wait for the official results, Abe Snake commissioned an exit poll of voters. In a systematic sample of 1000 voters, 520 (52%)
said they voted for Abe Snake. (14,000 people voted in the election.) That sounds good, but can he be confident of victory, at the 95% level?
10. Hypothesis Tests

Updated 4 Nov 2020
Summary: You want to know if something is going on (if there’s some effect). You assume nothing is going on (null hypothesis), and you take a
sample. You find the probability of getting your sample if nothing is going on (p-value). If that’s too unlikely, you conclude that
something is going on (reject the null hypothesis). If it’s not that unlikely, you can’t reach a conclusion (fail to reject the null).
Contents: 10A. Testing a Proportion (Binomial Data)

10A1. Example 1: Swain v. Alabama
· Step 1: Hypotheses
· Step 2: Significance Level
· Step RC: Requirements Check
· Steps 3/4: Test Statistic and p-Value
· Step 5: Decision Rule
· Step 6: Conclusion (in English)
10A2. Example 2: Cancer Screening
10A3. Example 3: Small Samples
10B. Sharp Points
10B1. Type I and Type II Errors
10B2. One-Tailed or Two-Tailed?
· Pick the Right Hypotheses
· p < α in Two-Tailed Test: What Does it Tell You?
10B3. What Does the p-Value Mean?
10B4. Practical and Statistical Significance
10B5. Conclusions: Write ’em Right!
· When p < α, you reject H0 and accept H1.
· When p > α, you fail to reject H0.
10C1. Example 12: Bank Deposits
10C2. Example 13: Smokers and Retirement
10D. Confidence Interval and Hypothesis Test
Problem Set 1
Problem Set 2
10A. Testing a Proportion (Binomial Data)
Remember the Swain v. Alabama [URL: https://BrownMath.com/swt/pfswt.htm.htm#c08_Swain] example? In a county that was 26% African
American, Mr. Swain’s jury pool of 100 men had only eight African Americans. In that example, you assumed that selection was not racially biased,
and on that basis you computed the probability of getting such a low proportion. You found that it was very unlikely. This disconnect between the
data and the claim led you to reject the claim.
You didn’t know it, but you were doing a hypothesis test. This is the standard way to test a claim in statistics: assume nothing is going on,
compute the probability of getting your sample, and then draw a conclusion based on that probability. In this chapter, you’ll learn some formal
methods for doing that.
BTW: The basic procedure of a hypothesis test or significance test is due to Jerzy Neyman (1894–1981), a Polish American, and Egon Pearson (1895–1980), an Englishman.
They published the relevant paper in 1933.
We’re going to take a seven-step approach to hypothesis tests. The first examples will be for binomial data, testing a claim about a population
proportion. Later in this chapter you’ll use a similar approach with numeric data to test a claim about a population mean. In later chapters you’ll
learn to test other kinds of claims, but all of them will just be variations on this theme.
10A1. Example 1: Swain v. Alabama
Step 1: Hypotheses
Your first task is to turn the claim into algebra. The claim may be that nothing is going on, or that something is going on. You always have two
statements, called the null and alternative hypotheses.
Definition: The null hypothesis, symbol H0, is the statement that nothing is going on, that there is no effect, “nothin’ to see here. Move along,
folks!” It is an equation, saying that p, the proportion in the population (which you don’t know), equals some number.
Definition: The alternative hypothesis, symbol H1, is the statement that something is going on, that there is an effect. It is an inequality, saying
that p is different from the number mentioned in H0. (H1 could specify <, >, or just ≠.)
The hypotheses are statements about the population, not about your sample. You never use sample data in your hypotheses. (In real life you can’t
make that mistake, since you write your hypotheses before you gather data. But in the textbook and the classroom, you always have sample data up
front, so don’t make a rookie mistake.)
You must have the algebra (symbols) in your hypotheses, but it can also be helpful to have some English explaining the ultimate meaning of each
hypothesis, or the consequences if each hypothesis is true. Here you want to know whether there’s racial bias in jury selection in the county.
You don’t want to know if the proportion of African Americans in Mr. Swain’s jury pool is less than 26%: obviously it is. You want to know if it’s
too different — if the difference is too great to be believable as the result of random chance.
Write your hypotheses this way:
(1) H0: p = 0.26, there’s no racial bias in jury selection

H1: p < 0.26, there is racial bias in jury selection
Obviously those can’t both be true. How will you choose between them? You’ll compute the probability of getting your sample (or a more unexpected
one), assuming that the null hypothesis H0 is true, and one of two things will happen. Maybe the probability will be low. In that case you rule out
the possibility that random chance is all that’s happening in jury selection, and you conclude that the alternative hypothesis H1 is true. Or maybe the
probability won’t be too low, and you’ll conclude that this sample isn’t unusual (unexpected, surprising) for the claimed population.
The number in your null hypothesis H0, with binomial data, is called po because it’s the proportion as given in H0. (You may want to refer to the
Statistics Symbol Sheet to help you keep the symbols straight.)
BTW: What exactly is p? Yes, it’s the population proportion being tested, but what’s the population? It can’t be people in the county, or men in the county, or African-
American men in the county.
In fact it’s all people serving on Talladega County jury pools past, present and future. If there’s racial bias, then African Americans are less likely to be
selected than whites, and — probability of one, proportion of all — therefore the overall population of jury pools has less than 26% African Americans. If
there’s no racial bias, then in the long run the overall population of jury pools has the same 26% of African Americans as the county.
BTW: Although a hypothesis test is officially about the population, in cases like this one it’s okay to think of it as answering a simpler question: Is the difference between the
claim of no racial bias and the reality of this sample significant, or could it be explained away as the result of random chance? The hypotheses are the same either way, the
calculations are the same, and the conclusions are the same.
This is why a hypothesis test is also called a significance test or a test of significance.
Step 2: Significance Level
Okay, you’re looking to figure out if this sample is inconsistent with the null hypothesis. In other words, is it too unlikely, if the null hypothesis H0 is
true? But what do you mean by “too unlikely”? Back in Chapter 5, we talked about unusual events, with a threshold of 5% or 0.05 for such events.
We’ll use that idea in hypothesis testing and call it a significance level.
Definition: The significance level, symbol α (the Greek letter alpha), is the chance of being wrong that you can live with. By convention, you write
it as a decimal, not a percentage.
(2) α = 0.05
A significance level of 0.05 is standard in business and science. If you can’t tolerate a 5% chance of being wrong — if the consequences are
particularly serious — use a lower significance level, 0.01 or 0.001 for example. (0.001 is common if there’s a possibility of death or serious disease or
injury.) If the consequences of being wrong are especially minor, you might use a higher significance level, such as 0.10, but this is rare in practice.
In a classroom setting, you’re usually given a significance level α to use.
BTW: Later in this chapter, you’ll see that the significance level α is actually concerned with a particular way of being wrong, a Type I error.
Step RC: Requirements Check
Back in Chapter 8, you learned the CLT’s requirements for binomial data [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c08_SampDistPropShape]: random sample not larger than 10% of population, and at least 10 successes
and 10 failures expected if the null hypothesis is true. You compute expected successes as npo by using po, which is the number from H0. Expected
failures are then sample size minus expected successes, n−npo in symbols. Steps 3 and 4 need the sampling distribution of the proportion to be a ND,
so you must check the requirements as part of your hypothesis test.
(RC) Random sample? Yes, according to the county. ✔

10n = 10×100 = 1000. We don’t know the number of adult males in the county, but it must be greater than 1000, surely. (“I
know that, and don’t call me Shirley.”) ✔
Expected successes = npo = 100×.26 = 26; expected failures are 100−26 = 74; both are ≥ 10. ✔
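The arithmetic part of the requirements check can be sketched in a few lines of Python. The function name and its return format are made up for illustration. Note that randomness of the sample can't be verified by any computation, and the population size is optional because here it isn't known exactly.

```python
def check_one_prop_requirements(n, p0, population_size=None):
    """Arithmetic checks for a one-proportion z test (randomness not covered)."""
    expected_successes = n * p0                # uses p0 from H0, not p-hat
    expected_failures = n - expected_successes
    if population_size is None:
        small_enough = None                    # unknown: judge it by reasoning
    else:
        small_enough = 10 * n <= population_size
    return {
        "expected_successes": expected_successes,
        "expected_failures": expected_failures,
        "counts_ok": expected_successes >= 10 and expected_failures >= 10,
        "sample_small_enough": small_enough,
    }


# Swain v. Alabama: n = 100, p0 = 0.26, population size not known exactly
swain = check_one_prop_requirements(100, 0.26)
print(swain)
```

A result of `counts_ok` being false is the situation where the small-sample method mentioned in the Comment below would be needed.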
You might wonder about the first test. “The county may say it’s random, but I don’t believe it. Isn’t that why we’re running this test?” Good question!
Answer: Every hypothesis test assumes the null hypothesis is true and computes everything based on that. If you end up deciding that the sample was
too unlikely, in effect you’ll be saying “I assumed nothing was going on, but the sample makes that just too hard to believe.”
This same idea — the null hypothesis H0 is innocent till proven guilty — explains why you use 0.26 (po) to figure expected successes and
failures, not 0.08 (p̂). Again, the county claims that there’s no racial bias. If that’s true, if there’s no funny business going on, then in the long run 26%
of members of jury pools should be African American.
Comment: Usually, if requirements aren’t met you just have to give up. But for one-population binomial data, where the other two requirements are
met but expected successes or failures are much under 10, you can use MATH200A part 3 to compute the p-value directly. There’s an example in
“Small Samples”, below.
Steps 3/4: Test Statistic and p-Value
This is the heart of a hypothesis test. You assume that the null hypothesis is true, and then use what you know about the sampling distribution
[URL: https://BrownMath.com/swt/pfswt.htm.htm#c08_top] to ask: How likely is this sample, given that null hypothesis?
Definition: A test statistic is a standardized measure of the discrepancy between your null hypothesis H0 and your sample. It is the number of
standard errors that the sample lies above or below H0.
You can think of a test statistic as a measure of unbelievability, of disagreement between H0 and your sample. A sample hardly ever matches your
null hypothesis perfectly, but the closer the test statistic is to zero the better the agreement, and the further the test statistic is from 0 the worse the
sample and the null hypothesis disagree with each other.
BTW: Because you showed that the sampling distribution is normal and the standard error of the proportion [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c08_SampDistPropSpread] is implicitly known, this is a z test. The test statistic is z = (p̂−po) / σp̂ where
σp̂ = √[po(1−po)/n], but as you’ll see your calculator computes everything for you.
Definition: The p-value is the probability of getting your sample, or a sample even further from H0, if H0 is true. The smaller the p-value, the
stronger the evidence against the null hypothesis.
Inferential Statistics: Basic Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top] tells you that binomial data in one population are
Case 2. This is a hypothesis test of population proportion, and you use 1-PropZTest on your calculator.
To get to that menu selection, press [STAT] [◄] [5]. Enter po from the null hypothesis H0, followed by the number of successes x, the sample size n,
and the alternative hypothesis H1. Write everything down before you select Calculate. When you get to the output screen, check that your
alternative hypothesis H1 is shown correctly at the top of the screen, and then write down everything that’s new.
(3/4) 1-PropZTest .26, 8, 100, <po

outputs: z = −4.10, p-value = 0.000 020, p̂ = 0.08
By convention, we round the test statistic to two decimal places and the p-value to four decimal places.
When the p-value is less than one in ten thousand, you need more than four decimal places. Some authors just write “p <.0001” when the p-value
is that small; they figure nobody cares about fine shades of very low probability. Feel free to use that alternative.
Caution! Watch for powers of 10 (E minus whatever) and never write something daft like “p-value = 2.0346”.
What do these outputs of the 1-PropZTest tell you? The sample proportion, p̂ = 0.08, is more than 4 standard errors below the supposed population
proportion, po = 0.26. Your test statistic is z = −4.10. Since 95% of samples have z-scores within ±2, this is surprising. How surprising, that’s what the
p-value tells you.
How likely is it to get this sample, or one with even a smaller sample proportion, if the null hypothesis H0 is true? The p-value is 0.000 020, so if
there’s no racial bias in selection then there are only two chances in a hundred thousand of getting eight or fewer African Americans in a 100-man
jury pool. (There’s a lot more about interpreting the p-value later in this chapter.)
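To see where the 1-PropZTest numbers come from, here is a minimal Python sketch of the same calculation, using the standard library's `statistics.NormalDist`. The function name and the `tail` argument are invented for illustration; the standard error is computed from po, the proportion in H0, because the test assumes H0 is true.

```python
import math
from statistics import NormalDist


def one_prop_z_test(p0, x, n, tail="<"):
    """Test statistic and p-value for H0: p = p0, given x successes in n trials.

    tail is "<", ">", or "!=" to match the alternative hypothesis H1.
    """
    p_hat = x / n
    std_err = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    z = (p_hat - p0) / std_err
    std_normal = NormalDist()
    if tail == "<":
        p_value = std_normal.cdf(z)
    elif tail == ">":
        p_value = 1 - std_normal.cdf(z)
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value, p_hat


# Swain v. Alabama: po = 0.26, 8 successes out of 100, H1: p < po
z, p_value, p_hat = one_prop_z_test(0.26, 8, 100, "<")
print(f"z = {z:.2f}, p-value = {p_value:.6f}, p-hat = {p_hat:.2f}")
```

The z-score is the sample's distance below po in standard errors, and the p-value is the area of the standard normal curve to the left of that z.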
BTW: You don’t actually use the z-score, but I want you to understand something about what a test statistic is. Every case you study will have a different test statistic, and in
fact choosing a test statistic is the main difference between cases.
BTW: Why does one step have two numbers? In the olden days, when dinosaurs roamed the earth and a slide rule was the hot new thing, you had to compute the SEP and then
the z-score; that was step 3. Then you had to look up z in a printed table to find the p-value; that was step 4. The TI-83 or TI-84 gives you both at the same time, but I’ve kept
the numbering of steps.
Step 5: Decision Rule
There are two and only two possibilities, and all you have to do is pick the correct one based on your p-value and your α:
p < α. Reject H0 and accept H1.
or
p > α. Fail to reject H0.
Caution! There are lots of p’s in problems involving population proportions (Case 2), so make sure you select the right one. The p-value is the first p
on the 1-PropZTest output screen.
You can add the numbers, if you like — p < α (0.000 020 < 0.05) — but the symbols are required.
(5) p < α. Reject H0 and accept H1.
What are you saying here? The p-value was very small, so that means the chance of getting this sample, if there’s no racial bias, was very small.
Previously, you set a significance level of 0.05, meaning you would consider this sample too unlikely if its probability was under 5%. Its probability is
under 5%, so the sample and the null hypothesis contradict each other. The sample is what it is, so you can’t reject the sample. Therefore you reject
H0 and accept H1 — you declare that there is racial bias.
Another way to look at it: Any sample will vary from the population because random selection is always operating to produce sampling error
[URL: https://BrownMath.com/swt/pfswt.htm.htm#c01_ErrorsSampling]. But the difference between this sample and the supposed population
proportion is just too great to be produced by random selection alone. Something else must be going on also. That something else is the alternative
hypothesis H1.
Definition: When the p-value is below α, the sample is too unlikely to come from ordinary sample variability alone, and you have a significant
result, or your result is statistically significant.
You always select a significance level before you know the p-value. If you could first get the p-value and then specify a significance level, you could
get whichever result you wanted, and there would be no point to doing a hypothesis test at all. Choosing α up front keeps you honest.
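If it helps to see the decision rule as something mechanical, here is a minimal Python sketch. The function name decide is my own invention for illustration, not anything on the TI-83/84. I have also assumed that a p-value exactly equal to α counts as "fail to reject", since the rule above lists only the two strict cases.

```python
def decide(p_value: float, alpha: float) -> str:
    """Apply the step 5 decision rule to a p-value and a preselected alpha."""
    if p_value < alpha:
        return "p < α. Reject H0 and accept H1."
    # Assumption: a tie (p == alpha) is treated as not significant.
    return "p > α. Fail to reject H0."

# Jury-pool example from the text: p = 0.000 020 and alpha = 0.05
print(decide(0.000020, 0.05))
```

Notice that alpha is an input you supply before the p-value is known; the function never adjusts it.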
Step 6: Conclusion (in English)
Since you accepted H1 in the previous step, that’s your conclusion. If you have already written it in English as part of the hypotheses, as I did, then
most of your work is already done. You do need to add the significance level or the p-value, so your conclusion will look something like one of these:
(6) The 8% proportion of African American men in Mr. Swain’s jury pool is significantly below the expected 26%, and this is
evidence at the 0.05 level of significance of racial bias in the selection.
or
(6) The 8% proportion of African American men in Mr. Swain’s jury pool is significantly below the expected 26%, and this is
evidence of racial bias in the selection (p = 0.000 020).
If you’re publishing your hypothesis test, you’ll want to write a thorough conclusion that still makes sense if it’s read on its own. But in class
exercises you don’t have to write so much. It’s enough to write “At the 0.05 significance level, there is racial bias in jury selection” or “There is racial
bias in jury selection (p = 0.000 020)”.
10A2. Example 2: Cancer Screening
The Colorectal Cancer Screening Guidelines (CDC 2014 [see “Sources Used” at end of book]) recommend a colonoscopy every ten years for adults aged
50 to 75. A public-health researcher believes that only a minority are following this recommendation. She interviews a simple random sample of 500
adults aged 50–75 in Metropolis (pop. 6.4 million) and finds that 235 of them have had a colonoscopy in the past ten years. At the 0.05 level of
significance, is her belief correct?
Solution: The population is adults aged 50–75 in Metropolis. You want to know whether a minority of them — under 50% — follow the colonoscopy
guideline. Each person either does or does not, so you have binomial data, a test of proportion (Case 2 in Inferential Statistics: Basic Cases [URL:
https://BrownMath.com/swt/pfswt.htm.htm#cas_top]). Try to write out the hypothesis test yourself before you look at mine below.
Reminder: Even though you already have the sample data in the problem, when you write the hypotheses, ignore the sample. In principle, you
write the hypotheses, then plan the study and gather data. If you use any of the sample data in the hypotheses, something is wrong.
You should have written something pretty close to this:
(1) H0: p = 0.5, half the seniors of Metropolis follow the guideline
H1: p < 0.5, less than half follow the guideline
(2) α = 0.05
(RC) Random sample? Yes.
10n ≤ N? Yes, 10n = 10×500 = 5000, surely less than the number of adults aged 50–75 in a population of 6,400,000.
At least 10 successes and 10 failures expected? Yes, npo = 500×.5 = 250, and n−npo = 500−250 = 250.
(3/4) 1-PropZTest: po=.5, x=235, n=500, p<po
outputs: z=−1.34, pval=0.0899, p̂=0.47
(5) p > α. Fail to reject H0.
(6) At the 0.05 level of significance, it’s impossible to say whether less than half of Metropolis seniors aged 50–75 follow the CDC
guideline for a colonoscopy every ten years or not.
[Or,
It’s impossible to say whether less than half of Metropolis seniors aged 50–75 follow the CDC guideline for a colonoscopy
every ten years or not (p = 0.0899).]
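If you don’t have a TI-83/84 handy, steps 3/4 above can be sketched in Python using only the standard library. This is an illustration under my own assumptions, not the calculator’s actual code: the function name one_prop_z_test and its tail argument are made up for this example. It computes the standard error of the proportion under H0, then the z-score, then the area in the tail that H1 points to.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(p0, x, n, tail):
    """One-proportion z test; tail is '<', '>' or '≠' (the form of H1)."""
    p_hat = x / n
    sep = sqrt(p0 * (1 - p0) / n)      # standard error of the proportion, assuming H0
    z = (p_hat - p0) / sep             # test statistic
    lower = NormalDist().cdf(z)        # area to the left of z
    if tail == "<":
        p_value = lower
    elif tail == ">":
        p_value = 1 - lower
    else:                              # two-tailed: double the smaller tail
        p_value = 2 * min(lower, 1 - lower)
    return z, p_value, p_hat

z, p_value, p_hat = one_prop_z_test(p0=0.5, x=235, n=500, tail="<")
print(f"z={z:.2f}, pval={p_value:.4f}, p-hat={p_hat:.2f}")
```

The printed values should agree with the 1-PropZTest outputs shown in step 3/4 above.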
Important: When p is greater than α, you fail to reach a conclusion. In this situation, you must use neutral language. You mention both possibilities
without giving more weight to either one, and you use words like “impossible to say” or “can’t determine”.
This is unsatisfying, frankly. You go through all the trouble of gathering data and then you end up with a non-conclusion. Can anything be salvaged
from this mess?
Yes, you can do a confidence interval. This at least will let you set bounds on what percent of all seniors follow the guidelines. You’ve already
tested requirements as part of the hypothesis test, so go right into your calculations and conclusion. You’re free to pick any confidence level you wish,
but 95% is most usual.
1-PropZInt, 235, 500, .95
outputs: (.42625, .51375)
42.6% to 51.4% of Metropolis seniors aged 50–75 follow the CDC guideline on screening for colorectal cancer.
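The same interval can be sketched in Python. The function name one_prop_z_interval is hypothetical; the formula is the usual one, p̂ plus or minus z* times the standard error computed from p̂. Treat it as a rough equivalent of 1-PropZInt rather than a description of what the calculator does internally.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(x, n, conf_level):
    """Confidence interval for a population proportion (normal approximation)."""
    p_hat = x / n
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)   # about 1.96 for 95%
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)           # margin of error
    return p_hat - margin, p_hat + margin

low, high = one_prop_z_interval(235, 500, 0.95)
print(f"({low:.5f}, {high:.5f})")
```

Notice that the interval uses p̂ in the standard error, where the hypothesis test used po. That is why the two procedures are related but not identical.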
In a classroom setting, or on regular homework, if you’re assigned a hypothesis test do that and don’t feel obligated to do a confidence interval also.
But in real life, and on labs and projects for class, you’ll usually want to do both.
10A3. Example 3: Small Samples
What if your sample is so small that expected successes npo or expected failures n−npo are under 10? You can no longer use 1-PropZTest, which
assumes that the sampling distribution of the proportion is ND, but you can compute the binomial probability directly as long as the other two
requirements are still met (SRS and 10n≤N). Only the calculation of the p-value changes.
Example: In 2001, 9.6% of Fictional County motorists said that fuel efficiency was the most important factor in their choice of a car. For her statistics
project, Amber set out to prove that the percentage has increased since then. She interviewed 80 motorists in a systematic sample of those registering
vehicles at the DMV, and 13 of them said that fuel efficiency was the most important factor in their choice of a car. Test her hypothesis, at the 0.05
significance level.
Please write out your hypothesis test before you look at mine.
(1) H0: p = 0.096, percentage has not increased

H1: p > 0.096, percentage has increased
(2) α = 0.05
(RC) SRS? Systematic sample can be analyzed like a random sample. ✔
10n≤N? 10×80 = 800, less than number of car owners in any county. ✔
Expected successes are npo = 80×.096 = 7.7, too far below 10 to live with. ✘
The sampling distribution of p̂ doesn’t follow the normal model, so you can’t use 1-PropZTest. But the other two requirements are met, so
you can proceed, calculating the binomial probability directly.
(3/4) MATH200A/Binomial prob: n=80, p=0.096, x=13 to 80; p-value = 0.0410
(If you don’t have the program, use 1−binomcdf(80,0.096,12) = 0.0410.)
[Why 13 to 80? H1 contains >, so you test the probability of getting the sample you got, or a larger one, if H0 is true. If H1 contained <, x
would be 0 to 13: the sample you got, or a smaller one. See Surprised? in Chapter 6.]
(5) p < α. Reject H0 and accept H1.
(6) At the 0.05 significance level, the percentage of Fictional County motorists who rate fuel efficiency as most important has increased since
2001.
[Or, The percentage of Fictional County motorists who rate fuel efficiency as most important has increased since 2001 (p = 0.0410).]
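The binomial p-value in step 3/4 of this example is just a sum of binomial probabilities, so it is easy to check without MATH200A or binomcdf. Here is a small Python sketch; the function name binomial_upper_tail is my own.

```python
from math import comb

def binomial_upper_tail(n, p, x):
    """P(X >= x) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

# Amber's sample: 13 or more successes out of 80, if the true proportion is still 9.6%
p_value = binomial_upper_tail(n=80, p=0.096, x=13)
print(f"p-value = {p_value:.4f}")
```

This is the same quantity as 1−binomcdf(80,0.096,12): the chance of 13 or more successes in 80 trials when H0 is true.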
10B. Sharp Points
Hypothesis tests are based on a simple idea, but there are lots of details to think about. This section clarifies some important ideas about the
philosophy and practice of a hypothesis test.
See also: Is Statistics Hard? (Dallal 2002 [see “Sources Used” at end of book]) offers great help in getting your head around these new
concepts.
HyperStat’s “Logic of Hypothesis Testing” (Lane 2013 [see “Sources Used” at end of book]) covers many of these same “sharp
points”. Because this stuff seems so weird at first, I suggest you look at his take on these same issues in addition to mine.
10B1. Type I and Type II Errors
Definition: A Type I error is rejecting the null hypothesis when it’s actually true.
Definition: A Type II error is failing to reject the null hypothesis when it’s actually false.
A Type I error usually causes you to do something you shouldn’t; a Type II error usually represents a missed opportunity.
Example 4: Suppose your alternative hypothesis H1 is that a new headache remedy PainX helps a greater proportion of people than aspirin.
A Type I error — rejecting H0 and accepting H1 when H0 is actually true — would have you announce that PainX helps more people when in
fact it doesn’t. People would then buy PainX instead of aspirin, and their headache would less likely be cured. This is a bad thing.
On the other hand, a Type II error — failing to reject H0 when it’s actually false — would mean you announce an inconclusive result. This keeps
PainX off the market when it actually would have helped more people than aspirin. This too is a bad thing.
Example 5: You’re on a jury, and you have to decide whether the accused actually committed the murder. What would be Type I and Type II errors?
To answer that you need to identify your null hypothesis H0. Remember that it’s always some form of “nothing going on here.” In this case, H0
would be that the defendant didn’t commit the murder, and H1 would be that he did.
A Type I error would be condemning an innocent man; a Type II error would be letting a guilty man go free. In our legal system, a defendant is
not supposed to be found guilty if there is a reasonable doubt; this would correspond to your α. Probably α = 0.05 is not good enough in a serious
case like murder, where a Type I error would mean long jail time or execution, so if you’re on a jury you’d want to be more sure than that.
“Okay then,” you say, “I’ll have to be super careful and not make mistakes.” But remember from Chapter 1: In statistics, “errors” aren’t necessarily
mistakes. Errors are discrepancies between your results and reality, whatever their cause. Type I and Type II errors are not mistakes in procedure.
Even if you do everything right in your hypothesis test, you can’t be certain of your answer, because you can never get away from sample
variability.
How often will these errors occur? This is where your significance level comes into play. If you perform a lot of tests at α = 0.05 in situations where H0 is
actually true, then in the long run a Type I error will occur one time in twenty. It’s too big for these pages, but there’s a cartoon at xkcd.com [URL: http://xkcd.com/882/ accessed
2014-01-09] that illustrates this perfectly. The probability of a Type II error has the symbol β (Greek letter beta) and it has to do with the “power” of
the test, its ability to find an effect when there’s an effect to be found. β belongs to a more advanced course, and I don’t do anything with it in this
book.
Earlier, I said that your significance level α is the chance of being wrong that you can live with. Now I can be a little more precise. α is not the chance
of any error; α is the chance of a Type I error that you can live with. If one Type I error in 20 hypothesis tests is unacceptable, use a lower
significance level — but then you make a Type II error more likely. If that’s unacceptable, increase your sample size.
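You can watch the "one time in twenty" happen with a small simulation. The Python sketch below is only an illustration, and its setup is my own assumption: it draws samples from a normal population whose mean really is 0 with a known SD of 1, so H0 is true by construction, and it runs a two-tailed z test each time. The function name, sample size, number of trials, and seed are all arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_i_error_rate(alpha=0.05, n=25, trials=4000, seed=1):
    """Fraction of tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: the population mean really is 0 (SD 1).
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = fmean(sample) / (1 / sqrt(n))             # known sigma, so a z test
        p_value = 2 * (1 - std_normal.cdf(abs(z)))    # two-tailed p-value
        if p_value < alpha:
            rejections += 1                           # rejecting a true H0: Type I error
    return rejections / trials

print(type_i_error_rate())
```

The fraction printed should land near 0.05. Nothing was done wrong in any of those tests; the rejections come from sample variability alone.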
Somebody is making a mint off the following chart. It’s in every stats textbook I’ve seen, so you may as well have it too:
                            Reject H0, accept H1    Fail to reject H0
If H0 is actually true      Type I error            Correct decision
If H0 is actually false     Correct decision        Type II error
(and H1 is true)
10B2. One-Tailed or Two-Tailed?
Summary: How do you know whether your H1 should contain “<” or “>” (a one-tailed test) or “≠” (a two-tailed test)? In class, the problem will
usually be clear about whether you’re testing for a “difference” (two-tailed) or testing if something is “better”, “larger”, “less than”,
etc. (all one-tailed). But which one should you use when you’re on your own?
In general, prefer a two-tailed test unless you have a specific reason to make a one-tailed test.
When a two-tailed test reaches a statistically significant result, you interpret in a one-tailed manner.
Pick the Right Hypotheses
There are two main situations where a one-tailed test makes sense: “(a) where there is truly concern for the outcomes in one [direction] only and
(b) where it is completely inconceivable that the results could go in the opposite direction.”
—Dubey, quoted by Kuzma and Bohnenblust (2005, 132) [see “Sources Used” at end of book]
With a one-tailed test, say for µ<4.5, you’re saying that you consider “equal to 4.5” and “greater than 4.5” the same thing, that if µ isn’t less than 4.5
then you don’t care whether it’s equal or it’s greater. Sometimes you really don’t care, but very often you do. If the problem statement is ambiguous,
or if this is real life and you have to do a hypothesis test, how do you decide whether to do a one-tailed or two-tailed test?
Testing two-tailed doesn’t prejudge a situation. Do a two-tailed test unless you can honestly say, without looking at the data, that only one
direction of difference matters, or only one direction is possible.
Example 6: An existing drug cures people in an average of 4.5 days, and you’re testing a new drug. If you test for µ<4.5, you’re saying that it doesn’t
matter whether the new drug takes the same time or takes more time. But that’s wrong: it matters very much. You want to test whether the new drug
is different (µ≠4.5). Then if it’s different, you can conclude whether it’s faster or slower.
Another way to look at this whole business: a one-tailed test essentially doubles your α — you’re much more likely to reach a conclusion with
dicey data. But that means double the risk of being wrong with a Type I error — not a good thing!
Sometimes the same situation can call for a different test, depending on your viewpoint.
Example 7: You’re the county inspector of weights and measures, checking up on a dairy and its half gallons of milk. Legally, half a gallon is 64 fluid
ounces. To a government inspector, “Dairylea gives 64.0 ounces in the average half gallon” and “Dairylea gives more than 64.0 ounces in the average
half gallon” are the same (legal), and you care only about whether Dairylea gives less (illegal). A one-tailed test (<) is correct.
But now shift your perspective. You’re Dairylea management. You don’t want to short the customers because that’s illegal, but you don’t want to
give too much because that’s giving away money. You make a two-tailed test (≠).
p < α in Two-Tailed Test: What Does it Tell You?
After a two-tailed test, if p<α then you can interpret the result as one-tailed.
Example 8: You want to test whether your candidate’s approval rating has changed from the previous dismal 40% after a major policy
announcement. Your H1 is p ≠ 0.4, and 170 out of a random sample of 500 voters approve (p̂ = 34%). Your p-value is 0.0062, so you reject H0 and
accept H1. You conclude that the candidate’s approval rating has changed.
But you can go further and say that her approval rating has dropped. You do this by combining the facts that (a) you’ve proved that approval
rating is different, which means it must be either less or more than 40%, and (b) the sample p̂ was less than po (40%).
You can phrase your conclusion something like this, first answering the original question then going beyond it: The candidate’s approval rating
has changed from 40% after the speech (p = 0.0062). In fact, it has dropped.
Your justification is the relationship between Confidence Interval and Hypothesis Test (later in this chapter), but you don’t actually have to compute
the CI. When p < α in a two-tailed test, po is outside the confidence interval (at the matching confidence level).
When p̂ is above po and the p-value is < α, the whole CI (if you computed it) would be above po so you know the true proportion is greater
than po.
Conversely, if p̂ is below po and the p-value is < α, the whole CI would be below po so you know the true proportion is below po.
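Example 8 can be worked through in Python to see both halves of the argument: the two-tailed p-value, and the matching confidence interval that lies entirely on one side of po. This is a sketch using the standard normal-approximation formulas; the variable names are mine.

```python
from math import sqrt
from statistics import NormalDist

p0, x, n, alpha = 0.40, 170, 500, 0.05
p_hat = x / n

# Two-tailed test of H1: p ≠ 0.4 (standard error uses p0, because H0 is assumed true)
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * NormalDist().cdf(-abs(z))

# Matching 95% confidence interval (standard error uses p-hat)
z_star = NormalDist().inv_cdf(1 - alpha / 2)
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin

print(f"p-value = {p_value:.4f}")
print(f"95% CI = ({low:.4f}, {high:.4f})")
if p_value < alpha:
    direction = "dropped below" if p_hat < p0 else "risen above"
    print(f"Approval has changed from {p0:.0%}; in fact it has {direction} {p0:.0%}.")
```

The upper end of the interval comes out below 0.40, which is the justification for saying "dropped" rather than just "changed".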
10B3. What Does the p-Value Mean?
Summary: The p-value tells you how likely it is to get the sample you got (or a more extreme sample) if the null hypothesis is true.
Many people are confused about the p-value. They try to read too much into it, or they try to simplify it.
Part of the problem is trying to fit the meaning into the traditional structure of a one-sentence definition, so let’s try a story instead. In your
experiment, you got a certain result, a sample mean or sample proportion. Assume that the null hypothesis is true. If H0 is true, the properties of the
sampling distribution tell you how likely it is to get this sample result, or one even further away from H0. That likelihood is called the p-value.
BTW: The one-tailed p-value is exactly the probability that you computed with normalcdf in Chapter 8. When that’s less than 0.5, the two-tailed p-value is exactly double the
one-tailed p-value.
If the p-value is small, your results are in conflict with H0, so you reject the null and accept the alternative. If the p-value is larger, your sample is not
in conflict with H0 and you fail to reject the null, which is stats-talk for failing to reach any kind of conclusion.
In a nice phrase, Sterne and Smith [see “Sources Used” at end of book] say that p-values “measure the strength of the evidence against the null
hypothesis; the smaller the p-value, the stronger the evidence against the null hypothesis.” They also quote R. A. Fisher [URL:
https://BrownMath.com/swt/pfswt.htm.htm#bign_Fisher] on interpreting a p-value: “If P is between 0.1 and 0.9 there is certainly no reason to suspect
the hypothesis tested. If it is below 0.02 it is strongly indicated that the hypothesis fails to account for the whole of the facts. We shall not often be
astray if we draw a conventional line at 0.05.”
The message here is that p-values fall on a continuum; you can’t just arbitrarily divide them into “significant” and “not significant” once and for
all.
The p-value is the likelihood, if H0 is actually true, that random chance could give you the results you got, or results even further from H0. It is a
conditional probability:
p-value = P(this sample given that H0 is true)
Yes, that seems convoluted — because it is. Alas, there just isn’t any description of a p-value that is both correct and simple.
The p-value is not the probability that either hypothesis is true or false:
The p-value is not the probability that H0 is true.
The p-value is not the probability that H0 is false.
The p-value is not the probability that H1 is true.
The p-value is not the probability that H1 is false.
The p-value is not the probability that your results are due to random chance.
The p-value is not the probability that your results are not due to random chance.
The p-value is not any of the above because they are all plain probabilities. Once again, the p-value is just a measure of how likely your results
would be if H0 is true and random chance is the only factor in selecting the sample.
The p-value tells you how unlikely this sample (or a more extreme one) is if the null hypothesis is true. The more unlikely (surprising, unexpected),
the lower the p-value, and the more confident you can feel about rejecting H0.
See also: Re: P Value for Kids (Moore 2003 [see “Sources Used” at end of book]), a very short article that cuts through all the bs.
Sifting the Evidence — What’s Wrong with Significance Tests? (Sterne and Smith 2001 [see “Sources Used” at end of book]). This is more
advanced, but still readable. They don’t argue against significance tests but do argue against the blind use of 0.05 as a significance
level in medical studies.
There’s one other thing: the p-value is not a measure of the size or importance of an effect. That gets into statistical significance versus practical
significance.
10B4. Practical and Statistical Significance
If your p-value is less than your significance level α, your result is statistically significant. That low p-value, Wheelan (2013, 11) [see “Sources Used”
at end of book] writes, means that your result is “not likely to be the product of chance alone”. That’s all that statistical significance means. But even
if a result is statistically significant, it may not be practically significant.
Example 9: Suppose that your p-value for “PainX is more likely to help a person than aspirin” is 0.000 002. You’re pretty darn sure that PainX is
better. But to determine whether the result is practically significant, you have to ask not just whether PainX is better, but by how much.
One way to evaluate practical significance is to compute a confidence interval about the effect size. In this case, the 95% confidence interval is
that a person is between 1.14 and 2.86 percentage points more likely to be helped by PainX than aspirin. Oh yes, and aspirin costs a buck for 100
tablets, where PainX costs $29.50 for ten. Most people would say this result has no practical significance. They’re not going to plunk down $30 for a
few pills that are only 2% more likely to help them than aspirin.
BTW: How can you get such a low p-value when the size of the effect is small? The answer is in extremely large sample sizes. In this made-up case, PainX helped 15,500 people
in a sample of 25,000, and aspirin helped 15,000 in a sample of 25,000. When you have really large samples, be especially alert to the issue of statistical versus practical
significance.
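To see how a tiny p-value and a tiny effect can coexist, here is a Python sketch of the made-up PainX numbers. The two-sample formulas (pooled standard error for the test, unpooled for the interval) are the standard ones for two proportions; this chapter hasn’t covered them, so treat the calculation as an assumption about how numbers like those in the example could be obtained.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 15_500, 25_000    # helped by PainX
x2, n2 = 15_000, 25_000    # helped by aspirin
p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2             # effect size: difference in proportions helped

# One-tailed test of H1: p1 > p2, using the pooled proportion under H0
pooled = (x1 + x2) / (n1 + n2)
se_test = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = diff / se_test
p_value = NormalDist().cdf(-z)

# 95% confidence interval for the difference (unpooled standard error)
se_ci = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
margin = NormalDist().inv_cdf(0.975) * se_ci
low, high = diff - margin, diff + margin

print(f"p-value = {p_value:.7f}")
print(f"difference = {diff:.2%}, 95% CI = ({low:.2%}, {high:.2%})")
```

The p-value is a few millionths, yet the interval says the advantage is only about one to three percentage points. The huge samples shrink the standard error, and that is what drives the p-value down.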
10B5. Conclusions: Write ’em Right!
Summary: As a statistician, you have an ethical obligation to make your results as easy as possible to understand, and as hard as possible to
misinterpret.
Avoid common errors when stating conclusions and interpreting them. Make sure you understand what you are doing, and
explain it to others in their own language.
When p < α, you reject H0 and accept H1.
If your p-value is less than your significance level, you have shown that your sample results were unlikely to arise by chance if H0 is true. The data
are statistically significant. You therefore reject H0 and accept H1.
Details: Assuming that H0 is true, the sample you got is surprising (unexpected, unusual). The data are inconsistent with the null hypothesis —
they can’t both be true. The data are what they are, and if the sample was properly taken you have to believe in it. Therefore, H0 is most likely false. If
H0 is false, its opposite H1 is true.
You accept H1, but you haven’t proved it to a certainty. There’s always that p-value chance that the sample results could have occurred when H0
is true. That’s why you say you “accept” H1, not that you have “proved” H1.
Compare to a jury verdict of “guilty”. It means the jury is convinced that the probability (p) that the defendant is innocent is less than a
reasonable doubt (significance level, α). It doesn’t mean there is no chance he’s innocent, just that there is very little chance.
Example 10: Suppose your null H0 is “the average package contains the stated net weight,” your alternative is “the average package contains less
than the stated net weight,” and your significance level is 0.05.
If p = 0.0241, which is < α, you reject H0 and accept H1. You conclude “the average package does contain less than the stated net
weight (p = 0.0241)” or “the average package does contain less than the stated net weight, at the 0.05 significance level.”
Don’t say the average package “might” be less than the stated weight or “appears to be” less than the stated weight. When you reject H0, state the
alternative as a fact within the stated significance level, or preferably with the p-value. (Again, compare to a jury verdict. The jury doesn’t say the
defendant “might be guilty”.)
See also: Take published conclusions with a grain of salt. Even professional researchers can misuse
hypothesis tests. “Data mining” (first gathering data, then looking for relationships) is one
problem, but not the only one. See Why Most Published Research Findings Are False (Ioannidis
2005 [see “Sources Used” at end of book]). If you find the article heavy going, just scroll
down to read the example in Box 1 and then the corollaries that follow.
When p > α, you fail to reject H0.
If your p-value is greater than your significance level, you have shown that random chance could account
for your results if H0 is true. You don’t know that random chance is the explanation, just that it’s a possible
explanation. The data are not statistically significant.
You therefore fail to reject H0 (and don’t mention H1 in step 5). The sample you have could have come
about by random selection if H0 is true, but it could also have come about by random selection if H0 is false.
In other words, you don’t know whether H0 is actually true, or it’s false but the sample data just happened
to fall not too far from H0.
Compare to a jury verdict of “not guilty”. That could mean the defendant is actually innocent, or that
the defendant is actually guilty but the prosecutor didn’t make a strong enough case.
[Cartoon: xkcd, used by permission; source: http://xkcd.com/892/ (accessed 2014-10-12)]
Example 11: Suppose your null hypothesis is “the average package contains the stated net weight,” your
alternative is “the average package contains less than the stated net weight,” and your significance level α is
0.05.
If you compute a p-value of 0.0788, which is > α, you fail to reject H0 in step 5, but how do you state
your conclusion in step 6?
There are two kinds of answer, depending on who you talk to. Some people say “there’s insufficient evidence to prove that the average package is
underweight”; others say “we can’t tell whether the average package is underweight or not.” Of course there are many ways to write a conclusion in
English, but ultimately they boil down to “we can’t prove H1” (or the equivalent “we can’t disprove H0”) versus “we can’t reach a conclusion either
way.”
Does it matter? Yes, I think it does.

“We can’t prove H1” is true, but it’s only part of the truth. The whole truth is that we can’t prove or disprove H0 when the p-value is greater
than the significance level.
Non-technical people are intimidated by statistics and may not realize what is meant. Tell them “we can’t disprove A” and they may think A
is true. And if you say “we can’t prove B”, people may think B is false. If you say straightforwardly, “we can’t determine which one is true”,
then there’s no risk of people jumping to a false conclusion.
As a practical matter, we don’t do hypothesis testing just for the fun of it. We want to know something about the real world, not just out of
sheer curiosity but because the result will determine what we do. Starting to market a new drug costs mucho dinero, so “we can’t prove that
the new drug is better” probably means the drug won’t go to market, and that would be too bad if the new drug actually is better. “We can’t
determine whether the new drug is better or not” essentially says “a new study is needed, probably with larger samples.”
Please understand: It’s not that the people writing the conclusions are confused (well, usually not). The problem is confusion among people reading
the conclusions.
Advice: It’s the same advice I’ve given before: Tailor your presentation to your audience. If you’re presenting to technical people, the
one-sided forms are okay, and you could answer Example 11 with something like “there’s insufficient evidence, at the 0.05 significance
level, to show that the average package is underweight” or “… to reject the hypothesis that the average package contains the stated net
weight.” (Since the p-value gives more information, you could give that instead of the significance level.)
But if your audience is non-technical people, don’t expect them to understand a two-sided truth from a one-sided conclusion. Instead, use
neutral language, such as “We can’t determine from the data whether the average package is underweight or not (p = 0.0788).” (You could state the
significance level instead of the p-value.)
What if the p-value is very large?
If your p-value is very large, say bigger than 0.5, there’s a good chance you’ve made a mistake. Check carefully whether you should be testing <, ≠, or
>. Also check whether you’re testing against the wrong number. For instance, suppose your H1 is that a coin comes up heads less than a third of the
time. A few dozen flips will probably yield a p-value very close to 1. This is the statistical equivalent of “Well, duh!”
Sometimes large p-values are correct, but those situations are rare enough that you should be suspicious.
Can we never accept the null hypothesis?
Not as a matter of strict logic, no. But there are circumstances where the data do suggest that the null hypothesis is true. The most important of these
is when multiple experiments fail to reject H0. Here’s why.
Suppose you do an experiment at the 0.05 significance level, and your p-value is greater than that. Maybe H0 is really true; maybe it’s false but
this particular sample happened to be close to H0. You can’t tell — you’ve failed to disprove H0 but that doesn’t mean it’s necessarily true.
But suppose other experimenters also get p-values > 0.05. They can’t all be unlucky in their samples, can they?
If you keep giving the universe opportunities to send you data that contradict the null hypothesis, but you keep getting data that are consistent
with the null, then you begin to think that the null hypothesis shouldn’t be rejected, that it’s actually true.
This is why scientists always replicate experiments. If the first experiment fails to reject H0, they don’t know whether H0 is true or they were just
unlucky in their sample. But if several experiments fail to reject the null — always assuming the experiments are properly conducted — then they
begin to have confidence in the theory.
What if an experiment does reject H0? Is that it, game over? Not necessarily. Remember that even a true H0 will get rejected one time in 20 when
tested at the 0.05 level. Once again, the answer is replication. If they get more “reject H0”, scientists know that the first one wasn’t just a statistical
fluke. But if they get a string of “fail to reject H0”, then it’s likely that the first one was just that one in 20, and H0 is actually true.
10C. Testing a Mean (Numeric Data)
Summary: Just as you used a TInterval in Chapter 9 to make a confidence interval about µ for numeric data, you use a T-Test to perform the
hypothesis test.
Typically you don’t know σ, the standard deviation (SD) of the population, and therefore you don’t know the standard error σ/√n either. So you
estimate the standard error as s/√n, using the known SD of the sample. That means that the test statistic is:
t = (x̅−µo) / (s/√n)
The t statistic is the estimated number of standard errors between your sample mean and the hypothetical population mean.
You met the t distribution when you computed confidence intervals in Chapter 9. Compared to z, the t distribution is a little flatter and more
spread out, especially for small samples, so p-values tend to be larger.
Let’s jump in and do a t test. The numbered steps are almost the same as they were in the examples with binomial data — you just have the necessary
variations for working with numeric data. Because I’ll be adding some commentary, I’ve put boxes around what I expect to see from you for a
problem like this. (Refer to Seven Steps of Hypothesis Tests [URL: https://BrownMath.com/swt/pfswt.htm.htm#ht7_top] if you don’t know the steps
very well yet.)
BTW: It hardly ever happens, but if you do know the SD of the population you can do a z test instead of a t test. Since the z distribution is a bit less spread out than the t
distribution, for very small samples the p-values are typically a bit lower with a z test than with a t. But the difference is rarely enough to change the result — and again, you
are quite unlikely to know the SD of the population, so a z test is quite unlikely to be the right one.
10C1. Example 12: Bank Deposits
The management claims that the average cash deposit is $200.00, and you’ve taken a random sample to test that:
192.68 188.24 152.37 211.73 201.57 167.79 177.19 191.15 209.22 178.49
185.90 226.31 192.38 190.23 156.13 224.07 191.78 203.45 186.40 160.83
At the 0.05 significance level, does this sample show that the average of all cash deposits is different from $200?
Solution: The data type is numeric, and the population SD σ is unknown, so this is a test of a population mean, Case 1 from Inferential Statistics:
Basic Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top]. Your hypotheses are:
(1) H0: µ = 200, management’s claim is correct

H1: µ ≠ 200, management’s claim is wrong
Comment: Even though you already have the sample data in the problem, when you write the hypotheses, ignore the sample. In principle, you
write the hypotheses, then plan the study and gather data. If you use any of the sample data in the hypotheses, something is wrong.
So you don’t use numbers from the sample in your hypotheses, and you don’t use the sample to help you decide whether the alternative
hypothesis H1 should have <, ≠, or >.
The significance level was given in the problem. (Problems will usually give you an α to use.)
(2) α = 0.05
Next is the requirements check. Even though it doesn’t have a number, it’s always necessary. In this case, n = 20, which is less than 30, so you have to
test for normality and verify that there are no outliers.
Enter your data in any statistics list (I used L5), and check your data entry carefully. Use the MATH200A program “Normality chk” to check for
a normal distribution and “Box-whisker” to verify that there are no outliers.
You don’t need to draw the plots, but do write down r and crit and show the comparison, and do check for outliers. (For what to do if you have
outliers, see Chapter 3.)
(RC) Random sample: given.

10n = 10×20 = 200, and the bank had better have more deposits than that or it can’t afford to pay you for your work!
Normality: yes. From MATH200A part 4, r(0.9864) > crit(0.9503).
Outliers: none (MATH200A part 2).
Now it’s time to compute the test statistic (t) and the p-value.
On the T-Test screen, you have to choose Data or Stats just as you did on the TInterval screen. You have the actual data, so you select Data
on the T-Test screen, instead of Stats. Then the sample mean, sample SD, and sample size are shown on the output screen, so you write them down
as part of your results. Always write down x̅, s, and n.
(3/4) T-Test: µo=200, List=L5, Freq=1, µ≠µo

results: t=−2.33, p=0.0311, x̅=189.40, s=20.37, n=20
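If you want to cross-check the calculator, the same T-Test can be sketched in Python. This is not part of the TI-83/84 procedure. It uses only the standard library, and the helper names (t_pdf, t_upper_tail) and the Simpson's-rule integration of the t density are assumptions made for illustration.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > |t|): one half minus the Simpson's-rule area from 0 to |t|."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3


deposits = [
    192.68, 188.24, 152.37, 211.73, 201.57, 167.79, 177.19, 191.15, 209.22, 178.49,
    185.90, 226.31, 192.38, 190.23, 156.13, 224.07, 191.78, 203.45, 186.40, 160.83,
]
mu0 = 200.0  # hypothesized mean from H0

n = len(deposits)
xbar = statistics.mean(deposits)
s = statistics.stdev(deposits)            # sample SD (divides by n - 1)
t = (xbar - mu0) / (s / math.sqrt(n))     # standard errors between x-bar and mu0
p = 2 * t_upper_tail(t, df=n - 1)         # two-tailed, because H1 uses "not equal"

print(f"x-bar={xbar:.2f}  s={s:.2f}  n={n}  t={t:.2f}  p={p:.4f}")
```

The printed values should land close to the calculator results in the box above.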
The decision rule is the same for every single hypothesis test, regardless of data type. In this case:
(5) p < α. Reject H0 and accept H1.
And as usual, you can write your conclusion with the significance level or the p-value:
(6) At the 0.05 level of significance, management is incorrect and the average of all cash deposits is different from $200.00. In fact,
the true average is lower than $200.00.
Or,
(6) Management is incorrect, and the average of all cash deposits is different from $200.00 (p = 0.0311). In fact, the true average is
lower than $200.00.
Remember what happens when you do a two-tailed test (≠ in H1) and p turns out less than α: After you write your “different from” conclusion, you
can go on to interpret the direction of the difference. See p < α in Two-Tailed Test.
In a classroom exercise, if you were asked to do a hypothesis test you would do a hypothesis test and only a hypothesis test. But in real life, and in
the big labs for class, it makes sense to answer the obvious question: If the true mean is less than $200.00, what is it?
You don’t have to check requirements for the CI, because you already checked them for the HT.
TInterval L5, 1, .95

outputs: (179.86, 198.93)
With 95% confidence, the average of all cash deposits is between $179.86 and $198.93.
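The interval can be checked by hand as x̅ ± t*·s/√n. Here is a minimal Python sketch of that arithmetic. It assumes the critical value t* = 2.093 is read from a t table for df = 19 at 95% confidence (the calculator finds it internally), and the variable names are only illustrative.

```python
import math
import statistics

deposits = [
    192.68, 188.24, 152.37, 211.73, 201.57, 167.79, 177.19, 191.15, 209.22, 178.49,
    185.90, 226.31, 192.38, 190.23, 156.13, 224.07, 191.78, 203.45, 186.40, 160.83,
]
t_star = 2.093  # assumed table value: df = 19, 95% confidence, two-sided

n = len(deposits)
xbar = statistics.mean(deposits)
margin = t_star * statistics.stdev(deposits) / math.sqrt(n)  # margin of error
lo, hi = xbar - margin, xbar + margin

print(f"95% CI: ({lo:.2f}, {hi:.2f})")
```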
10C2. Example 13: Smokers and Retirement
Here’s an example where you have statistics without the raw data. It’s adapted from Sullivan (2011, 483) [see “Sources Used” at end of book].
According to the Centers for Disease Control, the mean number of cigarettes smoked per day by individuals who are daily smokers
is 18.1. Do retired adults who are daily smokers smoke less than the general population of daily smokers?
To answer this question, Sascha obtains a random sample of 40 retired adults who are current daily smokers and records the
number of cigarettes smoked on a randomly selected day. The data result in a sample mean of 16.8 cigarettes and a SD of 4.7
cigarettes.
Is there sufficient evidence at the α = 0.01 level of significance to conclude that retired adults who are daily smokers smoke less
than the general population of daily smokers?
Solution: Start with the hypotheses. You’re comparing the unknown mean µ for retired smokers to the fixed number 18.1, the known mean for
smokers in general. Since the data type is numeric (number of cigarettes smoked), and there’s one population, and you don’t know the SD of the
population, this is Case 1, test of population mean, from Inferential Statistics: Basic Cases [URL:
https://BrownMath.com/swt/pfswt.htm.htm#cas_top].
(1) H0: µ = 18.1, retired smokers smoke the same amount as smokers in general
H1: µ < 18.1, retired smokers smoke less than smokers in general
Comment: The claim is a population mean of 18.1, so you use 18.1 in your hypotheses. Using the sample mean of 16.8 in Step 1 is a rookie
mistake, one of the Top 10 Mistakes of Hypothesis Tests [URL: https://BrownMath.com/stat/topten.htm]. Never use sample data in your
hypotheses.
Comment: Why does H1 have < instead of ≠? The short answer is: that’s what the problem says to do. In the real world, you would do a two-
tailed test (≠) unless there’s a specific reason to do a one-tailed test (< or >); see One-Tailed or Two-Tailed? (earlier in this document).
Presumably there’s some reason why they are interested only in the case “retired smokers smoke less” and not in the case “retired smokers
smoke more”.
(2) α = 0.01
(RC) Random sample (given).
n > 30.
10n = 10×40 = 400, less than the total number of retired smokers.
Therefore the sampling distribution is normal.
(3/4)
T-Test: µo=18.1, x̅=16.8, s=4.7, n=40, µ<µo

outputs: t=−1.75, p=0.0440
(5) p > α. Fail to reject H0.
(6) At the 0.01 level of significance, we can’t determine whether the average number of cigarettes smoked per day by retired adults who are
current smokers is less than the average for all daily smokers or not.
Or,
We can’t tell whether the average number of cigarettes smoked per day by retired adults who are current smokers is less than the average for
all daily smokers or not (p = 0.0440).
When you fail to reject H0, you cannot reach any conclusion. You must use neutral language in your non-conclusions. Please review When p > α,
you fail to reject H0 earlier in this chapter.
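When you have only summary statistics, the test statistic and one-tailed p-value for the smokers example can be sketched in Python as follows. This is an illustration, not the calculator's method: the helpers t_pdf and t_cdf are assumed names, and the p-value comes from a Simpson's-rule integration of the t density.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t), using Simpson's rule on the density between 0 and |t|."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = area * h / 3
    return 0.5 + half if t >= 0 else 0.5 - half


mu0, xbar, s, n = 18.1, 16.8, 4.7, 40   # numbers given in the problem
alpha = 0.01

t = (xbar - mu0) / (s / math.sqrt(n))
p = t_cdf(t, df=n - 1)                  # left tail, because H1 is mu < mu0
decision = "reject H0" if p < alpha else "fail to reject H0"

print(f"t={t:.2f}  p={p:.4f}  ->  {decision}")
```

Because H1 uses <, the p-value is the area in the left tail only. A two-tailed test on the same numbers would double it.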
10D. Confidence Interval and Hypothesis Test
Summary: You can use a confidence interval to conclude whether results are statistically significant. A hypothesis test (HT) and confidence interval (CI) are
two ways of looking at the same thing: what possibilities for the population mean or proportion are consistent with my sample?
A 95% CI is the flip side of a 0.05 two-tailed HT. More generally, a 1−α CI is the complement of an α two-tailed HT.
Example 14: The baseline rate for heart attacks in diabetes patients is 20.2% in seven years. You have a new diabetes drug, Effluvium, that is effective
in treating diabetes. Clinical trials on 89 patients found that 27 (30.3%) had heart attacks. The 95% confidence interval is 20.8% to 39.9% likelihood of
heart attack within seven years for diabetes patients taking Effluvium. What does this tell you about the safety of Effluvium?
Solution: Okay, you’re 95% confident that Effluvium takers have a 20.8% to 39.9% chance of a heart attack within seven years. If you’re 95% confident
that their chance of heart attack is inside that interval, then there’s only a 5% or 0.05 probability that their chance of heart attack is outside the
interval, namely <20.8% or >39.9%.
But 20.2% is outside the interval, so there’s less than a 0.05 chance that the true probability of heart attack with Effluvium is 20.2%.
CI and HT calculations both rely on the sampling distribution. The open curve centered on 20.2% shows the sampling distribution for a hypothetical
population proportion of 20.2%. Only a very small part of it extends beyond 30.3%, the proportion of heart attacks you actually found in your sample.
The chance of getting your sample, given a hypothetical proportion po in the population, is the p-value. If po = 20.2%, your sample with p̂ = 30.3%
would be unlikely (p-value below 0.05). You would reject the null hypothesis and conclude that Effluvium takers have a different likelihood of heart
attack from other diabetes patients, at the 0.05 significance level.
Further, the entire confidence interval is above the baseline value, so you know that Effluvium increases the likelihood of heart attack in diabetes patients.
At significance level 0.05, a two-tailed test against any value outside the 95% confidence interval (the shaded curve) would lead to rejecting the null
hypothesis. And you can say the same thing for any other significance level α and confidence level 1−α.
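The Effluvium interval can be reproduced with a short Python sketch. It assumes the usual one-proportion z interval, p̂ ± z*·√(p̂(1−p̂)/n), which matches the 20.8% to 39.9% quoted in the example. The variable names are illustrative only.

```python
import math
from statistics import NormalDist

successes, n = 27, 89      # heart attacks among patients in the trial
baseline = 0.202           # hypothetical population proportion (po)
conf = 0.95

p_hat = successes / n
z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95%
se = math.sqrt(p_hat * (1 - p_hat) / n)             # standard error from p-hat
lo, hi = p_hat - z_star * se, p_hat + z_star * se
baseline_outside = baseline < lo or baseline > hi

print(f"p-hat={p_hat:.3f}  95% CI=({lo:.3f}, {hi:.3f})  baseline outside? {baseline_outside}")
```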
What if the interval does include the baseline or hypothetical value? Then you fail to reject the null hypothesis.
Example 15: A machine is supposed to be turning out something with a mean value of 100.00 and SD of 6.00, and you take a random sample of 36
objects produced by the machine. If your sample mean is 98.4 and SD is 5.9, your 95% confidence interval is 96.4 to 100.4.
Now, can you make any conclusion about whether the machine is working properly?
Solution: Well, you’re 95% confident that the machine’s true mean output is somewhere between 96.4 and
100.4. With this sample, you can rule out a true population mean of <96.4 or >100.4, at the 0.05 significance
level; but you can’t rule out a true population mean between 96.4 and 100.4 at α = 0.05. A hypothesis test
would fail to reject the hypothesis that µ = 100. You can’t determine whether the true mean output of the
machine is equal to 100 or not.
When µo or po is inside the 1−α CI, the two-tailed p-value is > α. Your sample does not contradict H0 and you fail to reject H0.
When µo or po is outside the 1−α CI, the two-tailed p-value is < α. Your sample contradicts H0, and you reject H0.
Leaving the symbols aside, when you test a null hypothesis your sample either is surprising (and you reject the null hypothesis) or is not surprising
(and you fail to reject the null). Any null hypothesis value inside the confidence interval is close enough to your sample that it would not get rejected,
and any null hypothesis value outside the interval is far enough from the sample that it would get rejected.
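The rule in the last few paragraphs amounts to a single comparison. A minimal Python sketch, with a hypothetical function name, applied to the intervals quoted in Examples 14 and 15:

```python
def two_tailed_decision(baseline, ci_low, ci_high):
    """Decision for a two-tailed test at alpha, given the matching (1 - alpha) CI."""
    if ci_low <= baseline <= ci_high:
        return "fail to reject H0"   # baseline is consistent with the sample
    return "reject H0"               # baseline is too far from the sample


# Example 14: baseline 20.2%, 95% CI 20.8% to 39.9%
print(two_tailed_decision(20.2, 20.8, 39.9))
# Example 15: baseline 100, 95% CI 96.4 to 100.4
print(two_tailed_decision(100.0, 96.4, 100.4))
```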
Special Note for Binomial Data
For numeric data, the CI and HT are exactly equivalent.
But for binomial data, the CI and HT are only approximately equivalent. Why? Because with binomial data, the HT uses a standard error derived
from po in the null hypothesis, but the CI uses a standard error derived from p̂, the sample proportion. Since the standard errors are slightly different,
right around the borderline they might get different answers. But when the hypothetical po is a fair distance outside the CI, as it was in the drug
example, the p-value will definitely be less than α.
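To see the two standard errors side by side, here is a small Python sketch using the numbers from the drug example. It assumes the standard one-proportion z test, and the names se_test and se_ci are made up for illustration.

```python
import math
from statistics import NormalDist

successes, n = 27, 89
p0 = 0.202                 # proportion in the null hypothesis
p_hat = successes / n      # sample proportion

se_test = math.sqrt(p0 * (1 - p0) / n)        # HT: standard error built from po
se_ci = math.sqrt(p_hat * (1 - p_hat) / n)    # CI: standard error built from p-hat

z = (p_hat - p0) / se_test
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value

print(f"SE for test={se_test:.4f}  SE for CI={se_ci:.4f}  z={z:.2f}  p={p_value:.4f}")
```

The two standard errors differ, but here po is far enough outside the interval that the p-value still comes out below 0.05.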
What about One-Tailed Tests?
Good question!
A confidence interval is symmetric (for the cases you study in this course), so it’s intrinsically two-tailed. A one-tailed HT for < or > at α = 0.01
corresponds to a two-tailed HT for ≠ at α = 0.02, so the CI for a one-tailed HT at α = 0.01 is a 98% CI, not a 99% CI. The confidence level for a one-
tailed α is 1−2α, not 1−α.
Correspondence between Significance Level and Confidence Level
α        tails    C-Level
0.05     1        1−2×.05 = 90%
0.05     2        1−.05 = 95%
0.01     1        1−2×.01 = 98%
0.01     2        1−.01 = 99%
0.001    1        1−2×.001 = 99.8%
0.001    2        1−.001 = 99.9%
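The table follows from one formula, which can be sketched as a tiny Python function (the function name is hypothetical):

```python
def confidence_level(alpha, tails):
    """Confidence level of the CI that matches a test at significance level alpha."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A one-tailed test at alpha pairs with a two-tailed test at 2 * alpha.
    return 1 - alpha if tails == 2 else 1 - 2 * alpha


for alpha in (0.05, 0.01, 0.001):
    for tails in (1, 2):
        print(f"alpha={alpha}  tails={tails}  C-level={confidence_level(alpha, tails):.1%}")
```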
If the baseline value is outside the confidence interval, you can say (at the appropriate significance level) that the true value of µ or p is different from
the baseline, and then go on to say whether it’s bigger or smaller, so you get your one-tailed result.
On the other hand, if the baseline value is inside the confidence interval, you can’t say whether the true µ or p is equal to the baseline or different
from it, and if you can’t say whether they’re different then you can’t say which one is bigger than the other.
10E. Testing a Non-Random Sample
Though most hypothesis tests are to find out something about a population, sometimes you just want to know whether this sample is significantly
different from a population. In this case, you don’t need a random sample, but the other requirements must still be met.
Example 16: At Wossamatta University, instructors teach the statistics course independently but all sections take the same final exam. (There are
several hundred students.) One semester, the mean score on the exam is 74. In one section of 30 students, the mean was 68.2 and the SD was 10.4. The
students felt that they had not been adequately prepared for the exam by the instructor. Can they make their case?
Solution: In effect, they are saying that their section performance was significantly below the performance of students in the course overall. This is a
testable hypothesis. But the hypothesis is not about the population that these 30 students were drawn from; we already know about that population.
Instead, it is a test whether this sample, as a sample, is different from the population.
(1) H0: This section’s mean was no different from the course mean.
H1: This section’s mean was significantly below the course mean.
(2) α = 0.05
(RC) (Omit the requirement for a random sample.)
10n = 10×30 = 300 is less than the “several hundred students” in the course.
Sample size is ≥30, so the sampling distribution is normal.
(3/4) TTest: µo = 74, x̅ = 68.2, s = 10.4, n = 30, µ < µo
Outputs: t = −3.05, p-value = 0.0024
(5) p < α. Reject H0 and accept H1.
(6) This section’s average exam score was less than the overall course average (p-value = 0.0024).
Okay, there was a real difference. This section’s mean exam score was not only below the average for the whole course, but too far below for random
chance to be enough of an explanation.
But did the students prove their case? Their case was not just that their average score was lower, but that the difference was the result of poor
teaching. Statistics can’t answer that question so easily. Maybe it was poor teaching; maybe these were weaker students; maybe it was environmental
factors like classroom temperature or the time of day; maybe it was all of the above.

You don’t know the proportion or mean of a population. You want to test whether it is different from some baseline number. You
take a sample, and then compute how likely that sample would be if the true proportion or mean in the population is equal to that
baseline. If the sample is too unlikely, you reject the null hypothesis and conclude that the true proportion or mean must be
different from that baseline number.
Know the seven steps of hypothesis tests. Know them by heart, and write them on your cheat sheet if you need to.
Know whether you have binomial or numeric data. This totally determines which type of test you will do, so think before you act!
When you have numeric data, you test for the mean of a population (hypotheses about µ). When you have binomial data in a count
of successes, you test for the proportion in a population (hypotheses about p).
Understand one-tailed versus two-tailed tests. When should you use which one? How do you interpret the results in step 6?
Understand the significance level α. Know how to pick an appropriate level.
Understand the p-value. It’s the probability, if H0 is true, of getting the sample you got (or one even further away from H0).
Know how to write conclusions (if p-value < α) or non-conclusions (if p-value > α).
Understand Type I and Type II errors. Describe what each one means in specific situations.
Understand the relationship between a confidence interval and a hypothesis test. How can you relate the endpoints of a CI to
whether you do or don’t have a statistically significant result, so that H0 would or wouldn’t be rejected?

Seven Steps of Hypothesis Tests
Top 10 Mistakes of Hypothesis Tests
Problem Set 1
1. List the seven steps of every hypothesis test.
2. Why must you select a significance level before computing a p-value?
3. Explain the p-value in your own words.
4. You’ve tested the hypothesis that the new accelerant makes a difference to the time to dry paint, using α = 0.05. What is wrong with each
conclusion, based on the p-value? Write a correct conclusion for that p-value.
(a) p = 0.0214. You conclude, “The accelerant may make a difference, at the 0.05 significance level.”
(b) p = 0.0714. You conclude, “The accelerant makes no difference, at the 0.05 significance level.”
5. You are testing whether the new accelerant makes your paint dry faster. (You have already eliminated the possibility that it makes your paint dry
slower.)
(a) What conclusion would be a Type I error? What wrong action would a Type I error lead you to take?
(b) What conclusion would be a Type II error? What wrong action would a Type II error lead you to take?
6. Are Type I and Type II errors actually mistakes? What one thing can you do to prevent both of them, or at least make them both less likely?
7. What can you do to make a Type I error less likely at a given sample size? What’s the unfortunate side effect of that?
8. Explain in your own words the difference between “accept H0” (wrong) and “fail to reject H0” (correct) when your p-value is > α.
9. The engineering department claims that the average battery lifetime is 500 minutes. Write both hypotheses in symbols.
10. Suppose H0 is “the directors are honest” and H1 is “the directors are stealing from the company.” Write conclusions, in Statistics and in
English, if …
(a) p = 0.0405 and α = 0.01
(b) p = 0.0045 and α = 0.01
11. In your hypothesis test, H0 is “the defendant is innocent” and H1 is “the defendant is guilty”. The crime carries the death penalty. Out of 0.05,
0.01, and 0.001, which is the most appropriate significance level, and why?
12. When Keith read the AAA’s statement that 10% of drivers on Friday and Saturday nights are impaired, he believed the proportion was
actually higher for TC3 students. He took a systematic sample of 120 students and, on an anonymous questionnaire, 18 of them admitted
being alcohol impaired the last Friday or Saturday night that they drove. Can he prove his point, at the 0.05 significance level?
13. In 2006–2008 there was controversy about creating a sewer district in south Lansing, where residents have had their own septic tanks for
years. The Sewer Committee sent out an opinion poll to every household in the proposed sewer district. In a letter to the editor, published
3 Feb 2007 in the Ithaca Journal, John Schabowski wrote, in part:
The Jan. 4 Journal article about the sewer reported that “only” 380 of 1366 households receiving the survey responded, with 232
against it, 119 supporting it, and 29 neutral. ... The survey results are statistically valid and accurate for predicting that the sewer
project would be voted down by a large margin in an actual referendum.
Can you do a hypothesis test to show that more than half of Lansing households in the proposed district were against the sewer project? (You’re
trying to show a majority against, so combine “supporting” and “neutral” since those are not against.)
14. Esperanza wanted to determine whether more than 40% of grocery shoppers — specifically, the primary grocery shoppers in their
households — regularly use manufacturers’ coupons. She conducted a random telephone survey and contacted 500 people. (For this exercise,
let’s assume that telephone subscribers are representative of grocery shoppers.) Of the 500 she contacted, 325 do the grocery shopping in their
households. Of those 325, 182 said they regularly use manufacturers’ coupons.
(a) What is the size of the sample? (Think before you answer!)
(b) What is the population, and how large is it?
(c) What does the number 182 represent?
(d) Don’t do a hypothesis test. But if you did, what would po be?
(e) Is it a source of bias that she considered only each household’s primary grocery shopper?
15. Doubting Thomas remembered the Monty Hall example from Chapter 5, but he didn’t believe the conclusion that switching doors would
improve the chance of winning to 2/3. (It’s okay if you don’t remember the example. All the facts you need are right here.)
Thomas watched every Let’s Make a Deal for four weeks. (Though this isn’t a random sample, treat it as one. There’s no reason why the show
should operate differently in these four weeks from any others.) In that time, 30 contestants switched doors, and 18 of them won.
(a) At the 0.05 significance level, is it true or false that your chance of winning is 2/3 if you switch doors?
(b) At the 95% confidence level, estimate your chance of winning if you switch doors.
(c) If you don’t switch doors, your chance of winning is 1/3. Using your answer to (b), is switching doors definitely a good strategy, or is there some
doubt?
16. Most of us have spam filters on our email. The filter decides whether each incoming piece of mail is spam. Heather trusts her spam filter, and
she sets it to just delete spam rather than save it to a folder.
(a) What would Heather’s spam filter do if it makes a Type I error? What would it do if it makes a Type II error?
(b) Which is more serious here, a Type I error or a Type II error? Should the significance level α be set higher or lower?
17. Rosario read in Chapter 6 [URL: https://BrownMath.com/swt/pfswt.htm.htm#c06_GeometricDist] that 30.4% of US households own cats. She
felt like dogs were a lot more visible than cats in Ithaca, so she decided to test whether the true proportion of cat ownership in Ithaca was less
than the national proportion. She took a systematic sample of Wegmans shoppers one day, and during the same time period a friend took a
systematic sample of Tops shoppers. (They counted groups shopping together, not individual shoppers, so they didn’t have to worry about getting
the same household twice.)
Together, they accumulated a sample of 215 households, and of those 54 owned cats. Did she prove her case, at the 0.05 significance level?
Problem Set 2
18. What is wrong with each pair of hypotheses? Correct the error.
(a) H0 = 14.2; H1 > 14.2
(b) H0: µ < 25; H1: µ > 25
(c) You’re testing whether batteries have a mean life of greater than 750 hours. You take a sample, and your sample mean is 762 hours. You write
H0:µ=762 hr; H1:µ>762 hr.
(d) Your conventional paint takes 4.3 hours to dry, on average. You’ve developed a drying accelerant and you want to test whether adding it makes a
difference to drying time. You write H0: µ=4.3 hr; H1: µ < 4.3 hr.
19. This year, water pollution readings at State Park Beach seem to be lower than last year. A sample of 10 readings was randomly selected from
this year’s daily readings:
3.5 3.9 2.8 3.1 3.1 3.4 3.2 2.5 3.5 3.1
Does this sample provide sufficient evidence, at the 0.01 level, to conclude that the mean of this year’s pollution readings is significantly lower than
last year’s mean of 3.8?
20. Dairylea Dairy sells quarts of milk, which by law must contain an average of at least 32 fl. oz. You obtain a random sample of ten quarts and
find an average of 31.8 fl. oz. per quart, with SD 0.60 fl. oz. Assuming that the amount delivered in quart containers is normally distributed,
does Dairylea have a legal problem? Choose an appropriate significance level and explain your choice.
21. You’re in the research department of StickyCo, and you’re developing a new glue. You want to compare your new glue against StickyCo’s best
seller, which has a bond strength of 870 lb/in².
You take 30 samples of your new glue, at random, and you find an average strength of 892.2 lb/in², with SD 56.0. At the 0.05 significance level, is
there a difference in your new glue’s strength?
22. New York Quick Facts from the Census Bureau (2014b) [see “Sources Used” at end of book] says that 32.8% of residents of New York State aged
25 or older had at least a bachelor’s degree in 2008–2012. Let’s assume the figure hasn’t changed today.
You conduct a random sample of 120 residents of Tompkins County aged 25+, and you find that 52 of them have at least a bachelor’s degree.
(a) Construct a 95% confidence interval for the proportion of Tompkins County residents aged 25+ with at least a bachelor’s degree.
(b) Don’t do a full hypothesis test, but use your answer for (a) to determine whether the proportion of bachelor’s degrees in Tompkins County is
different from the statewide proportion, at the 0.05 significance level.
23. You’re thinking of buying new Whizzo bungee cords, if the new ones are stronger than your current Stretchie ones. You test a random sample
of Whizzo and find these breaking strengths, in pounds:
679 599 678 715 728 678 699 624
At the 0.01 level of significance, is Whizzo stronger on average than Stretchie? (Stretchies have mean strength of 625 pounds.)
24. For her statistics project, Jennifer wanted to prove that TC3 students average more than six hours a week in volunteer work. She gathered a
systematic sample of 100 students and found a mean of 6.75 hours and SD of 3.30 hours. Can she make her case, at the 0.05 significance level?
25. As a POW in World War II, John Kerrich flipped a coin 10,000 times and got 5067 heads. At the 0.05 level of significance, was the coin fair?
26. People who take aspirin for headache get relief in an average of 20 minutes (let’s suppose). Your company is testing a new headache remedy,
PainX, and in a random sample of 45 headache sufferers you find a mean time to relief of 18 minutes with SD of 8 minutes.
(a) Construct a 95% confidence interval for the mean time to relief of PainX.
(b) Don’t do a full hypothesis test, but use your answer for (a) to determine at the 0.05 significance level whether PainX offers headache relief to the
average person in a different time than aspirin.
11. Inference from Two Samples

Updated 5 Nov 2020
Intro: In Chapter 10, you looked at hypothesis tests for one population, where you asked whether a population mean or proportion is
different from a baseline number. In this chapter, you’ll ask “are these two populations different from each other?” (hypothesis test)
and “how large is the difference?” (confidence interval).
Contents: 11A. Numeric Data — Paired or Unpaired?

Unpaired Data / Independent Samples
Paired Data / Dependent Samples
Paired and Unpaired Data Compared
Example 5: Seed Corn
When to Use Paired Data?
Example 6: Where the Rubber Meets the Road
Example 7: The Freshman Fifteen
Entering Paired Numeric Data
Hypothesis Test for Mean Difference
Confidence Interval for Mean Difference
Example 8: Coffee and Heart Rate
Example 9: A Tough Grader?
Hypothesis Test for Difference of Means
Confidence Interval for Difference of Means
Example 10: Sorority Academics
Example 11: Traffic Stops and Traffic Tickets
Hypothesis Test for Difference of Proportions
Confidence Interval for Difference of Proportions
Necessary Sample Size for Confidence Interval
Example 14: Gardasil Vaccine
11E. Confidence Interval and Hypothesis Test (Two Populations)
11F. More Confidence Intervals for Two Populations
Example 17: Heights of Men and Women
Example 18: Coffee and Heart Rate with Negatives
Example 19: Opinion Poll
Example 20: GPA of Fraternity Members and Nonmembers
11A. Numeric Data — Paired or Unpaired?
That’s the key question when you’re doing inference on numeric data from two samples. Your answer will control how you analyze the data, so let’s
look closely at the difference.
11A1. Unpaired Data / Independent Samples
Definitions: You have unpaired data when you get one number from each individual in two unrelated groups. The two groups are known as
independent samples.
Independent samples result when you take two samples completely independently, or if you take one sample and then randomly assign the
members to groups. Randomization always gives you independent samples.
Example 1: What if any is the average difference in time husbands and wives spend on yard work? You randomly select 40 married men and 40
married women and find how much time a week each spends in yard work. There’s no reason to associate Man A with Woman B any
more than Woman C; these are independent samples and the data are unpaired.
Example 2: How much “winter weight” does the average adult gain? You randomly select 500 adults and weigh them all during the first week of
November. Then during the last week of February you randomly select another 500 adults and weigh them. The data are unpaired,
and the samples are independent.
Before you read further, what’s the big problem in the design of those two studies?
Right! Our old enemy, confounding variables. Look at the examples again, and see how many you can identify. For example, what might make a
random person in one sample weigh more or less than a random person in the other sample, other than the passage of time? What might make a
random woman spend more or less time on yard work than a random man, apart from their genders?
With independent samples, if there’s actually a difference between the two groups, it may be swamped by all the differences within each group.
11A2. Paired Data / Dependent Samples
Definitions: You have paired data when each observational unit gives you two numbers. These can be one number each from a matched pair of
individuals, or two numbers from one individual. Paired data come from dependent samples.
Example 3: What if any is the average difference in time husbands and wives spend on yard work? You randomly select 40 couples and find how
much time a week each person spends in yard work. Each husband and wife are a matched pair. The samples are dependent because
once you’ve chosen a couple you’ve equally specified a member of the “wives” sample and a member of the “husbands” sample.
Example 4: How much “winter weight” does the average adult gain? You randomly select 500 adults and weigh them all during the first week of
November, then again during the last week of February. You have paired data in the before and after numbers. The two samples are
dependent because they are the same individuals.
Do you see how a design with paired data (dependent samples) overcomes the big problem with unpaired data (independent samples)? You want to
study weight gain, and now that’s what you’re measuring directly. You wanted to know whether husband or wife spends more time on yard work,
and now you’ve eliminated all the differences between couples.
Paired data are more likely than unpaired to reveal an effect, if there is one. Why? Because a paired-data design minimizes differences within
each group that can swamp any difference between groups.
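A small simulation can make this concrete. The Python sketch below uses entirely hypothetical numbers (weights that vary a lot from person to person, winter gains that vary much less) and compares the standard error you would get from the paired differences with the one you would get if the same numbers were treated as two independent samples. It is an illustration only, not data from any study.

```python
import math
import random
import statistics

rng = random.Random(2021)   # fixed seed so the run is repeatable
n = 50                      # hypothetical sample size

# Hypothetical population: weights differ widely between people (SD 30 lb),
# but each person's winter gain varies much less (mean 2 lb, SD 3 lb).
november = [rng.gauss(170, 30) for _ in range(n)]
february = [w + rng.gauss(2, 3) for w in november]

# Paired analysis: one difference per person.
diffs = [feb - nov for nov, feb in zip(november, february)]
se_paired = statistics.stdev(diffs) / math.sqrt(n)

# The same numbers analyzed as if they were two independent samples.
se_unpaired = math.sqrt(
    statistics.variance(november) / n + statistics.variance(february) / n
)

print(f"SE from paired differences: {se_paired:.2f}")
print(f"SE treating samples as independent: {se_unpaired:.2f}")
```

In this setup, the person-to-person variation cancels out of the paired differences, so a 2-pound average gain stands out much more clearly against the paired standard error than against the unpaired one.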
In studying human development and behavior, twins are a prime source of dependent samples. If you have a pair of identical twins who were raised
apart (and that’s surprisingly common), you can investigate which differences between people’s behavior are genetic and which are learned. The
Minnesota Study of Twins (Bouchard 1990 [see “Sources Used” at end of book]) found that a lot of behaviors that “should” be learned seem to be
genetic. The New York Times published a nontechnical account in Major Personality Study Finds That Traits Are Mostly Inherited (Goleman 1986 [see
“Sources Used” at end of book]).
11A3. Paired and Unpaired Data Compared
Sample type                                        Dependent       Independent, or randomized
Numeric data type                                  Paired Data     Unpaired Data
How many numbers from each experimental unit?      Two             One
Can you rearrange★ one sample?                     No              Yes
Problem of confounding variables                   Minimal         Severe
Use this design …                                  … if you can    … if you must
★If the data from the sample are arranged in two rows or two columns, can you rearrange one row or column without destroying information?
11A4. Example 5: Seed Corn
You’re the head of research for the Whizzo Seed Company, and you’ve developed a new
type of seed that looks promising. You randomly select three farmers in Western New York
to receive new corn, and three to receive your standard product. (Of course you don’t tell
them which one they’re ge ing.) At the end of the season they report their yield figures to
you.
What’s wrong with this picture? You can easily think of all sorts of confounding
variables here: different soils, different weather, different insects, different irrigation,
different farming techniques, and on and on. Those differences can be great enough to hide
(confound) a difference between the two types of corn, especially in a small sample.
The following year, you try again in Central New York. This time you send each farmer two
stocks of seed corn, with instructions to plant one field with the first stock and another field
with the second.
Caption: Testing new corn versus standard corn for yield. Adapted from Dabes and Janik (1999, 263) [see “Sources Used” at end of book]
Does that eliminate confounding variables? Maybe not totally, but it reduces them as
far as possible. Now, if you see significant differences in yield between two fields planted
by the same farmer, it’s almost sure to be due to differences in the seed.
When to Use Paired Data?
You always want to structure an experiment or observation with paired data (dependent samples) — if you can.
“If you can.” Aye, there’s the rub. Suppose you want to know whether attending kindergarten makes kids do better in first grade. There’s no way to
set this up as paired data: how can a given kid both go through kindergarten and not go through kindergarten? Twin studies don’t help you here,
because if the twins are raised together the parents will send both of them to kindergarten, or neither; and if the twins are raised apart then there will
be too many other differences in their upbringing that could affect their performance in first grade.
If the samples are independent, you can’t pair the data, even if the samples are the same size. If you’re not sure whether you have dependent or
independent samples, look back at 11A3. Paired and Unpaired Data Compared.
11A6. Example 6: Where the Rubber Meets the Road
You want to determine whether a new synthetic rubber makes tires last longer than the competitor’s product. Can you see how to do this with
independent samples (unpaired data) and with dependent samples (paired data)? Think about it before you read on.
For independent samples, you randomly assign drivers to receive four tires with your new rubber or four of the competitor’s tires. For dependent
samples, you put two tires of one type on the left side of every driver’s car, and two on the right side of every driver’s car. (You do half the cars one
way and half the other, to eliminate differences like the greater likelihood of hitting the curb on the right.)
With the first method, if there’s only a small difference between your rubber and the competitor’s, it may not show up because you’ve also got
differences in driving styles, roads, and so forth — confounding variables again. With the second method, those are eliminated.
11B. Inference with Paired Numeric Data (Case 3)
Summary: The hypothesis test is almost exactly like the Case 1 hypothesis test [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c10_httest_root]. The difference is that you define a new variable d (difference) in Step 1
and write hypotheses about µd instead of µ.
For a confidence interval, you’re estimating the average difference, not the average of either population. You need to state both size
and direction of the effect.
11B1. Example 7: The Freshman Fifteen
You’ve probably heard about the “freshman fifteen”, the weight gain many students experience in their first year at college. The Urban Dictionary
even talks about the “freshman twenty” (2004) [see “Sources Used” at end of book].
Francine wanted to know if that was a real thing or just an urban legend. During the first week of school, she got the other nine women in her
chemistry class at Wossamatta U to agree to help her collect data. (She reasoned that students in any particular class would effectively be a random
sample of the school, since class choice is unrelated to weight or other health issues. Of course that would be questionable for a spin class or a
cooking class.)
Wossamatta U CHEM101: Women’s Weights (in pounds)
Student   A    B    C    D    E    F    G    H    I    J
Sept.    118  105  123  112  107  130  120   99  119  126
May      125  114  128  122  106  143  124  103  125  135
When she had the data, Francine realized she didn’t know what to do next. If she had just one set of numbers, she would do a Student’s t test [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c10_httest_root], since she doesn’t know the population standard deviation (SD). But what to do with
two lists?
Then she had a brainstorm. She realized that she’s not trying to find out anything about students’ weights. She wants to know about their weight
gain. Looking at their weights, she’d have plenty of lurking variables starting with pre-college diet and lifestyle. Looking only at the weight gain
minimizes or eliminates those variables, and measures just what happened to each student during freshman year. So she added a third row to her
chart:
Wossamatta U CHEM101: Women’s Weights (in pounds)
Student          A    B    C    D    E    F    G    H    I    J
Sept.           118  105  123  112  107  130  120   99  119  126
May             125  114  128  122  106  143  124  103  125  135
d = May−Sept.     7    9    5   10   −1   13    4    4    6    9
Notice the new variable d, the difference between matched pairs. (You know the data must be paired, because each May number is associated with
one and only one September number. You can’t rearrange the May numbers and still have everything make sense.) This is the heart of Case 3 in
Inferential Statistics: Basic Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top]: reducing paired numeric data to a simple t test of a
population mean difference.
Here’s what’s new:

Define d so that you always subtract in the same direction.
Always include the definition of d in your analysis.
Your population parameter becomes µd, the mean difference.
In requirements check, test the d’s, not the original data.
Write your conclusion about the mean difference as a positive number with a direction. (Example: not “a mean difference of −14”, but “a
mean decrease of 14”.)
Now she’s all set. She has one set of ten numbers, representing the continuous variable “weight gain in freshman year” for a random sample of
Wossamatta U women. (Notice with student E, Francine has a negative value for d because May minus September is 106−107 = −1. That student lost
weight as a freshman.) Time for a t test!
But first, what will she test? Her original idea was to test the “freshman fifteen”. But a glance at the d’s
shows her that no one gained as much as 15 lb. An average can’t be larger than every member of the
data set, so there’s no way she could prove a hypothesis that the average gain is above fifteen pounds.
She decides instead to try to prove a “freshman five”, µd > 5, with 0.05 significance.
BTW: Subtle point here: You never use sample data in a hypothesis, but you can sometimes adjust your hypotheses
after you collect your data, especially when it’s obvious that your data won’t prove what you wanted to prove. Another reasonable choice for Francine would be to try to prove
simply that the average student gains weight, µd > 0.
When you do a confidence interval, you don’t have to make any decision of this kind because you just follow the data where they lead.
Entering Paired Numeric Data
Francine subtracted by hand here, but you shouldn’t do that because it’s a rich source of errors and makes it harder to check your work. Instead,
follow this procedure on your TI-83/84:
1. Enter the first data set (September, in this case) in L1.
2. Enter the second data set (May) in L2. Unlike the one-population cases, the order matters.
3. Check your data entry. Since you entered all of the September figures and then all the May figures, check them the opposite way, first
student A September and May, then student B, and so on.
4. Cursor to L3 — the column heading, not the first number.
5. Francine defined d as May−Sept., which is L2−L1, so enter that formula. (To subtract in the other direction, enter L1−L2.) As soon as you
press [ENTER], the calculator does all the subtractions, wiping out whatever was in L3 previously.
This isn’t Excel — if you change L1 or L2 after entering the formula for L3, L3 won’t change. You need to re-enter the formula for L3 in that
case. (You actually can make the calculator behave like Excel by binding a formula to a list, but it’s not worth the hassle.)
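If you prefer to check the subtraction off the calculator, the same step can be sketched in Python. This is only an illustration: the list names `sept`, `may`, and `d` are chosen here and are not part of the TI-83/84 procedure.

```python
# Francine's data from the table above (weights in pounds).
sept = [118, 105, 123, 112, 107, 130, 120, 99, 119, 126]
may = [125, 114, 128, 122, 106, 143, 124, 103, 125, 135]

# d = May - Sept., the analogue of entering L2-L1 as the formula for L3.
# zip() pairs each May value with the same student's September value,
# so the order of both lists matters, just as it does on the calculator.
d = [m - s for m, s in zip(may, sept)]

print(d)  # [7, 9, 5, 10, -1, 13, 4, 4, 6, 9]
```

As with L3 on the calculator, `d` is computed once. If you correct a value in `sept` or `may`, you have to rerun the subtraction.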
Hypothesis Test for Mean Difference
With paired numeric data, your population parameter is the mean difference µd. The random variable is a difference (in this case, a number of
pounds gained from September to May), so the parameter is the mean of all those weight gains.
(1) d = May−September
H0: µd = 5, average student gains 5 lb or less
H1: µd > 5, average student gains more than 5 lb
(2) α = 0.05
(RC) Random sample? Yes, effectively. (It’s a random sample of Wossamatta U women frosh, not necessarily those from other colleges.)
10n ≤ N? Yes, because any university has more than 10×10 = 100 women in the freshman class.
n = 10 (< 30), so Francine must test for normality and verify absence of outliers. She tests L3, not L1 and L2, because L3 holds her sample
data of weight gain:
r=.9811 and crit=.9179. r>crit, and the box-whisker shows no outliers.

(3/4) This is a regular T-Test, number 2 in the STAT TESTS menu. Francine writes down
T-Test: 5, L3, 1, >µo
results: t=1.29, p = 0.1146 , d̅=6.6, s=3.9, n=10
BTW: The sample mean is d̅ (“d-bar”), not x̅, because the data are d’s, not x’s.
(6) You can’t determine whether the average Wossamatta U woman student gains more than 5 pounds in her freshman year or not (p = 0.1146).
Or,
At the 0.05 significance level, you can’t determine whether the average Wossamatta U woman student gains more than 5 pounds in her
freshman year or not.
After a “fail to reject H0”, you always remember to write your conclusion in neutral language, right? Maybe the true average weight gain is greater
than 5 pounds but this particular sample just happened not to show it; maybe the true average weight gain really is under 5 pounds. A confidence
interval can help you get a handle on the effect size.
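For readers who want to see what the T-Test screen is doing, here is a minimal Python sketch of the same one-tailed test on the d's. It uses only the standard library. The helper `t_upper_tail` is a hypothetical name introduced here; it approximates the tail area of Student's t by numerical integration, which is an assumption of this sketch and not necessarily how the calculator does it.

```python
import math
import statistics


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for Student's t with df degrees of freedom.

    Integrates the t density from 0 to |t| with Simpson's rule
    (steps must be even), then uses the symmetry of the distribution.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    b = abs(t)
    h = b / steps
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3           # P(0 < T < |t|)
    tail = 0.5 - area              # P(T > |t|)
    return tail if t >= 0 else 1.0 - tail


d = [7, 9, 5, 10, -1, 13, 4, 4, 6, 9]    # d = May - Sept., from the table
mu0 = 5                                   # hypothesized mean gain in H0

n = len(d)
d_bar = statistics.mean(d)
s = statistics.stdev(d)                   # sample SD (divides by n - 1)
t = (d_bar - mu0) / (s / math.sqrt(n))    # one-sample t statistic on the d's
p = t_upper_tail(t, n - 1)                # one-tailed, because H1 is mu_d > 5

print(f"d-bar={d_bar:.1f}, s={s:.1f}, n={n}, t={t:.2f}, p={p:.4f}")
```

The printed values should agree, up to rounding, with the calculator results quoted above (t=1.29, p = 0.1146, d̅=6.6, s=3.9, n=10).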
Confidence Interval for Mean Difference
When a hypothesis test fails to reach a conclusion, a confidence interval can salvage at least some information. When a hypothesis test does reach a
conclusion, a confidence interval can give you more precise information.
If Francine was doing only the confidence interval, she’d have to start off by testing requirements. But she has already tested them as part of the
hypothesis test, so she goes right to the TINTERVAL screen.
Which confidence level does she choose? Her one-tailed hypothesis test at α = 0.05 would be equivalent to a two-tailed test at α = 0.10, and that
suggests a confidence level of 90%. But she decides that, since her hypothesis test has already failed to reach a conclusion, she’d at least like to get a 95% CI.
TInterval: L3, 1, .95

results: (3.7948, 9.4052)
Conclusion: Francine is 95% confident that the average woman student at Wossamatta U gains 3.8 to 9.4 pounds during her freshman year.
(Francine doesn’t write down d̅, s, and n because she’s already written them in the hypothesis test. She would write them down when she does
only a confidence interval.)
Common mistake: Don’t say the average weight is 3.8 to 9.4 pounds. You aren’t estimating the average first-year woman’s weight, but her weight gain.
Always re-read your conclusion after you write it, and ask yourself whether it seems reasonable in the context of the problem. That can save you
from mistakes like this.
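As a rough check on the TInterval output, the interval can be reproduced by hand from the usual one-sample t interval applied to the d's. The critical value t* ≈ 2.2622 (df = 9, 95% confidence) comes from a standard t table and is not quoted in the text above. The standard error 1.2401 is s/√n computed from the unrounded sample SD, s ≈ 3.9214.

```latex
\[
\bar{d} \pm t^{*}\,\frac{s}{\sqrt{n}}
  \approx 6.6 \pm 2.2622 \times \frac{3.9214}{\sqrt{10}}
  \approx 6.6 \pm 2.2622 \times 1.2401
  \approx 6.6 \pm 2.8052
  \quad\Longrightarrow\quad (3.7948,\ 9.4052)
\]
```

Both endpoints are in pounds of weight gain, not pounds of weight, which is exactly the distinction the common-mistake note warns about.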
11B2. Example 8: Coffee and Heart Rate
A few years back, a coffee company tried to market drinking coffee as a way to relax, and they weren’t talking about
decaf. Jon decided to test this. He randomly selected six adults. He recorded their heart rates, then recorded them again
half an hour after each person drank two cups of regular coffee. His data are shown below. (Data come from Dabes and
Janik [1999, 264] [see “Sources Used” at end of book].)
Person   Before   After
1        78       83
2        64       66
3        70       77
4        71       74
5        70       75
6        68       71
The data are paired, because each person (experimental unit) gives you two numbers, Before and After; because each
After is associated with one specific Before; and because you can’t rearrange Before or After and still have the data make
sense.
Jon selected the 0.01 significance level. (He tests for difference even though he believes coffee increases heart rate, because
it could decrease it.)
Jon could equally well define d as Before−After or After−Before. At least, mathematically he could. But you’ll find it’s
easier to interpret results if you always define d as high minus low so that all or most of the d’s will be positive
numbers. (You can do this based on your common sense or by looking at the data.) Jon sees that the After numbers are generally larger than the
Before numbers, so he chooses d = After−Before.
(1) d = After−Before
H0: µd = 0, coffee makes no difference to heart rate
H1: µd ≠ 0, coffee makes a difference to heart rate
(2) α = 0.01
(RC) Jon has a random sample, but the sample size is <30. (The sample of six is obviously less than 10% of coffee drinkers.) He puts the Before
figures in L1, After in L2, and then L2−L1 (not L1−L2) in L3. The box-whisker plot of L3 finds no outliers. The normal probability plot shows
r=.9638, crit=.8893; r>crit.
(3/4) T-Test: 0, L3, 1, ≠µo

results: t=5.56, p = 0.0026 , d̅=4.2, s=1.8, n=6

(6) Drinking coffee does make a difference in heart rate half an hour later (p = 0.0026). In fact, coffee increases heart rate.
Or,
Drinking coffee does make a difference in heart rate half an hour later, at the 0.01 significance level. In fact, drinking coffee increases heart
rate.
As usual, when you do a two-tailed test and p < α, you can interpret it in a one-tailed manner. Jon defined d as After−Before, which is the amount of
increase in each subject’s heart rate. His sample mean d̅ was positive, so the average outcome in his sample was an increase. Because he proved that
the mean difference µd for all people is nonzero, the sign of his sample mean difference d̅ tells him the sign of the population mean difference µd.
Jon can’t say that the average increase for people in general is 4.2 beats per minute. That was the mean difference in his sample. If he wants to know
the mean difference for all people, he has to construct a confidence interval:
TInterval: L3, 1, .99
result: (1.146, 7.187)
Jon is 99% confident that the average increase in heart rate for all people, half an hour after drinking two cups of coffee, is 1.1 to 7.2 beats per minute.
Caution! The confidence interval expresses a difference, not an absolute number. You are estimating the amount of increase or decrease, not the heart
rate. A common mistake would be to say something about the heart rate being 1.1 to 7.2 bpm after coffee. Again, you’re not estimating the heart rate,
you’re estimating the change in heart rate.
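To see where Jon's numbers come from, here is the arithmetic behind the T-Test and TInterval screens, worked from his table. The d's (After−Before) are 5, 2, 7, 3, 5, 3. The critical value t* ≈ 4.0321 (df = 5, 99% confidence) is taken from a standard t table and is an outside input, not something stated in the text.

```latex
\[
\bar{d} = \frac{5+2+7+3+5+3}{6} \approx 4.1667,
\qquad s \approx 1.8348,
\qquad \frac{s}{\sqrt{n}} = \frac{1.8348}{\sqrt{6}} \approx 0.7491
\]
\[
t = \frac{\bar{d} - 0}{s/\sqrt{n}} \approx \frac{4.1667}{0.7491} \approx 5.56
\]
\[
\bar{d} \pm t^{*}\,\frac{s}{\sqrt{n}}
  \approx 4.1667 \pm 4.0321 \times 0.7491
  \approx 4.1667 \pm 3.0203
  \quad\Longrightarrow\quad (1.146,\ 7.187)
\]
```

Every quantity here is a change in beats per minute, which is why the conclusion speaks of an increase in heart rate and not of the heart rate itself.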
With paired data, you tested the population mean difference µd between matched pairs. But suppose you don’t have matched pairs? With unpaired
data in independent samples, you test the difference between the means of two populations, µ1−µ2.
This is Case 4 in Inferential Statistics: Basic Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top]. Key features:
You identify population 1 and population 2 at the start of your HT or CI.
The two samples must be independent, and you check requirements for each sample separately.
Sample sizes need not be equal, but they should not be very different.
You use 2-SampTTest for hypothesis test, 2-SampTInt for confidence interval. Always use Pooled:No with both.
Advice: Take your time when you look at data to decide whether you have paired or unpaired data. If your sample sizes are different, it’s a no-
brainer: the data are unpaired. But if the sample sizes are the same, think carefully about whether the data are paired or unpaired. Sometimes
students just seem to take a stab in the dark at whether data are paired or unpaired, but if you just stop and think about how the data were taken you
can make the right decision every time. Look back at Paired and Unpaired Data at the beginning of this chapter if you need a refresher on the
difference.
Example 9: A Tough Grader?
Prof. Sullivan’s students at Wossamatta U felt that he was a tougher grader than the other speech professors. They decided to test this, at the 0.05
significance level.
Eight of them each took a two-hour shift, assigned randomly at different times and days of the week, and distributed a questionnaire to each
student on the main quad. They felt this was a reasonable approximation to a random sample of current students. (They asked students not to take a
questionnaire if they had already submitted one.) The questionnaire asked whether the student had taken speech in a previous semester, and if so
from which professor and what grade they received. They then divided the questionnaires into three piles, “no speech”, “Sullivan”, and “other prof”.
It would be possible to do an analysis with the categorical data of letter grades. But you should always use numerical data when you can,
because p-values are usually lower with numeric data than attribute data, for a given sample size. The students counted an A as 4 points, A-minus as
3.7, and so on. Here is a summary of their findings:
Students of   Mean   Standard Deviation   Sample Size
Sullivan      2.21   1.44                 32
Other prof    2.68   1.13                 54
Hypothesis Test for Difference of Means
In this test, you have unpaired numeric data in two samples. The requirements for each sample are the same as the requirements for the sample in a one-sample
t test:
Simple random samples (or equivalent, such as systematic).
Each sample is either n ≥ 30, or normally distributed with no outliers.
Each sample is no more than 10% of the population it was drawn from.
There’s an additional requirement for the two samples:
The two samples must be independent.
Here’s the hypothesis test, as performed by Prof. Sullivan’s students:
(1) pop. 1 = Sullivan students, pop. 2 = other speech profs’ students

H0: µ1 = µ2, no difference in average grades
H1: µ1 < µ2, Sullivan’s grades lower on average
(2) α = 0.05
(RC) Random sample (systematic).
Are samples less than 10% of their populations? 10×32 = 320, and 10×54 = 540. At a university there are almost certainly more speech
students per professor than that, especially considering multiple years.
Both sample sizes > 30.
Samples independent (no connection between Sullivan students and non-Sullivan students).
(3/4) 2-SampTTest: x̅1=2.21, s1=1.44, n1=32, x̅2=2.68, s2=1.13, n2=54, µ1<µ2, Pooled:No
Results: t = −1.58, p = 0.0600, df=53.58
The test statistic is still Student’s t, but adapted for two samples. See the BTW note below for more about that and about the funny
number of degrees of freedom.

(6) At the 0.05 level of significance, they can’t determine whether Prof. Sullivan is a tougher grader than the other professors or not.
BTW: How does your calculator analyze a difference of independent means? If you remember what you learned about one-sample t tests [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c10_httest_root], all you have to do is extend it.
You’re working with a difference of sample means. The standard error of the mean for the first population is s1/√n1 and therefore the variance is s1²/n1,
and similarly for the second population. The variance of the sum or difference of independent variables is the sum of their variances, so
VAR(x̅1−x̅2) = s1²/n1 + s2²/n2. The standard deviation (the standard error of the difference of sample means) is the square root of the variance:
$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
It turns out that the difference of sample means follows a t distribution, if you choose the right number of degrees of freedom (more on that later). The one-
sample test statistic was t = (x̅−µo) / (s/√n). The two-sample test statistic is analogous, with the differences substituted. The test statistic becomes
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
In this course, you’ll just be testing whether one population mean is greater than, less than, or different from the other. In
other words, you’ll test against a hypothetical mean difference of 0. That simplifies t a bit:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
What about degrees of freedom? You might think df would be n1+n2−1, but it isn’t. The sampling distribution
approximately follows a t with df equal to the lower of n1−1 and n2−1. It’s only approximate because the population SD are
usually different. The exact degrees of freedom were computed by B. L. Welch (1938) [see “Sources Used” at end of book], and
the horrendous, ugly equation is
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}{\dfrac{\left(s_1^2/n_1\right)^{2}}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^{2}}{n_2 - 1}}$$
Fortunately, your TI-83/84 has the computation built in, and you don’t have
to worry about it.
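The unpooled standard error, the test statistic against a difference of 0, and Welch's degrees of freedom can be sketched in a few lines of Python. The function name `welch_t` is hypothetical, and the df line uses the standard Welch–Satterthwaite formula, assumed here to be the one the calculator applies. The numbers are the grade summaries from the Sullivan example.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic and Welch degrees of freedom (Pooled:No)."""
    v1 = sd1 ** 2 / n1            # estimated variance of the first sample mean
    v2 = sd2 ** 2 / n2            # estimated variance of the second sample mean
    se = math.sqrt(v1 + v2)       # standard error of the difference of means
    t = (mean1 - mean2) / se      # tested against a mean difference of 0
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# pop. 1 = Sullivan students, pop. 2 = other speech profs' students
t, df = welch_t(2.21, 1.44, 32, 2.68, 1.13, 54)
print(f"t = {t:.2f}, df = {df:.2f}")  # t = -1.58, df = 53.58
```

Notice that df lands between the lower of n1−1 and n2−1 (here 31) and n1+n2−2 (here 84), which is why it is usually not a whole number.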
What about pooling? Why do you always select Pooled:No on your TI-83/84? Well, if the two populations have the same SD (if they are homoscedastic)
you can treat them as a single population (pool the data sets) and use a higher number of degrees of freedom. That in turn means your p-value will be a bit
lower, so you’re a bit more likely to be able to reject H0. Sounds good, right? But there are problems:
You have to perform another test, called an F test, to determine whether σ1 = σ2.
If the F test gets a large p-value, that doesn’t prove homoscedasticity; it just fails to prove that the SD are different. So you can never be sure that it’s
okay to pool the data for the t test.
The F test requires the populations to be normal, not just approximately normal. This is usually difficult or impossible to prove.
Even if you do pool the two samples, there’s seldom enough difference in the p-value to make a difference in whether you reject H0 or not.
For these reasons and others, the issue of pooling is controversial. Some books don’t even mention it. It’s best just to use Pooled:No always.
Confidence Interval for Difference of Means
The requirements are exactly the same as the requirements for the hypothesis test. You compute a confidence interval on your TI-83/84 through
2-SampTInt.
Since they couldn’t prove that Prof. Sullivan was a tough grader, the students decided to compute a 90% confidence interval for the difference
between Prof. Sullivan’s average grades and the other speech profs’ average grades:
pop. 1 = Sullivan students; pop. 2 = other speech profs’ students

Requirements: already covered in hypothesis test.
2-SampTInt: x̅1=2.21, s1=1.44, n1=32, x̅2=2.68, s2=1.13, n2=54, C-Level=.9, Pooled:No
Results: (−.9678, .02779)
Interpretation: The TI-83 gives you the bounds for the confidence interval about µ1−µ2. A negative number indicates µ1 smaller than µ2, and a
positive number indicates µ1 larger than µ2. Therefore:
We’re 90% confident that the average student in Prof. Sullivan’s classes receives somewhere between 0.97 of a letter grade lower than the average
student in other profs’ speech classes, and 0.03 of a letter grade higher.
Remark: The 90% confidence interval is almost all negative. This reflects the fact that the p-value in the one-tailed test for µ1 < µ2 was almost as low
as 0.05.
The students could have chosen any confidence level they wanted, just for showing an effect size. But for a confidence interval equivalent to their
one-tailed hypothesis test that used α = 0.05, the confidence level has to be 1−2×0.05 = 0.90 = 90%.
Why do you need a special two-sample t procedure? Can’t you just compute a confidence interval from each sample and then compare them? No,
because the standard errors are different. The two-sample standard error takes the sample SD and sample sizes into account. Here’s a simple
example, provided by Benjamin Kirk:
A farmer tests two diets for his pigs, randomly assigning 36 pigs to each sample. The Diet A group gained an average 55 lb with SD of 3 lb; that
gives a 95% confidence interval 54.0 to 56.0 lb. The Diet B group gained 53 lb on average, with SD of 4 lb; the CI is 51.6 to 54.4 lb. Those intervals
overlap slightly, which would not let you conclude that there’s any difference in the diets.
But the 2-SampTInt is 0.3 to 3.7 lb in favor of Diet A, which says there is a difference. The issue is that the B group had a lower sample mean, but
there was more variation within the group.
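The pig example can be reproduced with a short Python sketch that computes both one-sample intervals and the two-sample interval. All helper names (`t_cdf`, `t_critical`, `one_sample_ci`, `two_sample_ci`) are hypothetical. The critical value is found by numerically integrating the t density and bisecting, an approach assumed for this illustration; it is not a description of the calculator's internals.

```python
import math


def t_cdf(t, df, steps=2000):
    """Student's t CDF via Simpson's rule on the density (steps must be even)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    b = abs(t)
    h = b / steps
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3                      # P(0 < T < |t|)
    return 0.5 + area if t >= 0 else 0.5 - area


def t_critical(conf, df):
    """Positive t* that leaves central area `conf`, found by bisection."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_ci(mean, sd, n, conf=0.95):
    margin = t_critical(conf, n - 1) * sd / math.sqrt(n)
    return mean - margin, mean + margin


def two_sample_ci(m1, s1, n1, m2, s2, n2, conf=0.95):
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))  # Welch df
    margin = t_critical(conf, df) * math.sqrt(v1 + v2)
    return (m1 - m2) - margin, (m1 - m2) + margin


ci_a = one_sample_ci(55, 3, 36)                  # Diet A gain, lb
ci_b = one_sample_ci(53, 4, 36)                  # Diet B gain, lb
ci_diff = two_sample_ci(55, 3, 36, 53, 4, 36)    # Diet A minus Diet B, lb

print("Diet A:", [round(x, 1) for x in ci_a])
print("Diet B:", [round(x, 1) for x in ci_b])
print("A - B: ", [round(x, 1) for x in ci_diff])
```

The two one-sample intervals overlap, yet the interval for the difference stays above zero, because its standard error is smaller than the sum of the two separate standard errors.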
11C1. Example 10: Sorority Academics
The Alpha Alpha Alpha sorority chapter at Staples University (Yes, corporate sponsorship is getting ridiculous!) has a tradition of putting in extra
effort academically. They gave their incoming pledges the task of proving that Alpha Alpha Alpha had higher average GPA than other sororities, at
the 0.05 level of significance. The Alphas are a large sorority, with 119 members.
The pledges hacked the campus server and obtained GPAs of ten randomly selected Alphas and ten randomly selected members of other
sororities on campus. Do their ill-gotten data prove their point?
Alphas: 2.31 3.36 2.77 2.93 2.27 2.35 3.13 2.20 3.20 2.45
Other sororities: 1.49 1.74 2.70 2.40 2.17 1.08 1.85 1.96 2.08 1.49
Since you have independent samples (unpaired data) from two different populations, this is Case 4, difference of population means, in Inferential
Statistics: Basic Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top].
Caution: You can’t treat these as paired data just because the sample sizes are equal; that’s a rookie mistake. When deciding between a paired
or an unpaired analysis, always ask yourself: “Is data point 1 from the first sample truly associated with data point 1 from the second sample?” In
this case, they’re not.
(1) pop. 1 = Alpha Alpha Alpha; pop. 2 = other sororities

H0: µ1 = µ2, No difference in average GPA
H1: µ1 > µ2, Average GPA of all Alphas is higher than other sororities
(2) α = 0.05
You check requirements against both samples independently. These samples are both smaller than 30, so you have to check normality and outliers on
both. Here are the normality checks:
The first picture doesn’t look much like a straight line, but r is greater than crit, so it’s close enough. (With small data sets like this one, fitting the data
to the screen can make differences look larger than they really are.)
The calculator lets you “stack” two or three boxplots on one screen. Not only is this a bit of a labor saver, but it also gives
you a good sense of how different the samples are. To do this, select “Compare 2 smpl” on the first box-whisker screen.
(You can guess what “Compare 3 smpl” does, but we don’t use it in this course.)
For these samples, the difference is dramatic. Every single Alpha’s GPA (in the sample) is above the third quartile in
the sample of other sororities, and the max of other sororities is just barely above the median Alpha.
With such a big difference, why do the pledges even need to do a hypothesis test? Because they know these are just samples. Maybe the Alphas
actually aren’t any better academically, but these particular samples just happened to be far apart. The hypothesis test tells you whether the
difference you see is too big to be purely the result of random chance in sample selection.
(RC) Random samples, OK
10% of Alphas is 12, and the sample is smaller than that. We don’t know how many are in all the other sororities combined, but it must be
more than 10×10 = 100. OK
Normality check, sample 1: r(.9567) > crit(.9179), OK
Normality check, sample 2: r(.9946) > crit(.9179), OK
Box-whisker: no outliers in either sample, OK
(3/4) 2-SampTTest L1, L2, 1, 1, >µ2, Pooled:No
outputs: t = 3.93, p-value = 0.0005 ,
x̅1 = 2.70, s1 = 0.43, n1 = 10
x̅2 = 1.90, s2 = 0.48, n2 = 10

(6) The average GPA in Alpha Alpha Alpha is higher than the average GPA of other sorority members (p = 0.0005).
(Or, at the 0.05 level of significance, the average GPA in Alpha Alpha Alpha is higher than the average GPA of other sorority members.)
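When you have the raw data instead of summary statistics, the sample means, sample SDs, and unpooled t statistic can be checked with Python's standard `statistics` module. This is a sketch for verification only; the variable names are chosen here, and the p-value is left to the calculator.

```python
import math
import statistics

# GPAs from the two independent samples listed above.
alphas = [2.31, 3.36, 2.77, 2.93, 2.27, 2.35, 3.13, 2.20, 3.20, 2.45]
others = [1.49, 1.74, 2.70, 2.40, 2.17, 1.08, 1.85, 1.96, 2.08, 1.49]

n1, n2 = len(alphas), len(others)
mean1, mean2 = statistics.mean(alphas), statistics.mean(others)
s1, s2 = statistics.stdev(alphas), statistics.stdev(others)   # sample SDs

# Unpooled standard error (Pooled:No) and t against a difference of 0.
se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
t = (mean1 - mean2) / se

print(f"x-bar1 = {mean1:.2f}, s1 = {s1:.2f}, n1 = {n1}")
print(f"x-bar2 = {mean2:.2f}, s2 = {s2:.2f}, n2 = {n2}")
print(f"t = {t:.2f}")
```

The two lists are never zipped together or subtracted element by element. That is the code-level sign that these are unpaired data, even though the sample sizes happen to be equal.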
Comment: You have to phrase your conclusion carefully. The pledges proved that the average GPA of Alphas is higher than the average GPA of all
other sorority members, not all other sororities. What’s the difference? Here’s a simple example. Suppose there are ten other sororities besides the
Alphas. The Omegas have an average GPA of 3.66, higher than the Alphas’ average. If the other nine each have an average GPA of 1.70, that could
easily produce exactly the sample that the pledges got.
The message here: Aggregating data can lose information. Sometimes that’s okay, but be wary when one population is being compared to an
aggregate of multiple other populations.
When you have two samples of binomial data, they represent two populations. Each population has some proportion of successes, p1 and p2
respectively. You don’t know those true proportions, and in fact you’re not concerned with them. Instead, you’re concerned with the difference
between the proportions, p1−p2. You can test whether there is a difference (hypothesis test), or you can estimate the size of the difference (confidence
interval).
This is Case 5 in Inferential Statistics: Basic Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top]. Key features of Case 5, the
difference of proportions:
You identify population 1 and population 2 at the start of your HT or CI.
The two samples must be independent. (It’s possible to analyze paired binomial data, but we don’t do it in this course.) You check
requirements for each sample separately.
Sample sizes need not be equal, but they should not be very different.
Advice: take your time with two-sample binomial data. You have a lot of p’s and a lot of percentages floating around, and it’s easy to get mixed up if
you try to hurry.
Take extra care when writing conclusions. You’re making statements about the difference between the two proportions, not about the
individual proportions. And you’re making statements about the difference in proportions between the populations, not between the samples.
11D1. Example 11: Traffic Stops and Traffic Tickets
One of my students (call him Don) had several traffic tickets, and he knew one more would
trigger a suspension. He felt that women stopped by a traffic cop were more likely than men to get off
with just a warning, and for his Field Project he set out to prove it, with α = 0.05.
Don quickly realized that he should test whether men and women stopped by a cop are equally
likely to get a ticket, not just whether men are more likely. After all, he couldn’t rule out the
possibility that women are more likely to get a ticket if stopped.
Stopped by Traffic Cop
         Ticket Issued   Just a Warning   Total   p̂
Men      86              11               97      89%
Women    55              15               70      79%
Don distributed a questionnaire to a systematic sample of TC3 students. (He assumed that any
gender-based difference in TC3 students would be representative of college students in general. That
seems reasonable.) He asked three questions:
1. Male or female?
2. Stopped by a traffic cop since your 18th birthday?
3. If yes, did you receive a ticket the last time you were stopped?
Don disregarded any questionnaires from students who had never been stopped as adults. He wasn’t interested in the likelihood of getting a ticket,
but in the likelihood of getting a ticket after being stopped by a cop. You could say that he was interested in the different proportions, for men and
women, of stops that lead to tickets.
Hypothesis Test for Difference of Proportions
This is just another variation on the good old Seven Steps of Hypothesis Tests [URL: https://BrownMath.com/swt/pfswt.htm.htm#ht7_top]:
You identify population 1 and population 2 at the beginning of step 1.
As usual, your TI-83 will compute the p-value for you; the menu selection this time is a 2-PropZTest, so your test statistic is a z-score.
In step 6, if your p-value is < α, you conclude that the proportion of ___ in one population is greater than or less than the proportion of ___
in the other. If p-value is > α, you can’t tell whether the two populations have the same proportion of ____ or different proportions.
Here are the requirements for a Case 5 hypothesis test of a difference of proportions:
Each sample must be random, and ≤ 10% of population, so 10n1 ≤ N1 and 10n2 ≤ N2.
You need at least 10 successes and 10 failures in each sample.
Actually, that’s an approximation to the real requirement. We use it because it nearly always gives the same answer, and it’s easier to
test.
The real requirement is at least 10 successes and 10 failures EXPECTED in each sample. The expected numbers are what you would
see in your samples if H0 is true and there’s no difference between the two population proportions. In that case, the pooled proportion p̂,
which is the overall percentage of success in the combined samples, is an estimator of the true proportion in both populations.
That pooled proportion is p̂ = (x1+x2)/(n1+n2), where x1 and x2 are the numbers of successes in the two samples. (2-PropZTest shows you p̂ on the output screen.) Using that pooled proportion p̂, the
expected successes and failures in sample 1 are n1p̂ and n1−n1p̂, and the expected successes and failures in sample 2 are n2p̂ and n2−n2p̂. All
four of these must be ≥ 10.
The Gardasil vaccine example, below, shows a situation where you have to use the blended proportion to test requirements.
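The "official" requirement can be checked in a few lines. This is only a sketch, not part of the TI-83 procedure: the function name `pooled_requirement_check` is made up for illustration, and the inputs are Don's counts from the table above (86 of 97 men and 55 of 70 women ticketed).

```python
def pooled_requirement_check(x1, n1, x2, n2, minimum=10):
    """Expected successes/failures in each sample if H0 (p1 = p2) is true.

    Hypothetical helper for illustration; x = successes, n = sample size.
    """
    p_pooled = (x1 + x2) / (n1 + n2)          # overall success rate, both samples combined
    expected = {
        "sample 1 successes": n1 * p_pooled,
        "sample 1 failures": n1 - n1 * p_pooled,
        "sample 2 successes": n2 * p_pooled,
        "sample 2 failures": n2 - n2 * p_pooled,
    }
    ok = all(count >= minimum for count in expected.values())
    return p_pooled, expected, ok

# Don's traffic-stop data: men = sample 1, women = sample 2
p_pooled, expected, ok = pooled_requirement_check(86, 97, 55, 70)
print(f"pooled p-hat = {p_pooled:.4f}, requirement met: {ok}")
for label, count in expected.items():
    print(f"{label}: {count:.1f}")
```

All four expected counts have to clear 10 for the requirement to be met, so the function reports the smallest ones too, not just a yes or no.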
Here is Don’s hypothesis test about the different proportions of men and women that receive tickets after being stopped in traffic.
(1) population 1 = college men stopped by traffic cops; population 2 = college women stopped by traffic cops
H0: p1 = p2, college men and women equally likely to get a ticket after being stopped
H1: p1 ≠ p2, college men and women not equally likely to get a ticket after being stopped
(2) α = 0.05
(RC) Samples 1 and 2 random? Yes, effectively (systematic).
10n1 = 10×97 = 970, and there have been more than 970 male students (at all colleges) stopped by traffic cops.
10n2 = 10×70 = 700, and there have been more than 700 female students (at all colleges) stopped by traffic cops.
Sample 1 has 86 successes and 97−86 = 11 failures; sample 2 has 55 successes and 70−55 = 15 failures.
(3/4) 2-PropZTest: 86, 97, 55, 70, p1≠p2
Results: z = 1.77, p-value = 0.0760, p̂1 = .89, p̂2 = .79, p̂ = .84
There’s a difference of 10 percentage points between the sample proportions, but with Don’s sample sizes that difference is not large enough
to be statistically significant. Even if there really is no difference in proportions for college men and women in general, random chance would
be enough to explain the difference Don sees in his samples.
(6) At the 0.05 level of significance, Don can’t tell whether men and women stopped by traffic cops are equally likely to get tickets, or not.
If this non-conclusion leaves you non-satisfied, you’re not alone. As usual, the confidence interval (next section) can provide some information.
BTW: Why does the “official” requirement use a pooled proportion p̂ instead of testing each sample? In fact, for a confidence interval you always test requirements for each
sample. But in a hypothesis test, your H0 is always “no difference in population proportions”, and a hypothesis test always starts by assuming H0 is true. If the null is true,
then there is no difference in the two populations, and you really just have one big sample of size n1+n2 and sample proportion p̂. So that’s what you test.
BTW: Why is this a z test? For the same reason that a one-proportion test is a z test: from the population proportion p you know the SD.
Of course the two-population case is a bit more complicated. You need the key fact that when you add or subtract independent random variables,
their variances add. If the two populations have the same proportion p, as H0 assumes, then the SD of the sampling distribution of the proportion for
population 1 is √(p̂(1−p̂)/n1), and similarly for population 2, where p̂ is the pooled proportion mentioned in the requirements check, above. Square the SDs to get
the variances, add them, and take the square root to get the standard error: SE = √(p̂(1−p̂)/n1 + p̂(1−p̂)/n2) = √(p̂(1−p̂)(1/n1 + 1/n2)). And from this you have
the test statistic: z = (p̂1−p̂2) / SE.
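If you'd like to see that calculation outside the calculator, here is a minimal Python sketch of a two-tailed two-proportion z test with the pooled standard error. The function name `two_prop_z_test` is an assumption for illustration; it is not the TI-83's code, just the same arithmetic as described above.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Two-tailed z test of H0: p1 = p2, using the pooled proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    # Variances add for independent samples; both use the pooled p-hat under H0.
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # area in both tails
    return z, p_value

# Don's data: 86 of 97 men and 55 of 70 women got tickets
z, p_value = two_prop_z_test(86, 97, 55, 70)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The printed z and p-value should agree with the 2-PropZTest results in Don's hypothesis test.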
Confidence Interval for Difference of Proportions
In a confidence interval for the difference of two proportions, some unknown proportion p1 of population 1 has some characteristic, and some
unknown proportion p2 of population 2 has that characteristic. You aren’t concerned with those proportions on their own, but you want to estimate
which population has the greater proportion, and by how much.
You identify population 1 and population 2 at the beginning of your analysis.
As usual, your TI-83 will compute the interval for you; the menu selection this time is a 2-PropZInt. The CI estimate is for p1−p2, the true
difference between the proportion of success in the two populations. A negative number in the confidence interval means the population 1
proportion is lower than the population 2 proportion, and a positive number means p1 is greater than p2.
Your conclusion states the size and direction of the difference between the two population proportions. It may take a couple of drafts before
you get this into understandable language.
The requirements for a CI are almost the same as a HT, but with one subtle difference:
Each sample must be random, and ≤ 10% of the population it was drawn from.
Each sample must have ≥ 10 successes and ≥ 10 failures.
Why is that last requirement different from the “official” requirement for the hypothesis test? With the HT, you assumed H0 was true and both
populations had the same proportion. That let you use a blended or pooled proportion from your combined samples. But with a CI, you don’t make
any such assumption. What would be the point of a confidence interval for the difference if you assume there is no difference?
But despite the difference in theory, as a practical matter you can just test for ≥ 10 successes and ≥ 10 failures in each sample for both HT and CI.
Don has already checked requirements in the hypothesis test, so he moves right to a 2-PropZInt:
Don gets a result of −1.4% to +21.6%. How does he interpret that? Well, he can write it as
−1.4% ≤ p1−p2 ≤ 21.6% (95% conf.)
Adding p2 to all three “sides” gives
p2−1.4% ≤ p1 ≤ p2+21.6% (95% conf.)
With 95% confidence, p1 is somewhere between 1.4% below p2 and 21.6% above p2. You don’t know the numerical value of p1, but out of male
students who are stopped by a traffic cop, p1 is the proportion who get a ticket, and similarly for p2 and women. So Don can write his confidence
interval like this:
I’m 95% confident that, out of students stopped by traffic cops, the proportion of men who actually
get tickets is somewhere between 1.4 percentage points less than women, and 21.6 percentage points more
than women.
If you’re not feelin’ the love with the algebra approach, you can reason it out in words. The confidence
interval is the difference in proportions for men minus women. If that’s negative, the proportion for men is
less than the proportion for women; if the difference is positive, the proportion for men is greater than the
proportion for women.
Why do I say “percentage points” instead of just “percent” or “%”? Well, how do you describe the
difference between 1% and 2%? It’s a difference of one percentage point, but it’s a 100% increase, because
the second one is 200% of the first. When you subtract two percentages, the difference is a number of
percentage points. If you just say “percent”, that means you’re expressing the difference using one of the
percentages as a base, even if you don’t mean to.
Getting back to Don’s confidence interval, the −1.4% to +21.6% difference between men and women in
traffic tickets is a simple subtraction of men’s rate minus women’s rate, so it is percentage points, not
percent.
BTW: Where does the confidence interval come from? First you have to find the standard error. Yes, it’s different from the
standard error associated with the hypothesis test. Why? That standard error assumed H0 was true and used the pooled p̂.
You can’t do that in the confidence interval, because if H0 is true then the difference between the population proportions is
zero and you don’t have a confidence interval!
The standard deviation of the sampling distribution of the proportion for population 1 is √(p̂1(1−p̂1)/n1),
and similarly for population 2. Square them, add, and take the square root to get the SD of the distribution of
differences in sample proportions, also known as the standard error of the difference of proportions:
SE = √(p̂1(1−p̂1)/n1 + p̂2(1−p̂2)/n2). The margin of error is zα/2 times that. The center of the
confidence interval is the point estimate, (p̂1−p̂2), so the bounds for the (1−α)% confidence interval are
(p̂1−p̂2)−E ≤ p1−p2 ≤ (p̂1−p̂2)+E where E = zα/2 · √(p̂1(1−p̂1)/n1 + p̂2(1−p̂2)/n2).
(Comic used by permission; source: http://xkcd.com/985/, accessed 2014-10-03)
Just like with numeric data, you have to use the two-sample procedure to compute a correct confidence interval. Here’s an example.
Two candidates are running for city council, so they each commission an exit poll on Election Day. Of 200 voters polled, 110 voted for Mr. X; 90
of a different 200 voted for Ms. Y. The 95% confidence intervals are 48.1% to 61.9% and 38.1% to 51.9%. The intervals overlap, so Ms. Y might still
hope for victory. But a 2-PropZInt tells a different story. The interval for the difference of proportions, X−Y, is 0.2% to 19.8%, so Mr. X is 95% confident of
winning, and the only question is whether it will be a squeaker or a landslide.
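Here is a short Python sketch of the interval calculation, using the unpooled standard error from the BTW paragraph above. The name `two_prop_z_interval` is hypothetical; the inputs are Don's counts and the exit-poll counts.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, conf_level=0.95):
    """Confidence interval for p1 - p2 (no pooling: H0 is not assumed)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)   # z sub alpha/2
    margin = z_crit * se
    diff = p1_hat - p2_hat                                     # point estimate
    return diff - margin, diff + margin

don = two_prop_z_interval(86, 97, 55, 70)        # men minus women
poll = two_prop_z_interval(110, 200, 90, 200)    # Mr. X minus Ms. Y
print(f"Don:  {don[0]:+.4f} to {don[1]:+.4f}")
print(f"Poll: {poll[0]:+.4f} to {poll[1]:+.4f}")
```

Don's interval straddles zero, while the exit-poll interval has two positive endpoints, which is the whole story of the two examples.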
Necessary Sample Size for Confidence Interval
You have a confidence level and a desired margin of error in mind. How large must each sample be?
You may remember with the one-population binomial case, part of the calculation was your prior estimate, or if you had no prior estimate you
used 0.5. With two binomial populations, you need a prior estimate (or 0.5) for each one.
The easiest way to compute the necessary sample size is to use MATH200A Program part 5. If you don’t have the program and want to get it, see
Getting the Program [URL: https://BrownMath.com/ti83/math200a.htm#Download]. You can also calculate necessary sample size by using the
formula in the next paragraph, if you don’t have the program.
BTW: The formula for sample size is not too difficult. Start with the formula for margin of error. The desired confidence level determines critical z. But when you fill in your
desired margin of error E and your prior estimates p̂1 and p̂2, you still have two unknowns, n1 and n2. The simplest assumption is that you’ll make your two samples the same
size, so set n1 = n2 = n and solve: n = [p̂1(1−p̂1) + p̂2(1−p̂2)] · (zα/2 / E)²
For a detailed explanation, with worked examples, see How Big a Sample Do I Need? [URL: https://BrownMath.com/stat/sampsiz.htm].
Caution! When you’re planning to study the difference between two binomial populations, you have to use the two-population binomial
computation of sample size. If you compute one sample size for sample 1 and a separate sample size for sample 2, you’ll come out much too low.
Example 12: Let’s look back once more at Don and his traffic stops. His 95% confidence interval was −0.0141 to +0.21587. That’s a margin of error of
(.21587−(−.0141))/2 = 11½ percentage points. How large must his samples be if he wants a margin of error of no more than 5 percentage points but
he’s willing to be only 90% confident?
Solution: Don can use his sample proportions as prior estimates. Those were 86/97 ≈ 0.8866 for men and 55/70 ≈ 0.7857 for women.
With the MATH200A program (recommended):
Here’s the output screen from MATH200A Program part 5, 2-pop binomial:
If you’re not using the program:
The calculation is a little easier if you break it into chunks. First compute p̂1(1−p̂1) + p̂2(1−p̂2). When you press [Enter],
the calculator displays that result.
You want to multiply that by (zα/2/E)². Press the [×] key, and the calculator displays Ans*. Then press the opening
paren [(], enter the fraction, and square it.
What is zα/2? You did this in How Big a Sample for Binomial Data? [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c09_SS2] in Chapter 9. The confidence level is 1−α = 0.9, so α = 0.1, α/2 =
0.05, and zα/2 is invNorm(1−.05). The margin of error is 5% or .05 (not .5!).
Caution: You don’t round the sample size. If you don’t get a whole number from the
calculation, always go up to the next whole number. A sample size of 291.0255149 or
greater gives a margin of error of .05 or less, at 90% confidence. The smallest whole number
that is 291.0255149 or greater is 292, not 291.
Answer: Don needs a sample of 292 men and 292 women if he wants 90% confidence in an estimate of the difference with margin of error no more
than 5%.
Rookie mistake: Don’t just say “292”. It’s 292 from each population.
Why do you need such large samples, even at a confidence level as low as 90%? Part of the answer is that binomial data do need large samples;
remember that a single sample of just over a thousand gives you a 3% margin of error at the 95% confidence level. And when you have two
populations, you are estimating the difference between two unknown parameter values, p1 and p2. If each of those was estimated within a 3% margin
of error, the margin of error for their difference would be 6%, so the samples have to be larger in the two-population binomial case.
Example 13: The Prime Minister knows that his program of tax cuts and reduced social services appeals more to Conservatives than to Labour, but he
wants to know how large the difference is. To estimate the difference with 95% confidence, with a margin of error of no more than 3%, how many
members of each party must he survey?
Solution: You’re given no estimate of support within either party, so use 0.5 for p̂1 and p̂2. E = 0.03 (not 0.3).
With the MATH200A program (recommended):
MATH200A/sample size/2-pop binomial:
If you’re not using the program:
First compute p̂1(1−p̂1) + p̂2(1−p̂2) = 0.5(1−0.5)+0.5(1−0.5). You have to multiply that by (zα/2/E)², where E = 0.03 and you find zα/2
like this: C-Level = 1−α = 0.95 ⇒ α = 1−0.95 = 0.05 ⇒ α/2 = 0.025 ⇒
zα/2 = invNorm(1−.025).
Answer: To gauge the difference within a 3% margin of error, at the 95% confidence level, the Prime Minister needs to poll 2135 Conservative Party
members and 2135 Labour Party members .
11D2. Example 14: Gardasil Vaccine
The Gardasil vaccine is marketed by Merck to prevent cervical cancer. What are the statistics behind it? How do women decide whether to get
vaccinated? Should the vaccine be mandatory?
A Cortland Standard story (21 Nov 2002) summarized an article from the New England Journal of Medicine as follows:
A new vaccine can protect against Type 16 of the human papilloma virus, a sexually transmitted virus that causes cervical cancer, a
new study shows. An estimated 5.5 million people become infected with a strain of HPV [not necessarily this strain] each year in the
United States.
Efficiency rate of vaccine and placebo

Placebo: Group size 765, infection 41
HPV-16 vaccine: Group size 768, infection 0
Note: The study included 1533 women with an average age of 20.
(Similar studies were done for the vaccine’s effectiveness against another strain, HPV-18. According to the front page of the Wall Street Journal on 16
Apr 2007, HPV-16 and -18 between them “are thought to cause 70% of cervical-cancer cases.” The vaccine, developed by Merck, is now marketed as
Gardasil.)
The samples certainly show an impressive difference, but is it statistically significant? Could the luck of random sampling be enough to account for
that difference in infection rates?
The claim is “the vaccine protects against HPV-16.” To translate this into the language of statistics, realize that there are two populations:
(1) women who don’t get the vaccine, and (2) women who do get the vaccine.
Notice that the populations are all women, past, present, and future who don’t or do get vaccinated. The 765 and 768 women are samples, not
populations. The populations are unvaccinated and vaccinated, not placebo and vaccine. Placebos are administered to members of a sample, but a
population doesn’t “get placeboed”.
The data type is attribute (binomial) because the original question or measurement of each participant is the yes/no question: “Did this woman
contract the virus?” (“Success” is an HPV-16 infection, not a good thing.) Since you’re comparing two populations, this is Case 5, Difference of Two
Proportions.
Is the Vaccine Effective?
If the vaccine works, then you expect more women without the vaccine to contract the virus, so make them population 1. (That’s not necessary; it just
usually makes things a little simpler to call population 1 the one with higher numbers expected.)
Although you hope that the vaccine population will have a lower infection rate, it’s not impossible that they could have a higher rate. Therefore
you do a two-tailed test (≠). If p < α, then it’s time to say whether the vaccine makes things better or worse.
Let’s use α = 0.001. You’re talking about cancer in humans, after all. A Type I error would be saying that Gardasil makes a difference when
actually it doesn’t. You don’t want women to get vaccinated, and have a false sense of security, if the vaccine actually doesn’t work, so a Type I error
is pretty serious.
(1) population 1 = unvaccinated women; population 2 = vaccinated women

H0: p1 = p2, the vaccine makes no difference
H1: p1 ≠ p2, the vaccine does make a difference
(2) α = 0.001
(RC) Randomized design? We’re not told in so many words, but this is a high-profile medical study so you can be pretty confident it was done
right.
Samples less than 10% of population? Yes, since millions of women will get the vaccine (if it’s proved effective) and millions won’t.
At least 10 yes and 10 no in each sample? In the placebo group, there were 41 yes and 765−41 = 724 no. In the treatment group, there were
no successes at all.
Does that mean you can’t do the hypothesis test? Remember that “at least 10 yes and 10 no in each sample” is a shortcut for the real
requirement, which is “at least 10 yes and 10 no expected in each sample if the null hypothesis is true”. If H0 is true, then the pooled
proportion p̂ = 0.0267 is an estimator of the proportions in both populations.
What would you expect if H0 is true? In the placebo group of 765, you would expect n1p̂ = 765×.0267 ≈ 20 yes and n1−n1p̂ = 765−20 =
745 no. You’d expect about the same in the treatment group of 768, so the “at least 10” requirement is met.
(3/4) 2-PropZTest: 41, 765, 0, 768, p1≠p2
Results: z = 6.50, p-value = 7.9E-11, p̂1 = .0536, p̂2 = 0, p̂ = .0267
Pause for a minute to make sure you can keep all those p’s straight. The first one, p = 7.9E-11, is the p-value, the chance of getting such
different sample results if the vaccine makes no difference. p̂1 and p̂2 are those sample results: 5.4% of unvaccinated women and 0% of
vaccinated women in the samples contracted HPV-16 infections. p̂ without subscript is the pooled proportion: 2.7% of all women in the
study contracted HPV-16.
(6) The Gardasil vaccine does make a difference to HPV-16 infection rates (p = 8×10⁻¹¹). In fact, it lowers the chance of infection.
Or,
At the 0.001 level of significance, the Gardasil vaccine does make a difference to HPV-16 infection rates. In fact, it lowers the chance of
infection.
It’s worth reviewing what this p-value of 8×10⁻¹¹ means. If the vaccine actually made no difference, there are only 8 chances in a hundred billion of
getting the difference between samples that the researchers actually got, or a larger difference.
How do you get from “makes a difference” to “reduces infection rate”? Remember that when p < α in a two-tailed test, you can interpret the
result in a one-tailed manner [URL: https://BrownMath.com/swt/pfswt.htm.htm#c10_httails_Interp]. If the vaccine makes things different, as appears
virtually certain, then it must either make them better or make them worse. But in the sample groups, the vaccine group did better than the placebo
group. Therefore the vaccine can’t make things worse, and it must make them better.
How Effective Is the Vaccine?
Can you do a confidence interval to estimate how much Gardasil reduces a woman’s risk of HPV-16 infection? Unfortunately, you can’t, because the
requirements aren’t met: There were zero successes in the second sample. You can’t think like the hypothesis test and use the blended p̂ to meet
requirements. Why wouldn’t that make sense? In a confidence interval, you’re specifically trying to estimate the difference between p1 and p2
(likelihood of infection for unvaccinated and vaccinated women), so you can’t very well assume there is no difference.
In terms of what you’re required to know for the course, you can skip to the next section right now. But if you want to know more, keep reading.
One informal calculation finds a number needed to treat per person actually helped (Simon 2000c [see “Sources Used” at end of book]). The
difference in sample proportions is 5.4 percentage points, and 1/.054 ≈ 18.5 is called the number needed to treat. (You may recognize this as the
expected value of the geometric distribution [URL: https://BrownMath.com/swt/pfswt.htm.htm#c06_GeomParams] with p = 5.4%.) In the long run, for
every 18 or 19 women who are vaccinated, one HPV-16 infection is prevented.
Caution! 5.4 percentage points is a difference in sample proportions. You can say only that the difference in the population is somewhere in the
neighborhood of 5.4 percentage points, not that it is that. The number needed to treat is therefore not exactly 18.5, just somewhere in the neighborhood
of 18.5. Even so, this is valuable information for women and their doctors.
Another approach is the rule of three, explained in Confidence Interval with Zero Events (Simon 2010 [see “Sources Used” at end of book]). When there
are zero successes in n events, the 95% confidence interval is 0 to 3/n. Here 3/768 = 0.0039, about 0.4%. The 95% confidence interval for the
unvaccinated population is 3.8% to 7.0%. So a doctor can tell her patients that about 38 to 70 unvaccinated women in a thousand will be infected with
HPV-16, but only about four vaccinated women in a thousand.
BTW: Each of those is a 95% confidence interval, but the combination isn’t a 95% confidence interval! In the long run, if you do a bunch of 95% CIs, one in 20 of them won’t
capture the true population parameter. Here you’re doing two, so there’s only a 95%×95% = 90.3% chance that both of these actually capture the true population proportions.
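The informal calculations in this optional part are easy to reproduce. The sketch below is one way to do it in Python, under two assumptions that the text doesn't spell out: that the 3.8% to 7.0% interval for unvaccinated women is the ordinary one-proportion z interval, and that the two intervals are independent. The variable names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

placebo_infected, placebo_n = 41, 765
vaccine_infected, vaccine_n = 0, 768

# Number needed to treat: reciprocal of the difference in sample proportions.
risk_difference = placebo_infected / placebo_n - vaccine_infected / vaccine_n
number_needed_to_treat = 1 / risk_difference

# Rule of three: with 0 successes in n trials, the 95% CI runs from 0 to 3/n.
rule_of_three_upper = 3 / vaccine_n

# One-proportion z interval (assumed method) for the unvaccinated population.
p_hat = placebo_infected / placebo_n
z_crit = NormalDist().inv_cdf(0.975)
margin = z_crit * sqrt(p_hat * (1 - p_hat) / placebo_n)
unvaccinated_ci = (p_hat - margin, p_hat + margin)

# Chance that two independent 95% intervals both capture their parameters.
both_capture = 0.95 * 0.95

print(f"number needed to treat: {number_needed_to_treat:.1f}")
print(f"vaccinated: 0 to {rule_of_three_upper:.4f}")
print(f"unvaccinated: {unvaccinated_ci[0]:.3f} to {unvaccinated_ci[1]:.3f}")
print(f"both intervals capture: {both_capture:.4f}")
```

The number needed to treat comes out a shade above 18.5 because the sketch doesn't round the difference to 5.4 percentage points first; it is still "18 or 19 women."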
11E. Confidence Interval and Hypothesis Test (Two Populations)
Summary: If you have a confidence interval for the difference of two population means or proportions, you can conclude whether the difference
is statistically significant or not, just like the result of a hypothesis test.
Example 15: You’re testing the new drug Effluvium to see whether it makes people drowsy. Your 95% confidence interval for the difference between
proportions of drowsiness in people who do and don’t take Effluvium is (0.017, 0.041). That means you’re 95% confident that Effluvium is more
likely, by 1.7 to 4.1 percentage points, to cause drowsiness.
There’s the key point. You’re 95% confident that it does increase the chance of drowsiness by something between those two figures. How likely is
it that Effluvium doesn’t affect the chance of drowsiness, then? Clearly it’s got to be less than 5%.
When both endpoints of your confidence interval are positive (or both are negative), so that the confidence interval doesn’t include 0, you have
a significant difference between the two populations.
Example 16: Now, suppose that confidence interval was (−0.013, 0.011). That means you’re 95% confident that Effluvium is somewhere between 1.3
percentage points less likely and 1.1 more likely to cause drowsiness. Can you now conclude that Effluvium affects the chance of drowsiness? No,
because 0 (“no difference”) is inside your confidence interval. Maybe Effluvium makes drowsiness less likely, maybe it has no effect, maybe it makes
drowsiness more likely; you can’t tell.
When one endpoint of your confidence interval is negative and one is positive, so that the confidence interval includes 0, you can’t tell
whether there’s a significant difference between the two populations or not.
When 0 is inside the 1−α CI (the two endpoints have different signs), the two-tailed p-value is > α. Your sample doesn’t show a
difference between the population means or proportions, and you fail to reject H0.
When 0 is outside the 1−α CI (the two endpoints have the same sign), the two-tailed p-value is < α. Your sample shows a
difference between the two population means or proportions, and you reject H0.
This is exact for numeric data but approximate for binomial data. Why? Because the HT and CI use the same standard error for the numeric data
cases, but slightly different standard errors for two-population binomial data. (The two calculations are in BTW paragraphs earlier in the chapter.)
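To see how slight that difference in standard errors is, here is a small Python sketch that computes both for Don's traffic-stop data (86 of 97 men, 55 of 70 women). The function names are hypothetical; the formulas are the ones described in the earlier BTW paragraphs.

```python
from math import sqrt

def pooled_se(x1, n1, x2, n2):
    """Standard error used by the hypothesis test (assumes H0: p1 = p2)."""
    p = (x1 + x2) / (n1 + n2)
    return sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

def unpooled_se(x1, n1, x2, n2):
    """Standard error used by the confidence interval (no assumption about H0)."""
    p1, p2 = x1 / n1, x2 / n2
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

se_test = pooled_se(86, 97, 55, 70)
se_interval = unpooled_se(86, 97, 55, 70)
print(f"HT standard error: {se_test:.4f}")
print(f"CI standard error: {se_interval:.4f}")
```

The two values differ only in the third decimal place, which is why the CI and HT almost always agree for binomial data, but can disagree in a borderline case.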
11F. More Confidence Intervals for Two Populations
Summary: Confidence intervals for two populations are easy enough to calculate on your TI-83. But one or both endpoints can be negative, and
that means you have to write your interpretation carefully. Don’t just say “difference”; specify which population’s mean or
proportion is larger or smaller. You must also distinguish between mean difference (for paired data) and difference in means (for
unpaired data).
Study these examples of confidence intervals for two populations, and you’ll learn how to write your interpretations like a pro!
11F1. Example 17: Heights of Men and Women
Here’s an example adapted from Johnson and Kuby (2003, 425) [see “Sources Used” at end of book]. Men’s and women’s heights are ND. From this
random sample, estimate the difference in average height as a 95% CI.
Sample            Mean, x̅    Standard Deviation, s    Sample Size, n
Female, pop. 2    63.8"      2.18"                    20
Male, pop. 1      69.8"      1.92"                    30
Analysis
You have independent samples here: you get one number from each individual. The data type is numeric (height), so you have Case 4, difference of
independent means.
Requirements Check
With independent means, you check requirements for each sample separately.
You’re told that each sample was an SRS, so that’s no problem.
The samples are smaller than 10% of all men and 10% of all women.
The sample of 30 men is big enough that normality and outliers aren’t an issue.
But what about the sample of 20 women? You don’t have the original data, so you can’t check normality and outliers. Fortunately, you don’t
need to. Since women’s heights are normally distributed, the distribution of sample means will be normal regardless of sample size.
All requirements for Case 4 are met.
Calculation and Conclusion
The TI-83 or TI-84 computes µ1−µ2, so you need to decide which will be population 1 and which will be population 2. I like to avoid negative signs,
so unless there’s a good reason to do otherwise I take the sample with the larger mean as sample 1; in this case that’s the men.
Whichever way you decide, write it down: pop 1 = ________, pop 2 = ________.
On your calculator, press [STAT] [◄] and scroll up or down to find 0:2-SampTInt. Enter the sample statistics and use Pooled:No. Here are the input
and output screens:
Conclusion: With 95% confidence, the average man at that college is between 4.8″ and 7.2″ taller than the average woman, or µM−µF = 6.0″±1.2″. (You would
probably present one or the other of those forms, not both.)
(6.0 is the difference of sample means and is the center of the confidence interval: x̅1−x̅2 = 69.8−63.8 = 6.0.)
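For readers who want to check the interval without a calculator, here is a Python sketch of an unpooled two-sample t interval from summary statistics. It assumes that Pooled:No corresponds to the Welch–Satterthwaite approximation for degrees of freedom, and, because the Python standard library has no t distribution, it finds the critical t by numerical integration and bisection. All function names are made up for illustration.

```python
from math import gamma, pi, sqrt

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, by Simpson's rule on the t density over [0, x]."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    def pdf(t):
        return coef * (1 + t * t / df) ** (-(df + 1) / 2)
    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)   # Simpson weights 4, 2, 4, ...
    return 0.5 + total * h / 3

def t_critical(conf_level, df):
    """Critical t for a two-sided interval, found by bisection on t_cdf."""
    target = 1 - (1 - conf_level) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def two_samp_t_interval(mean1, s1, n1, mean2, s2, n2, conf_level=0.95):
    """CI for mu1 - mu2 from summary statistics, variances not pooled."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom (assumed to match Pooled:No)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    margin = t_critical(conf_level, df) * se
    diff = mean1 - mean2
    return diff - margin, diff + margin, df

# Population 1 = men, population 2 = women, as in the example
low, high, df = two_samp_t_interval(69.8, 1.92, 30, 63.8, 2.18, 20)
print(f"df = {df:.1f}, interval = {low:.1f} to {high:.1f} inches")
```

The numerical t quantile is only accurate to a few decimal places, which is plenty for an interval reported to the nearest tenth of an inch.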
Remark: The difference from the case of dependent means is subtle but important. With dependent means (paired data), the CI is about the average
difference in measurements of a single randomly selected individual or matched pair. But with independent means (unpaired data), the CI is about
the difference between the averages for two different populations.
11F2. Example 18: Coffee and Heart Rate with Negatives
Now let’s make up new data for the coffee example. (The new d’s are still normally distributed with no outliers.) Again, you’re estimating the mean
difference in heart rate due to drinking coffee.
Person     1    2    3    4    5    6
Before    78   64   70   71   70   68
After     79   62   73   70   71   67
d = A−B    1   −2    3   −1    1   −1
Notice that some heart rates declined after the people drank coffee. Now when you compute a 95% CI you get the results shown at right.
How should you interpret a negative endpoint in the interval? Remember that you are computing a CI for the quantity After−Before. You could
follow the earlier pattern and say “With 95% confidence, the mean increase in heart rate for all individuals after drinking coffee is between −1.8 and
+2.1 beats per minute,” but only a mathematician would love a statement that talks about an increase being negative. Instead, you draw attention to
the fact that the change might be a decrease or an increase, as follows.
Conclusion: With 95% confidence, the mean change in heart rate for all individuals after drinking coffee is between a decrease of 1.8 and an increase of 2.1 beats
per minute. Since it’s obviously very important to get the direction right, be sure to check your conclusion against your H1 (if any) and your original
definition of d.
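The paired interval can be reproduced from the table with a few lines of Python. This is a sketch rather than the calculator's own procedure; the critical value 2.5706 (95% confidence, df = 5) is taken from a standard t table, not computed.

```python
from math import sqrt
from statistics import mean, stdev

before = [78, 64, 70, 71, 70, 68]
after = [79, 62, 73, 70, 71, 67]

# d = After - Before, so a positive d means the heart rate went up
d = [a - b for a, b in zip(after, before)]

d_bar = mean(d)                       # point estimate of the mean difference
se = stdev(d) / sqrt(len(d))          # standard error of the mean difference
t_star = 2.5706                       # assumption: table value for 95%, df = n - 1 = 5
margin = t_star * se
ci = (d_bar - margin, d_bar + margin)
print(f"mean d = {d_bar:.1f}, margin = {margin:.1f}")
print(f"95% CI: {ci[0]:.1f} to {ci[1]:.1f} beats per minute")
```

Because the code subtracts Before from After, a negative endpoint is a decrease in heart rate and a positive endpoint is an increase, matching the wording of the conclusion.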
Remark 1: Though it’s correct to present the CI as a point estimate and margin of error, it’s probably not a good idea because that form is so easy to
misinterpret. If you say “With 95% confidence, the mean increase in heart rate for all individuals is 0.2±1.9 beats per minute,” many people won’t
notice that the margin of error is bigger than the point estimate, and they’ll come to the false conclusion that you have established an increase in heart
rate after drinking coffee. As statistics mavens, we have a responsibility to present our results clearly, so that people draw the right conclusions and
not the wrong ones.
Remark 2: Remember that the CI occupies the middle of the distribution while the HT looks at the tails. If 0 is inside the CI, it can’t be in either tail.
Therefore, from this confidence interval you know that testing the null hypothesis µd = 0 at the 0.05 level (0.05 = 1−95%) would fail to reject H0: this
experiment failed to find a significant difference in heart rate after drinking coffee. (See Confidence Interval and Hypothesis Test (Two Populations).)
Remember the difference between “no significant difference found” and “no difference exists”. Since 0 is in the CI, you can’t say whether there is
a difference. The correct statement, “I don’t know whether there is a difference,” is different from the incorrect “There is no difference.”
11F3. Example 19: Opinion Poll
The following data are from Dabes and Janik (1999, 269) [see “Sources Used” at end of book]. Men and women were polled in a systematic sample on
whether they favored legalized abortion, and the results were as follows:
Sample             Number in Favor, x    Sample Size, n
Females, pop. 1    60                    100
Males, pop. 2      40                    80
Find a 98% confidence interval for the difference in level of support between women and men.
Analysis
You have binomial data: each person either supports legalized abortion or not. (Obviously this example is oversimplified.) Binomial data with two
populations is Case 5, difference of proportions.
Requirements Check
Support among the sample of women is 60/100 = 60%, and among the sample of men is 40/80 = 50%, so let’s define population 1 = women,
population 2 = men.
Each sample is a systematic sample, as good as an SRS.
10n1 = 10×100 = 1000; 10n2 = 10×80 = 800. There are more than 1000 women and 800 men in the world.
In the sample of women there are 60 successes and 100−60 = 40 failures; in the sample of men there are 40 successes and 80−40 = 40 failures.
All four numbers are well above 10.
All requirements for a Case 5 CI are met.
On the TI-83 or TI-84, press [STAT] [◄] and scroll up to find B:2-PropZInt. The input and output screens look like this:
Two-population confidence intervals can be tricky to interpret, particularly when the two endpoints have different signs and particularly for Case 5,
two population proportions. You can reason it out in words, or use algebra.
In words, remember that the confidence interval is the estimated difference p1−p2, which is the estimated amount by which the proportion in the first
population exceeds the proportion in the second population. So a negative endpoint for your CI means that the first proportion is lower than the
second, and a positive endpoint means that the first proportion is larger.
Using algebra, begin with the calculator’s estimate of p1−p2:

−0.0729 ≤ p1−p2 ≤ +0.27292 (98% conf.)
Add p2 to all three parts of the inequality, and you have
p2−0.0729 ≤ p1 ≤ p2+0.27292 (98% conf.)
That’s a little easier to work with. The 98% confidence bounds on p1 (level of women’s support) are p2−0.0729 (7.3 percentage points below men’s support) and
p2+0.27292 (27.3 percentage points above men’s support).
Conclusion: You are 98% confident that support for legalized abortion is somewhere between 7.3 percentage points lower and 27.3 points higher among
females than males.
Remark: It would be equally valid to turn that around and say you’re 98% confident that support is between 27.3 percentage points lower and 7.3
points higher among males than females.
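If you want to check the calculator's interval by hand, here is a minimal Python sketch. It is not the calculator's own code: the function name two_prop_z_interval is made up for this illustration, and it assumes the usual unpooled standard error for a difference of two proportions.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, conf_level):
    """Confidence interval for p1 - p2 (hypothetical helper, unpooled SE)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Standard error of the difference of two independent sample proportions
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # Critical z for the requested confidence level (area split between two tails)
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    diff = p1_hat - p2_hat
    margin = z_star * se
    return diff - margin, diff + margin

# Example 19: 60 of 100 women (pop. 1), 40 of 80 men (pop. 2), 98% confidence
low, high = two_prop_z_interval(60, 100, 40, 80, 0.98)
print(f"{low:+.4f} <= p1 - p2 <= {high:+.4f}")
```

The endpoints should agree with the −0.0729 and +0.27292 quoted above, apart from rounding.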
11F4. Example 20: GPA of Fraternity Members and Nonmembers
Johnson and Kuby (2003, 427) [see “Sources Used” at end of book] present another example. What is the difference (if any) in academic performance
between fraternity members and nonmembers? Forty members of each population were randomly selected, and their cumulative GPA recorded as an
indication of performance. The results were as follows:
Sample                        x̅       s       n
Fraternity members, pop. 1    2.03    0.68    40
Independents, pop. 2          2.21    0.59    40
Analysis
Here you have numeric data, two independent samples. (You know it’s independent samples, unpaired data, because each member of the sample
gives you just one number.) This is Case 4, difference of independent means.
Requirements Check
Each sample was random, and each sample size is >30. We can assume that there are more than 10×40 = 400 fraternity members and 400 independents
on campus. All requirements for Case 4 are met.
The CI is −0.46 to +0.10, with 95% confidence. To interpret this, remember that the TI-83 computes a CI for µ1−µ2, and we
defined population 1 as fraternity and population 2 as independent. The calculator is telling you that
−0.46 ≤ µ1−µ2 ≤ +0.10 (95% conf.)
or, adding µ2 to all three parts,
µ2−0.46 ≤ µ1 ≤ µ2+0.10 (95% conf.)
Conclusion: The true difference in academic performance, as measured by average GPA, is somewhere between 0.46 worse and 0.10
better for fraternity members relative to nonmembers, with 95% confidence.
You could also write a somewhat longer form: with 95% confidence, the average fraternity member’s academic performance, as measured by
GPA, is somewhere between 0.46 worse and 0.10 better than the average independent’s performance.
Remark: Don’t be fooled by the fact that the CI is mostly below zero. You really cannot conclude that fraternity members probably have lower
academic performance. Remember that the 95% CI is the result of a process that captures the true population mean (or difference, in this case) 95
times out of 100. But you can’t know where in that interval the true mean (or difference) lies. If you could, there would be no point to having a CI!
Remark 2: Even though zero is within the CI, you must not say that there is no difference in performance between members and nonmembers. The
difference might indeed be zero, but it might also be anywhere between 0.46 in one direction and 0.10 in the other. There’s even a 5% chance that the
true difference lies outside those limits. Always bear in mind the difference between insufficient evidence for and evidence against. (You may hear
that said as “lack of evidence for is not evidence against.”)
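The interval in Example 20 can be roughed out from the summary statistics alone. The sketch below is only an illustration under stated assumptions: the helper name two_sample_t_interval is invented here, it assumes the unpooled (unequal-variance) form of the two-sample t interval, and the critical value t_star = 1.99 is an assumed table lookup for 95% confidence near 76 degrees of freedom rather than something computed by the code.

```python
from math import sqrt

def two_sample_t_interval(mean1, s1, n1, mean2, s2, n2, t_star):
    """CI for mu1 - mu2 from summary statistics (hypothetical helper, unpooled)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = sqrt(v1 + v2)  # standard error of the difference of sample means
    # Welch-Satterthwaite degrees of freedom, used only to look up t_star
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    diff = mean1 - mean2
    return diff - t_star * se, diff + t_star * se, df

# ASSUMPTION: t_star = 1.99 comes from a t table (95% confidence, df about 76)
low, high, df = two_sample_t_interval(2.03, 0.68, 40, 2.21, 0.59, 40, t_star=1.99)
print(f"df is about {df:.1f}")
print(f"{low:+.2f} <= mu1 - mu2 <= {high:+.2f}")
```

With those inputs the endpoints round to the −0.46 and +0.10 reported above. Population 1 is the fraternity members, so a negative difference means lower average GPA for members.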
This chapter covered confidence intervals and hypothesis tests for two samples, both binomial and numeric data. Instead of testing a population’s µ
or p against some baseline number, you test the µ or p of two populations against each other.
Key ideas: Numeric data can be paired or unpaired, a/k/a dependent or independent samples. Paired data arise when one experimental unit
generates two numbers. With unpaired data, there’s no specific association between a data point in one sample and any specific
data point in the other sample.
If there’s an effect to be found, you’re more likely to find it with a paired-data design than with unpaired data.
Caution! Just having equal sample sizes is not enough for paired data; there has to be an association between each member of
one sample and a specific member of the other sample. They can be two tasks performed by the same individual, husband-wife
studies, identical-twin studies, etc.
With paired numeric data, you have Case 3, mean difference.
In step 1 of your HT or at the beginning of your CI, write the definition of d, showing which direction you will subtract. Your
HT is about µd.
Do your requirements check on the d’s — you don’t care whether the original numbers pass requirements.
Use plain T-Test or TInterval on the differences.
With unpaired numeric data, you have Case 4, difference of means.
In HT step 1 or at start of CI, identify population 1 and population 2 (not sample 1 and sample 2).
Check requirements on each sample separately.
Use 2-SampTTest or 2-SampTInt.
Binomial data are never paired in this course; you have Case 5, difference of proportions.
In HT step 1 or at start of CI, identify population 1 and population 2 (not sample 1 and sample 2).
Requirements are slightly different between CI and HT, and in fact with HT it’s easier to check requirements after the
computations.
Use 2-PropZTest or 2-PropZInt.
Be able to calculate necessary sample size to keep margin of error below a desired value for a desired confidence level.
Spend time to interpret confidence intervals correctly. You must identify one population’s mean or proportion as larger or smaller
than the other (not just different), and by how much; in other words, you give the direction and size of the effect. This may take
more than one draft, especially when the ends of the CI have opposite signs.
With binomial data, the difference is a matter of percentage points, not percent.
Your interpretation should clearly be about the populations, not the samples.
CI and HT are intimately associated. Zero outside the CI at a C-level of 1−α rules out “no difference”, so it matches up with a two-
tailed HT result of “reject H0” at the α level. Zero inside the interval admits “no difference” as a possibility, so it matches up with a
“fail to reject H0”.

do it after all.
1. You want to determine whether sports fans would pay 20% extra for reserved bleacher seats. You suspect that the answer may be different
between people aged under 30 and people 30 years old or older.
(a) How big must your samples be to let you construct a 95% confidence interval for the difference, with a margin of error of 3 percentage points?
(b) You discover a poll done last year, in which 30% of young people and 45% of older people said they would pay extra. Now how large must your
samples be?
2. A researcher wanted to see whether the English like soccer more than the Scots. She randomly selected eight English and eight Scots and asked
them to rate their liking for soccer on a numeric scale of 1 (hate) to 10 (love), and she recorded these responses:
English 6.4 5.9 2.9 8.2 7.0 7.1 5.5 9.3
Scots 5.1 4.0 7.2 6.9 4.4 1.3 2.2 7.7
(a) From the above data, can the researcher prove that the English have a stronger liking for soccer than the Scots? Use α = 0.05.
(b) Construct a 90% confidence interval for the difference between the average levels of English and Scottish enthusiasm for soccer.
3. Another researcher took a different approach. She polled random samples with the question “Do you watch football at least once a week?” (In the UK they call soccer “football”.) She got the results shown below.
           Sample Size    Number of “Yes”
English    150            105
Scots      200            160
(a) At the 0.05 level of significance, are the English and the Scots equally fans of soccer?
(b) Construct a 95% confidence interval for the difference.
(c) Find the margin of error in that interval.
(d) If the researcher repeats her survey, what sample size would she need to reduce the margin of error to 4 percentage points at the same confidence
level?
4. To see if running raises the HDL (“good”) cholesterol level, five female volunteers (randomly selected) had their HDL level measured before they started running and again after each had run regularly for six months, an average of four miles daily.
Person    Before Running    After Running
1         30                35
2         34                39
3         36                42
4         34                33
5         40                48
(a) See if you can prove that the average person’s HDL cholesterol level would be raised after all that running. Use α = 0.05.
(b) Compute and interpret a 90% confidence interval for the change in HDL from running four miles daily for six months.
5. The Physicians’ Health Study tested the effects of 325 mg aspirin every other day in preventing heart attack in people with no personal history of
heart attack. 22,071 doctors were randomized into an aspirin group and a placebo group. Of the 11,037 doctors who received aspirin, 10 had fatal
heart attacks and 129 had non-fatal heart attacks; total 139. For the 11,034 doctors in the placebo group, the figures were 26 and 213; total 239.
(a) At the 0.001 level of significance, does this aspirin regimen make a difference to the likelihood of a heart attack?
(b) Find a 95% confidence interval for the reduced risk.
6. June was planning to relocate to Central New York, considering Binghamton and Cortland. She found an online survey of prices of recently completed house sales, as shown below. The survey was two random samples taken about a month before she looked at the Web site.
                Mean        S.D.       n
Cortland Co.    $134,296    $44,800    30
Broome Co.      $127,139    $61,200    32
(a) Construct a 95% confidence interval for the difference in mean house price in the two counties.
(b) Use that answer to determine which county has a lower average price of houses, at the 0.05 level of significance.
7. The Canter Polling Service conducted two national polls in the same week, one for the Red Party candidate and one for the Blue Party candidate.
Each one was a random sample of 1000 likely voters (not the same 1000, of course). (Most national polls have sample size 1000.)
In the first sample, 520 (52%) said they would vote for Red. In the second sample, 480 (48%) said they would vote for Blue. The newspaper
reported that Red was leading by 4%. What’s wrong with that? Write a correct statement, at the 95% level of confidence.
8. You have two independent random samples of yes/no data. Sample 1 has 7 yes out of 28, and sample 2 has 18 yes out of 32. Each sample is
smaller than 10% of the population.
(a) Is it valid to use 2-PropZInt to compute a confidence interval? Why or why not?
(b) Is it valid to use 2-PropZTest to compute a p-value? Why or why not?
12. Tests on Counted Data

Updated 22 Jan 2015
Intro: In Chapter 10 you learned about hypothesis tests, using one sample of numeric or binomial data to test a hypothesis about a
population mean or proportion. In Chapter 11, you extended that to inferences about the difference between two numeric or binomial
populations. With binomial data, you have counts of success and failure: two categories for one or two populations.
But what if you have more categories or populations? That’s when you use the tests in this chapter. The hypothesis tests will use
the same seven steps that you already know and love, but with a new test statistic called chi-squared. Your data will be counts of
members of the sample that fall into particular categories of one or two variables.
Contents: 12A. Testing Goodness of Fit to a Model

First Problem: M&M Colors
· The Theory
· Hypothesis Test in Practice
· Optional: Residuals
· Confidence Intervals?
Second Problem: Fruit Flies
· Solution
· Scientific Method and Your Conclusions
Third Problem: Equal Preferences
· Solution
First Problem: Office Equipment
· The Theory
· Hypothesis Test in Practice
· Optional: Residuals and More
Second Problem: The “Monday Effect”
· Solution
Third Problem: Tobacco Smoke and Tumors
· Solution
12A. Testing Goodness of Fit to a Model
Suppose you have one population divided into three or more categories — there are ≥ 3 possible non-numeric responses from each subject. For
example, instead of monitoring whether each patient had a heart attack or not (two possibilities), you might monitor whether each person had a fatal
heart attack, a non-fatal heart attack, or no heart attack (three possibilities).
When there were only two possibilities, you could talk about the proportion of successes in the population, because failure was the only other
possibility. If you knew about successes, you knew about failures. The population proportion of successes, p, was the population parameter.
But when you have three or more possibilities, that goes out the window. Knowing the proportion of one category in the population doesn’t tell
you the proportions of the others. So instead of testing against a particular proportion, you test against all the proportions at once. You have a
probability model [URL: https://BrownMath.com/swt/pfswt.htm.htm#c05_BasicsModel] in mind, and you perform a goodness-of-fit (GoF) test to
compare the data and the model. If the data are too far away from the model, you reject the model. This is a standard hypothesis test [URL:
https://BrownMath.com/swt/pfswt.htm.htm#ht7_top], but you’ll learn that you compute the p-value from a new distribution called χ².
As usual, I’ll show you the theory first, and then you’ll do calculations the easy way.
12A1. First Problem: M&M Colors
The M&M Mars Web site used to give the color distribution of plain M&Ms as 24% blue, 13% brown, 16% green, 20% orange, 13% red, and 14% yellow. My Spring 2011 class counted the colors of 628 plain M&Ms and computed the sample proportions, as shown in the table below. Obviously their percentages differ from the company model. But are they different enough to let the class reject the company’s model?

Plain M&Ms
Color     Model    Observed    p̂
Blue      24%      127         20.2%
Brown     13%      63          10.0%
Green     16%      122         19.4%
Orange    20%      147         23.4%
Red       13%      93          14.8%
Yellow    14%      76          12.1%
Totals    100%     628         99.9%

The company’s model is your null hypothesis H0. The alternative hypothesis H1 is that the company model is wrong. Let’s use a significance level of 0.05.

Caution: As always, to apply the analysis techniques you need a simple random sample, and the class didn’t have that. The Fun Size M&Ms packs that they analyzed were bought from the same store on the same day and almost certainly came from one small part of one production run. Although this wasn’t a true random sample, I’m going to proceed as though it was, to show you the method.
The Theory
By now you know the drill. Samples vary, so just about any real-life sample will be different from the theoretical expectation in H0. The question is
always the same: Is pure sample variability enough to account for the difference between H0 and this sample, or is there some real effect here beyond
that?
For each data type, you have a method to figure a test statistic and a p-value [URL: https://BrownMath.com/stat/castriag.htm]. The test statistic is a
standardized measure of the discrepancy between H0 and the sample, taking sample size into account; the p-value is the probability of getting that
sample, or one even further away from H0, if random chance is the only thing going on. (Most software and statistical calculators compute the test
statistic and p-value for you at the same time.)
So far you know two test statistics, z and t. Can you use one of them on this problem? There’s the obvious choice of performing six z tests of
proportion on the six colors. But in the immortal words of Richard Nixon of Watergate fame, “We could do that, but it would be wrong.”
Why would it be wrong? Well, you’re doing a hypothesis test at 0.05 significance, right? That means that you can live with a one in twenty
chance of a Type I error, calling the model bad when it’s actually good. But if you do a 0.05 significance test of each color, then you have a 0.05 chance
of a Type I error on blue, a 0.05 chance of a Type I error on brown, and so forth. Suddenly your real significance level is almost 0.30, which is
ridiculously high. (It’s not quite equal to 6×0.05 because you might get Type I errors on more than one color, and also because the colors aren’t
independent.)
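To put numbers on that argument, here is a small Python sketch. The second figure assumes the six tests are independent, which (as just noted) they are not, so treat it as a rough illustration rather than the exact error rate for the M&Ms problem.

```python
alpha = 0.05   # significance level of each separate test
tests = 6      # one z test per color

# Simple upper bound: add up the six separate chances of a Type I error
upper_bound = tests * alpha

# ASSUMPTION: independent tests. Chance of at least one Type I error:
at_least_one = 1 - (1 - alpha) ** tests

print(f"upper bound:       {upper_bound:.2f}")
print(f"independent tests: {at_least_one:.4f}")
```

Either way, the overall chance of a Type I error is several times the 0.05 you signed up for.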
Never do multiple tests on the same data, because that makes Type I errors way more likely than you can live with. You must do a single overall
test of the model as a whole, and that means a new test statistic. It’s called χ² or chi-squared.
The χ² computation will look a little weird: you have to deal with each category because there are no summary statistics like x̅ and s to help you
along. But I’ll walk you through it and you’ll see that it’s not too bad, really.
How to pronounce χ² or chi-squared: The first consonant is roughly a k sound, so you can pronounce χ or chi as Kyle without the l sound. χ² rhymes
with “high-chaired”. If you want to get technical (and you know I do), the Greek letter sounds similar to Yiddish ch in l’chaim or Scottish ch in loch.
It’s definitely not English ch as in church.
χ is not an X, by the way, even though it looks like one. Greek words beginning with χ are written with ch in English: words like chiropractor
and chronology. The Greek letter with the x sound is ξ, spelled xi and pronounced ksee, but it doesn’t figure in this course.
Okay, that’s enough Greek class. We now return you to statistics.
Computing Expected Counts (E’s)
The key concept in testing a probability model against data is expected count. Samples never actually match a model, but what would a sample with this same size look like if it did? Well, if colors are supposed to be distributed in 24% blue, 13% brown, and so on, then a perfect match within 628 M&Ms would be distributed in 24% blue, 13% brown, and so on. The expected counts are computed in the table below.

Plain M&Ms
Color     Model    Observed    Expected
Blue      24%      127         24% × 628 = 150.7
Brown     13%      63          13% × 628 = 81.6
Green     16%      122         16% × 628 = 100.5
Orange    20%      147         20% × 628 = 125.6
Red       13%      93          13% × 628 = 81.6
Yellow    14%      76          14% × 628 = 87.9
Totals    100%     628         627.9

The observed column is counts, which means whole numbers. But E’s are averages in a sense: what’s the average number of blues you’d expect if you took many, many samples of size 628 and the company’s 24% is correct? The answer is 150.7, so E’s don’t need to be whole numbers and typically are not. As you can see, even carrying E’s to one decimal place there’s a slight rounding error, 627.9 versus 628; rounding to whole numbers would give a bigger rounding error. Software and calculators avoid this issue, by carrying more precision internally than they display.
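The expected counts are just model share times sample size, so they are easy to reproduce. This is a minimal Python sketch; the variable names are my own, and the only inputs are the model percentages and the sample size of 628 given above.

```python
# Company model for plain M&Ms, written as proportions
model = {"Blue": 0.24, "Brown": 0.13, "Green": 0.16,
         "Orange": 0.20, "Red": 0.13, "Yellow": 0.14}
sample_size = 628

# Expected count = model proportion x sample size (no rounding)
expected = {color: share * sample_size for color, share in model.items()}

for color, e in expected.items():
    print(f"{color:<7}{e:8.2f}")
print(f"{'Total':<7}{sum(expected.values()):8.2f}")
```

Because nothing is rounded until it is printed, the total comes out at 628 rather than 627.9.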
In a goodness-of-fit test, your data are counts, just as they were in tests on binomial data. So it’s no surprise that the requirements for GoF are similar
to the requirements for binomial data [URL: https://BrownMath.com/swt/pfswt.htm.htm#c08_ReqCondProp].
You need a random sample (or equivalent) that is less than 10% of the population. But there are more than two categories in the model, so
instead of a success/failure condition you have a condition on the expected counts (E): The expected count in every category must be ≥ 5. (Some
authors use a looser requirement, that none of the E’s can be below 1, and no more than 20% of them can be below 5.)
By the way, make sure you actually have counted data. (Dave Bock [see “Sources Used” at end of book] calls this the Counted Data Condition.)
Sometimes students try to do a chi-squared test on sample means, but the chi-squared distribution is just for counts of categorical data.
What do you do if your E’s are too small? You can combine smaller categories, if the combination seems reasonable. For example, suppose you’re
studying some characteristic of people based on their home state. You could combine adjacent small states like Connecticut–Rhode Island and
Delaware–Maryland. But it’s best to plan ahead and not get into this position. Your smallest E will come from your sample size times your smallest
model category. Just plan for a large enough sample size to make that product ≥5.
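Planning ahead for the E ≥ 5 condition takes one division. The Python sketch below uses two helper names invented for this illustration; the threshold of 5 is the requirement stated above, and the sample size of 30 in the last line is a made-up case to show what a failure looks like.

```python
from math import ceil

def smallest_sample_size(model_shares, min_expected=5):
    """Smallest n for which every expected count is at least min_expected."""
    return ceil(min_expected / min(model_shares))

def count_small_expected(model_shares, n, threshold=5):
    """How many categories have an expected count below the threshold."""
    return sum(1 for share in model_shares if share * n < threshold)

mm_model = [0.24, 0.13, 0.16, 0.20, 0.13, 0.14]

print(smallest_sample_size(mm_model))       # minimum n for the M&Ms model
print(count_small_expected(mm_model, 628))  # the class's actual sample
print(count_small_expected(mm_model, 30))   # hypothetical too-small sample
```

The smallest model category (13%) drives the answer, exactly as described above.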
Computing χ² Contributions and Total
Eyeballing the observed and expected numbers doesn’t really tell you much. What you need is a single number that shows the overall badness of fit
and can be related to a standardized distribution.
To find this, you take the difference between observed and expected, square it so that it’s always positive, and then divide by expected to scale the effect size by the sample size. Do this for each row, and the result is called the “χ² contribution” for that row. Add up the rows and you have your χ² test statistic. The computations are shown in the table below, and for this sample and this model you have χ² = 19.42. This is a standardized measure of how far the model and the data disagree.

Plain M&Ms
Color     Model    Observed    Expected    χ² contribution
Blue      24%      127         150.7       (127−150.7)²/150.7 = 3.73
Brown     13%      63          81.6        (63−81.6)²/81.6 = 4.24
Green     16%      122         100.5       (122−100.5)²/100.5 = 4.60
Orange    20%      147         125.6       (147−125.6)²/125.6 = 3.65
Red       13%      93          81.6        (93−81.6)²/81.6 = 1.59
Yellow    14%      76          87.9        (76−87.9)²/87.9 = 1.61
Totals    100%     628         627.9       19.42

BTW: All these computations are summarized in the formula χ² = ∑(O−E)²/E, where the summation [URL: https://BrownMath.com/swt/pfswt.htm.htm#c01_BigSigma] is over categories, not individual data points.
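The formula χ² = ∑(O−E)²/E translates almost word for word into code. Here is a minimal Python sketch for the M&Ms data; the variable names are my own. It keeps the expected counts unrounded, so the total differs slightly from the hand computation with E’s rounded to one decimal place.

```python
colors = ["Blue", "Brown", "Green", "Orange", "Red", "Yellow"]
model = [0.24, 0.13, 0.16, 0.20, 0.13, 0.14]
observed = [127, 63, 122, 147, 93, 76]

n = sum(observed)                          # 628 M&Ms in the sample
expected = [share * n for share in model]  # unrounded expected counts

# One chi-squared contribution per category: (O - E)^2 / E
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_squared = sum(contributions)

for color, c in zip(colors, contributions):
    print(f"{color:<7}{c:6.2f}")
print(f"chi-squared = {chi_squared:.2f}")
```

With full precision the statistic comes out at 19.44; the 19.42 in the table is what you get after rounding each E first.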
Properties of the χ² Distribution
Let’s pause to talk a little about the χ² distribution.
χ² is actually a family of distributions distinguished by degrees of freedom (Greek ν “nu” or df). df = number of categories minus 1, so for
the M&Ms example df = 6−1 = 5.
BTW: The chi-squared distribution was developed independently by Ernst Carl Abbe (1840–1895, German) in 1863, by Friedrich Robert Helmert (1843–1917,
German) in 1875, and by Karl Pearson [URL: https://BrownMath.com/swt/pfswt.htm.htm#bign_PearsonK] (1857–1936, English) in 1900. The name “chi-squared”
is due to Pearson, who also invented the goodness-of-fit test.
What does the distribution look like? Some good pictures on the Web show χ² curves for different df overlaid on one graph, but like this one
at Wikipedia [URL http://en.wikipedia.org/wiki/File:Chi-square_pdf.svg accessed 2014-09-29] they really need to be seen on a color screen.
So here are my own side-by-side shots.
[Figure: χ² distributions for df = 2, 3, 4, 5, 8, and 10, side by side, all on the same scale of χ² = 0 to 16]
Obviously χ² is skewed right, though less so for higher df. That makes sense, if you think about it. You compute χ² by adding positive
numbers, so obviously it can’t have a left tail that goes below 0, as z and t do.
And it also makes sense in terms of what you’re testing. Higher χ² represent poorer matches between model and data. χ² = 0 would
mean that the data match the model exactly, which is extremely rare. Negative χ² would mean that the data and model are better than a
perfect match, which obviously can’t happen.
BTW: You might be interested to know that the mean of the distribution equals df, the mode is at df−2, and the median is about df−2/3. And then again, you might
not.
p-values must be looked up in a table, or more likely with software or a calculator. When the χ² value is a lot bigger than the degrees of
freedom, the data and the model are very different and the p-value is small. In the M&Ms example, df = 5 and χ² = 19.42, so the p-value is
small. (In fact, it’s 0.0016.)
BTW: Your TI-83 has the χ² distribution in the [2nd VARs makes DIST] menu, so if you needed to you could compute the p-value as χ²cdf(19.42,10^99,5). But in
practice your calculator will give you the p-value automatically, the same way it does in z and t tests.
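If you have no calculator handy, the same right-tail area can be computed with nothing but Python's standard library. This is a sketch, not the calculator's algorithm: the function name chi2_upper_tail is made up here, and it relies on the closed forms of the χ² tail for df = 1 and df = 2 plus a standard recurrence that steps df up by 2, so it only handles whole-number df.

```python
from math import erfc, exp, gamma, sqrt

def chi2_upper_tail(x, df):
    """P(chi-squared with df degrees of freedom >= x), whole-number df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    # Closed-form starting points: df = 1 for odd df, df = 2 for even df
    if df % 2 == 1:
        tail, nu = erfc(sqrt(half)), 1
    else:
        tail, nu = exp(-half), 2
    # Raising df by 2 adds one more term to the tail area
    while nu < df:
        tail += half ** (nu / 2) * exp(-half) / gamma(nu / 2 + 1)
        nu += 2
    return tail

# M&Ms example: chi-squared = 19.42 with df = 5
p_value = chi2_upper_tail(19.42, 5)
print(f"p = {p_value:.4f}")
```

The result should agree with the 0.0016 quoted above.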
Hypothesis Test in Practice
So much for the theory. But how will you test goodness of fit in practice? This section runs through the complete hypothesis test. There’s still some
commentary, but the stuff in boxes is what you’d actually write for a quiz or homework.
1. Hypotheses
With goodness of fit, there’s no single population parameter to test for. (If you want to get technical, the population parameter is a probability
distribution.) So you state the hypotheses in words, but usually including the model:
(1) H0: The 24:13:16:20:13:14 color distribution is correct.

H1: The color distribution on the Web site is incorrect.
2. Significance Level
Nothing new here:
(2) α = 0.05
3–4. Test Statistic and p-Value
Here you have a choice. The MATH200A Program [URL: https://BrownMath.com/ti83/math200a.htm#Download] is easiest to use, and also saves you
work with several other statistics procedures. If you don’t have the program, follow the procedure in How to Test Goodness of Fit on TI-83/84 [URL:
https://BrownMath.com/ti83/gof83.htm].
If you have a calculator in the TI-89 family, please see How to Test Goodness of Fit on TI-89 [URL: https://BrownMath.com/ti83/gof89.htm].
Put the model numbers in L1 — not the total. The model can be percentages or ratios. For example, with the M&Ms you
can enter 24, 13, 16, and so on, or .24, .13, .16, and so on; it doesn’t matter as long as you’re consistent. Similarly, if you have
a 9:3:3:1 model you can enter 9, 3, 3, 1 or 9/16, 3/16, 3/16, 1/16.
Put the observed counts in L2 — counts, not ratios or percentages. Never enter the total.
Press [PRGM], then the number you see for MATH200A, then [ENTER]. Caution! Don’t press [►] or [◄].
Dismiss the splash screen and press [6] to select the GoF test. The confirmation screen asks you if you’ve entered the two necessary lists. If you
have, press [9] [ENTER].
The program performs the computations and graphs the χ² curve, also showing the p-value, test statistic, and degrees of
freedom. (In this case the graph looks blank because the p-value is so small.)
You might notice that the test statistic is 19.44, not 19.42 as computed earlier. That’s because the calculator keeps many
digits of precision, avoiding problems with rounding.
The program tells you how many of the categories have expected counts below 5. (If any are below 5, it also tells you how
many are below 1, but we don’t use that information in this book.) See Requirements Check below.
(3–4) MATH200A/GoF Test

df=5, χ²=19.44, p=0.0016
RC. Requirements Check
The program computes the expected counts, and places them in L3. As discussed above, you must not have any E’s below
about 5 to be sure that the test procedure is valid.
I said “about 5”. If the results screen shows one or more E’s below 5, look at L3 to see how far below 5. One expected
count just a little below 5 is not necessarily a fatal flaw in the test.
(RC) Random sample? Not really, but we’re pretending.

Sample size under 10% of population? Yes, 10×628 = 6280 is far less than the total number of M&Ms.
All E’s ≥ 5? Yes, the smallest E in L3 is 81.64.
5. Decision Rule
This is the same for every type of hypothesis test.
6. Conclusion
(6) At the 0.05 level of significance, the color distribution on the Web site is incorrect. [Or, “... the color distribution on the Web site
is inconsistent with the data.”]
Or,
The color distribution on the Web site is inconsistent with the data (p = 0.0016).
Optional: Residuals
If you reject H0, can you say anything about which categories are most “responsible” for the overall deviation from the
model? Yes. DeVeaux, Velleman, Bock (2009, 699–700) [see “Sources Used” at end of book] suggest that you can look at the
standardized residuals (observed−expected)/√expected. These are essentially z-scores, and you recall that z has only a 5%
chance of being outside ±2 if the null hypothesis is true.
MATH200A part 6 already computes the squares of the residuals for you in list L4. The square of ±2 is 4, so when you
look at list L4 after running the program, you can be pretty sure that any row with a value above 4 indicates a category that
doesn’t match the model. (It’s more complicated, but that’s a decent rule of thumb.)
In this example, brown and green (rows 2 and 3) have squared residuals above 4. Therefore, for those colors, the differences between this sample
and the model are probably significant. Remember that L2 is the observed counts in the sample, and L3 is the expected counts from the model for this
sample size. You can see that there were significantly fewer browns than expected, and significantly more greens than expected. You might be a little
suspicious of blue (row 1) and orange (row 4), but this sample’s differences from the model are probably not significant.
Even so, you can’t simply do 1-PropZTest on each category after rejecting H0 on your GoF test, because that would greatly increase your chance
of a Type I error above your stated α. More advanced textbooks will suggest alternatives, such as adjusting the significance level or taking a new
sample.
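The rule of thumb about squared residuals above 4 can be tried out directly on the M&Ms data. The Python sketch below uses my own variable names and applies only the rough cutoff described in this section; it is not a formal test of any single category.

```python
from math import sqrt

colors = ["Blue", "Brown", "Green", "Orange", "Red", "Yellow"]
model = [0.24, 0.13, 0.16, 0.20, 0.13, 0.14]
observed = [127, 63, 122, 147, 93, 76]

n = sum(observed)
flagged = []
for color, share, o in zip(colors, model, observed):
    e = share * n                    # expected count for this category
    residual = (o - e) / sqrt(e)     # standardized residual, roughly a z-score
    squared = residual ** 2          # same number as the chi-squared contribution
    if squared > 4:                  # rule of thumb: |residual| above 2
        flagged.append(color)
    print(f"{color:<7} residual {residual:+.2f}   squared {squared:.2f}")

print("Rows above 4:", flagged)
```

The sign of each residual tells you the direction: negative for fewer than expected, positive for more than expected.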
Confidence Intervals?
You may be wondering about computing a confidence interval. You can’t just do 1-PropZInt confidence intervals on the category proportions. A
confidence interval is the complement of a hypothesis test, so multiple confidence intervals on the same data have the same problem as multiple
hypothesis tests.
Confidence intervals can be computed for individual categories or the overall model, but the techniques are beyond the scope of this course. If
you’re interested, please look at Confidence Intervals for Goodness of Fit [URL: https://BrownMath.com/stat/gof_ci.htm]. It shows how to make the
calculations and includes an Excel workbook with instructions.
12A2. Second Problem: Fruit Flies
“A problem which frequently arises is that of testing the agreement between observation and hypothesis.” — Bulmer (1979, 154) [see “Sources Used”
at end of book]
The 9:3:3:1 ratio for crosses is pretty basic in genetics, when two independent traits are involved. Here the traits are green or red eyes and having
wings or not.
Dabes and Janik (1999, 273) [see “Sources Used” at end of book] give some data for the hybrid offspring of fruit flies; see the table below. The flies were randomly selected. Your task is to determine whether this cross follows the 9:3:3:1 model or not. Use α=0.05.

                       Model ratio    Observed
Green-eyed winged      9              120
Green-eyed wingless    3              49
Red-eyed winged        3              36
Red-eyed wingless      1              12
Total                  16             217

Suggestion: Stop reading at this point, and try to write out all the steps on your own, using the preceding example as a model if you need to. Then compare your work to what follows.

BTW: You can read about the 9:3:3:1 ratio in many places such as Wikipedia’s Mendelian Inheritance [see “Sources Used” at end of book]: scroll down to “Law of Independent Assortment (the ‘Second Law’)”. A Web search for “9:3:3:1” will bring up plenty more.
Solution
(1) H0: The fruit flies follow the 9:3:3:1 model.

H1: The fruit flies do not follow the model.
(2) α = 0.05
(3–4) MATH200A/GoF Test
df=3, χ²=2.45, p=0.4838
(RC) Random sample.

10×217 = 2170, and the number of fruit flies is far greater.
All E’s are >5; the smallest is 13.563.
(6) At the 0.05 level of significance, we can’t say whether the fruit flies follow the 9:3:3:1 model or not.
Or,
We can’t say whether the fruit flies follow the 9:3:3:1 model or not (p = 0.4838).
(Again, if you don’t have the program you can follow the procedure in How to Test Goodness of Fit on TI-83/84.)
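For readers working on a computer instead of a TI-83/84, the whole fruit-fly computation fits in a short Python sketch. It is not the MATH200A program: the names goodness_of_fit and chi2_upper_tail are invented for this illustration, and the tail-area helper is a standard-library stand-in that works only for whole-number df.

```python
from math import erfc, exp, gamma, sqrt

def chi2_upper_tail(x, df):
    """Right-tail area of the chi-squared distribution, whole-number df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    tail, nu = (erfc(sqrt(half)), 1) if df % 2 else (exp(-half), 2)
    while nu < df:  # each step of 2 in df adds one term
        tail += half ** (nu / 2) * exp(-half) / gamma(nu / 2 + 1)
        nu += 2
    return tail

def goodness_of_fit(model, observed):
    """Return (df, chi-squared, p-value, expected counts) for a GoF test."""
    n = sum(observed)
    total = sum(model)  # 16 for a 9:3:3:1 model, so ratios work as entered
    expected = [n * m / total for m in model]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return df, chi2, chi2_upper_tail(chi2, df), expected

df, chi2, p, expected = goodness_of_fit([9, 3, 3, 1], [120, 49, 36, 12])
print(f"df={df}, chi-squared={chi2:.2f}, p={p:.4f}")
print(f"smallest E = {min(expected):.4f}")
```

The printed values should line up with steps 3–4 and the requirements check above. Because the model is divided by its own total, you can pass ratios, percentages, or proportions, just as with the calculator lists.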
Scientific Method and Your Conclusions
While it’s true that this one experiment gave no conclusion, science wouldn’t stop there. You know that the scientific method calls for experiments to be replicated. Now, when the experiment is repeated, either H0 will be rejected or it will fail to be rejected. Here’s how those possibilities interact with what you’ve learned about writing conclusions.
If additional experiments do reject H0, then we conclude that H0 is actually false, and this
first sample just happened to be one of the unlucky ones that failed to show an effect that actually exists.
There’s one caveat, though. Experiments at the 0.05 significance level will reject H0 in about one case in twenty where it’s actually true.
So while a “reject H0” deserves a lot of respect, if it’s one result out of dozens we can’t take it on its own as enough to overthrow H0.
If additional experiments still come up with a “fail to reject H0”, we begin to think that H0 is probably true.
How can we do that on the basis of multiple experiments when we can’t do it from one experiment? Well, remember what “fail to reject
H0” means: either H0 is actually true, or it’s actually false but this experiment’s sample happened not to show it. If it was actually false, we
would expect most experiments to reject it. But as test after test fails to disprove H0, we grow more and more confident that it’s not going to
be disproved.
For this reason, in scientific contexts the conclusion after failing to reject H0 is often written in terms like “the data are not inconsistent with the
model” or even “we were unable to rule out the model.” The scientists are not accepting the null hypothesis here; they’re writing for a technical
audience that understands what a “fail to reject H0” means. When you’re writing for a general audience, stick to neutral language when you fail to
reject H0.
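To put a number on the one-in-twenty caveat above, here is a small Python sketch. (Python isn’t part of this book’s calculator workflow, and the function name is made up for illustration.) It assumes that H0 is actually true and that the experiments are independent of each other, and it computes the chance that at least one of them rejects H0 anyway.

```python
# Chance that at least one of n independent experiments rejects a TRUE H0,
# when each experiment is run at significance level alpha.
# Assumption: independence between experiments (not stated in the text).
def prob_any_false_rejection(n_tests, alpha=0.05):
    return 1 - (1 - alpha) ** n_tests

for n in (1, 5, 20, 40):
    print(f"{n:2d} experiments: {prob_any_false_rejection(n):.3f}")
```

The probability climbs quickly as experiments pile up, which is why a single “reject H0” among dozens of replications is not enough on its own to overthrow H0.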
12A3. Third Problem: Equal Preferences
A store manager always has to decide how to use limited shelf space or freezer space most effectively. The store currently carries four brands of
veggie burgers, and the manager wants to know if customers have a preference. (This is the last store in America that has not computerized its
inventory.) She randomly selects a week, and finds the following sales figures: 145 Brand B, 195 Brand G, 189 Brand Q, and 153 Brand V. At the 0.05
level of significance, can you say that customers have equal or different preference for the brands?
Solution
You’re not explicitly given a model, so you have to develop one. But “equal preferences” must mean the expected counts are all equal, or in other
words the numbers in the model are all equal. You could enter ¼:¼:¼:¼, or 1:1:1:1, or any numbers as long as it’s four equal numbers.
(1) H0: Consumers have equal preference for the four brands.
H1: Consumers have unequal preference for the four brands.
Comment: Students often write these backwards. Remember that H0 is always some variation on “nothin’ special goin’ on here.” Preference
for one brand over another would be something, so that must be H1.
(2) α = 0.05
(3–4) L1=1,1,1,1; L2=145,195,189,153
MATH200A/GoF Test
df=3, χ²=11.14, p=0.0110
(RC) Random sample? The week was random, and we assume that the week’s customers are representative.
Sample size is 145+195+189+153 = 682, and 10×682 = 6820. There will be more than that number of shoppers, past, present, and future.
All E’s equal 170.5, ≥ 5.
(6) Consumers in general, at this store anyway, do have unequal preferences among the four brands (p = 0.0110).
Or,
At the 0.05 level of significance, we can say that consumers in general do have unequal preferences among the four brands.
So what should the manager do? The χ² test shows that brand preferences aren’t equal, and Brand B is clearly the loser in this sample, but is that
really enough to throw out Brand B? I wouldn’t. Its χ² contribution is below the threshold discussed in Residuals, above. And it did sell only eight
units less than Brand V; that’s just a 5% difference. Maybe in another week it might sell more.
What the manager can do, now that the finger of suspicion is pointed at Brand B, is make another study — this time maybe taking two random
weeks — and focus on just Brand B as a proportion of total veggie-burger sales. If they’re all equal, every brand would have 25%, so the manager
might want to drop Brand B if a one-proportion test shows that less than say 20% of all sales are Brand B. But again, this would need to be a new
sample, not just a 1-PropZTest on the data from this sample. You should never perform multiple significance tests on the same data.
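If you’d like to see the veggie-burger numbers without a calculator, the following is a rough Python sketch of the same goodness-of-fit computation. It is not the MATH200A program. The helper names are invented for illustration, and the p-value shortcut is a closed form that works only when df = 3, as it is here.

```python
import math

def chi2_right_tail_df3(x):
    """Right-tail area of the chi-square distribution, valid ONLY for df = 3."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def expected_counts(observed, model):
    """Scale the model's ratios so the expected counts add up to the sample size."""
    n = sum(observed)
    return [n * m / sum(model) for m in model]

brands = ["B", "G", "Q", "V"]
observed = [145, 195, 189, 153]

# Any four equal numbers describe the same "equal preference" model.
expected = expected_counts(observed, [1, 1, 1, 1])
same_model = expected_counts(observed, [0.25, 0.25, 0.25, 0.25])

contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)
df = len(observed) - 1
p_value = chi2_right_tail_df3(chi2)

for brand, c in zip(brands, contributions):
    print(f"Brand {brand}: chi-square contribution {c:.2f}")
print(f"df={df}, chi-square={chi2:.2f}, p={p_value:.4f}")
```

The per-brand contributions are the numbers to compare against the threshold of 4 discussed under Residuals.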
Summary: You’ve already met two-way tables back in the chapter on probability [URL: https://BrownMath.com/swt/pfswt.htm.htm#c05_top]. Now you’ll
learn two types of inferences on those samples:
Test of independence: table of one population and two attributes
Test of homogeneity: table of two or more populations and one attribute
People may not always agree on whether a given situation is a test for independence or homogeneity, but that’s okay because the two
tests are identical in every way; it’s just a matter of how you phrase your conclusions to match what you tested.
12B1. First Problem: Office Equipment
“In 1970, SCM surveyed 150 office managers in three states to see if typewriter brand preference varies
between states.” The quote and the table below are from Dabes and Janik (1999, 274) [see “Sources Used”
at end of book]. They didn’t say, but presumably this was a random sample. They go on to ask, “Do [the]
above indicate that brand preference depends on state? ... α = 0.05.”
Preference IBM SCM Total
New York 35 35 70
Pennsylvania 25 15 40
Connecticut 30 10 40
Total 90 60 150
(A “typewriter” was a Stone Age piece of office equipment, sort of like a keyboard and printer fused
into some sort of bizarre hybrid. Believe it or not, in the 1970s every business had several, and many homes
had at least one. They were popular gifts for high-school graduation!)
The question seems clear: Does typewriter brand preference vary among states? But be careful in your
thinking! The question is not asking whether preference varies among the managers surveyed. Obviously it does: NY has 50%–50%, PA has 63%–37%,
and CT has 75%–25%. The question is whether this sample lets us conclude that brand preference varies among all managers in the three states. The first
is descriptive statistics; this is inferential statistics.
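But how to analyze it? Well, you have one sample of 150 office managers, and each manager has two attributes: state and preferred brand. So you
need to do a test of independence. (You could just as well think of three populations, office managers in the three states, with one attribute,
preferred brand. That would be a test of homogeneity, and the calculations are identical.)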
As always, the first step is to set up your hypotheses. Recall that H0 is always some variant of “Nothin’ goin’ on here” or “no effect”. So your
null hypothesis must be that brand preference is independent of state, and the alternative naturally is that brand preference depends on (is associated
with, varies by, is not independent of) state.
What do you do to come up with a test statistic and a p-value? As usual, the calculator is your friend. But as usual, first I’ll take you on a little
tour so that you understand what you’re testing. ☺
The Theory
Just like goodness of fit, two-way tables are analyzed using the χ² distribution. So you are once more concerned with the differences between
observed and expected, and χ² will be the sum of
(observed − expected)² / expected
just as it was in goodness of fit. But the computation of “expected” is a bit more complicated.
Computing Expected Counts (E’s)
What is meant by “expected” for this two-way table? Well, in the overall sample IBM was preferred 60–40
over SCM: 90/150=60%, 60/150=40%. So if brand preference doesn’t vary by state (if H0 is true) you
would expect that same 60–40 split in each state.
Once you’ve got that, it’s just a matter of applying the 60–40 split to each state. In New York, 60% of 70
is 42, and 40% of 70 is 28. (Conventionally, the expected numbers are written in parentheses in the cell,
under the observed numbers.) Pennsylvania and Connecticut are filled in the same way in the table below.
Preference IBM SCM Total
New York 35 35 70
(42) (28)
Pennsylvania 25 15 40
(24) (16)
Connecticut 30 10 40
(24) (16)
Total 90 60 150
(60%) (40%)
This table just happens to have whole numbers for all the expected counts. But it’s possible and okay
for expected numbers to be decimals.
BTW: There’s an alternative formula for the expected numbers. You may have observed that this was a two-pass procedure:
first calculate the overall percentage preferences, and then apply those percentages. There exists a one-pass procedure that
is mathematically equivalent:
expected = (row total) × (column total) / (grand total)
For example, the expected count for NY IBM is 70×90/150 = 42, and similarly for all the others. This is a neat formula, but then you never get to see the real
point, which is that equal percentage split among all the populations.
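Here is a short Python sketch of both routes to the expected counts for the typewriter table, the two-pass percentage split and the one-pass formula. The variable names are mine, chosen for illustration, and the calculator does all of this for you.

```python
states = ["New York", "Pennsylvania", "Connecticut"]
observed = [[35, 35],   # IBM, SCM
            [25, 15],
            [30, 10]]

row_totals = [sum(row) for row in observed]          # sample size per state
col_totals = [sum(col) for col in zip(*observed)]    # overall count per brand
grand_total = sum(row_totals)

# Two-pass: overall brand shares, then apply those shares to each state.
shares = [c / grand_total for c in col_totals]
two_pass = [[r * s for s in shares] for r in row_totals]

# One-pass: (row total) x (column total) / (grand total)
one_pass = [[r * c / grand_total for c in col_totals] for r in row_totals]

for state, row in zip(states, one_pass):
    print(state, row)
```

Both lists hold the same expected counts, apart from possible floating-point rounding in the two-pass version.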
The expected counts are how you test requirements. They’re exactly the same as for goodness of fit: random sample less than 10% of population,
with all expected counts at least 5. (Again, some authors require only that none of the E’s can be below 1, and no more than 20% of them can be below
5.)
What do you do if your E’s are too small? You can combine smaller categories or smaller populations, if the combination seems reasonable. For
example, if this was a four-row table and included Rhode Island, but the RI expected counts were too low, you could combine CT and RI since they’re
adjacent small states in coastal New England. (If you had 50 states, you wouldn’t combine Rhode Island and Wyoming, because they’re
geographically and demographically different.) But it’s best to plan ahead and not get into this position. You should make some kind of guess about
how the percentages will work out, and then plan a large enough sample in each population based on the percentages you expect.
Computing χ² and p-Value
Eyeballing the observed and expected numbers doesn’t really tell you much. You can
see that observed and expected are pretty close in PA but further apart in NY and CT.
But what you can’t see is whether that difference is too great to be purely the result of
random sample selection. For that, you need to compute χ² in each of the six cells and
add them up. Those computations are shown in the table here.
χ² Contributions IBM SCM
New York (35−42)²/42 = 1.17 (35−28)²/28 = 1.75
Pennsylvania (25−24)²/24 = 0.04 (15−16)²/16 = 0.06
Connecticut (30−24)²/24 = 1.50 (10−16)²/16 = 2.25
Add up those six numbers and you have your test statistic: χ² = 6.77.
Degrees of freedom is a bit different from the goodness-of-fit case. You might expect
df to be 6−1=5, but for two-way tables it’s actually
df = (rows−1) × (columns−1)
You don’t count the total row and total column. For this table, df = (3−1)×(2−1) = 2.
Finally, computing χ²cdf(6.77,∞,2) gives p-value = 0.0339.
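The same arithmetic can be checked with a few lines of Python. This is only a sketch. The p-value line uses a shortcut that holds when df = 2 (the right-tail area is then exp(−χ²/2)), so it stands in for χ²cdf in this particular problem and not in general.

```python
import math

observed = [[35, 35], [25, 15], [30, 10]]
expected = [[42, 28], [24, 16], [24, 16]]

# (observed - expected)^2 / expected in each of the six cells
contributions = [[(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
                 for o_row, e_row in zip(observed, expected)]
chi2 = sum(sum(row) for row in contributions)

# Total row and total column are not counted.
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Shortcut valid only for df = 2.
p_value = math.exp(-chi2 / 2)

print(f"chi-square={chi2:.2f}, df={df}, p={p_value:.4f}")
```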
Hypothesis Test in Practice
So much for the theory. But how will you perform the test in practice? This section runs through the complete hypothesis test. There’s still some
commentary, but the stuff in boxes is what you’d actually write for a quiz or homework.
1. Hypotheses
With independence and homogeneity (two-way tables), there’s no single population parameter to test for. So you state the hypotheses in words. In a
test of independence, H0 is always independence, and H1 is always dependence.
(1) H0: Brand preference doesn’t vary by state

[or, is independent of state, is not associated with state, etc.].
H1: Brand preference varies by state
[or, is dependent on state, is not independent of state, is associated with state, etc.].
2. Significance Level
Nothing new here:
(2) α = 0.05
3–4. Test Statistic and p-Value
Here you have a choice. The MATH200A Program [URL: https://BrownMath.com/ti83/math200a.htm#Download] is a bit easier to use and gives more
information, but you can also use the native TI-83/84 test called χ²-Test. (In the TI-89 family, it’s Chi2 2-way.) Either way, start by putting the
observed numbers into a matrix, as follows:
1. If you have a [MATRX] key, press it. But you probably don’t, so press [2nd x-1 makes MATRX]. You get the matrix menu,
similar to the one shown at right.
2. Unlike the stats menu, the matrix menu doesn’t come up ready for editing. You have to press [►] [►] before
pressing [ENTER]. You’re then prompted for the numbers of rows and columns, not including the total row and
total column. As you enter the number of rows and number of columns, the matrix changes shape to match.
3. Enter the observed numbers in the matrix. As you press the [ENTER] key after entering each number, the
calculator automatically moves to the next cell.
After you fill matrix A with the observed numbers, it’s time to perform the calculations.
With the MATH200A program:
Press [PRGM], then the number you see for MATH200A, then [ENTER]. Dismiss the splash screen and press [7] to select the two-way test.
As soon as you make the selection, the program begins computing. (It assumes that you have the observed counts in matrix A; it knows how to
compute the expected numbers and puts them in matrix B automatically.) The results look like this:
You’ll notice that the program tells you that it put some information in matrix C. Under Residuals and More, below, we’ll look at that.
(3–4) MATH200A/Two-way table
df=2, χ²=6.77, p=0.0339
With the native χ²-Test:
Press [STAT] [◄] and then press [▲] repeatedly to get to χ²-Test. Press [ENTER].
Chances are good that Observed and Expected will already show [A] and [B]. If not, change either or both by pressing [2nd x-1 makes MATRX]
(or the [MATRX] key, if you have one), and then [1] to select matrix A or [2] to select matrix B.
Select Calculate and press [ENTER]. You should see these results:
(3–4) χ²-Test
χ²=6.77, df=2, p=0.0339
RC. Requirements Check
In addition to a random sample less than 10% of population, you need all the E’s to be at least 5. Don’t just say this without checking, because sooner
or later you’ll have a case where that’s not true.
With the MATH200A program: The results screen (above) shows you how many categories had expected counts below 5. A piece of cake! The expected
counts are stored in matrix B in case you need to look at them. (For classes that use the looser requirement, if any E’s are
below 5 then the program tells you how many are below 1.)
With the native χ²-Test: The calculator puts the expected counts in matrix B while doing the χ²-Test. To view matrix B, press
[2nd x-1 makes MATRX] [►] [►] [2]. Look at every value (you may have to scroll) for any that are below 1, or below 5 but ≥1.
In this case you see that all expected counts are above 5.
(RC) Random sample.

NY 10×70 = 700, PA and CT 10×40 = 400. Surely there are many more office managers in those states.
All E’s in [B] are >5.
5. Decision Rule
This is the same for every type of hypothesis test.
6. Conclusion
(6) At the 0.05 level of significance, brand preference does vary by [or is dependent on, associated with, not independent of] state.
Or,
Brand preference does vary by [or is dependent on, associated with, not independent of] state (p = 0.0339).
Optional: Residuals and More
Just as with the goodness-of-fit test, if you reject the null hypothesis then you can look at the standardized residuals, which are
(observed − expected) / √expected
A standardized residual outside of ±2 is probably significant. The χ² contributions are the squares of the standardized residuals, so a χ² contribution
above 4 is probably significant.
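As a rough illustration, here are the standardized residuals for the typewriter survey computed in Python. The names are mine, and the expected counts are the ones worked out earlier in this problem. The sketch also confirms that squaring the residuals gives back the χ² contributions.

```python
import math

states = ["New York", "Pennsylvania", "Connecticut"]
brands = ["IBM", "SCM"]
observed = [[35, 35], [25, 15], [30, 10]]
expected = [[42, 28], [24, 16], [24, 16]]

# (observed - expected) / sqrt(expected), cell by cell
residuals = [[(o - e) / math.sqrt(e) for o, e in zip(o_row, e_row)]
             for o_row, e_row in zip(observed, expected)]

# Cells whose standardized residual falls outside +/-2
flagged = [(states[i], brands[j])
           for i, row in enumerate(residuals)
           for j, r in enumerate(row) if abs(r) > 2]

chi2_from_residuals = sum(r ** 2 for row in residuals for r in row)

for state, row in zip(states, residuals):
    print(state, [round(r, 2) for r in row])
print("cells outside ±2:", flagged)
```

For this table the list of flagged cells comes out empty, even though the overall test rejects H0.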
The MATH200A program shows you the χ² contributions and more. (If you used the TI’s native χ²-Test
and want this information, you have to compute it for yourself.) To view the additional information,
which is in matrix C, press [2nd x-1 makes MATRX] [◄] [3]. Here’s what it looks like for the typewriter survey.
(I’ve pasted screens together to save the effort of scrolling.)
Unfortunately it’s not possible to put captions in a matrix, but here’s your guide to interpreting it. There
are three regions: the χ² contributions, the row and column totals, and the row and column percentages. The following paragraphs show and explain
those.
The χ² contributions are in the upper left corner of the matrix, with the size and shape of the original
matrix. The original matrix had three rows and two columns, so you want to look at the top left 3×2.
As I mentioned at the start of this section, if you’re able to reject H0 then a χ² contribution >4 is
probably significant. In this problem, your p-value was <α and you did reject H0, but the χ² contributions
are all well under 4. How do you interpret this? There is indeed some variation in brand preference
among states, but you can’t tell just where it is. Isn’t this kind of a paradox? Maybe, but I would say
instead that the sample was large enough to show that some effect existed, but not large enough to show the details of the effect. If you repeat the
survey with a larger sample, you might be able to learn more.
“But wait!” I hear you say. “Isn’t it obvious? NY was 50–50, PA was just over 60–40, and CT was 75–25. Isn’t it obvious that CT is very different
from NY?” Yes, it is — in the sample. But you don’t know whether it’s true in the population. For all you know, the NY sample just happened to
under-represent IBM lovers and over-represent SCM lovers, and CT the opposite, so that in another sample the proportions might be reversed. You
simply don’t have enough information to draw more detailed conclusions.
The row and column totals are in the next section, in an extra column and an extra row. This particular
problem gave them to you, but many problems do not, and you’d be surprised how hard it is to grasp a
problem and interpret a result without this information.
Here you see that the three states’ sample sizes were 70, 40, and 40; the overall preference for the two
manufacturers was 90 and 60. Of course whether you add down or across, you get the same 150 for
overall sample size.
Finally, you have the row and column percentages in the last column and last row. What is this telling
you? NY was 46.7% of the whole sample, and PA and CT were each 26.7%. IBM lovers were 60% of the
whole sample, and SCM lovers were 40%. The two 0’s are just space fillers; the 100 at the lower right
reminds you that the row percentages and column percentages add up to 100% in either direction.
Why do you care about the row and column percentages? Because they explain what the null
hypothesis means. The null hypothesis is that brand preference is the same among states. So if the null
hypothesis is true, then NY, PA, and CT all have the same 60–40 split between IBM and SCM that the overall sample has. (I figured the percentages by
hand in Computing Expected Counts (E’s) above.)
You can read the percentages in the other direction, too. It doesn’t have much use in this particular problem, but you can do it. NY was 46.7% of
the whole sample, so if H0 is true then 46.7% of the IBM lovers and 46.7% of the SCM lovers should be in the NY sample. Similarly, if H0 is true then
PA and CT should each have 26.7% of the IBM lovers and 26.7% of the SCM lovers. And if you look back at the matrix of expected counts you’ll see
that it matches. (Hey, I told you it wasn’t very useful in this situation! But there are others where it can be useful to read the table down or across.)
BTW: If your two-way table has two rows and two columns, you’re testing the proportions in two populations. That’s a case you had back in Chapter 11 [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c11_Case5]. In this situation, you can do a χ² test or a 2-proportion z test; you’ll get the same p-value from both. But if you want
to know the size of the difference between the two population proportions, you have to do a 2-proportion z interval. (There is such a thing as a confidence interval in the χ²
procedure, but it’s pretty gnarly and we don’t study it in this course.)
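To see why the two tests agree for a 2×2 table, here is a Python sketch. The counts are made up for illustration and are not from this book. It assumes the z test uses the pooled proportion and is two-tailed, and that the χ² test has no continuity correction.

```python
import math

# HYPOTHETICAL data: successes and sample sizes in two populations
x1, n1 = 30, 100
x2, n2 = 45, 100

# 2-proportion z test with pooled proportion, two-tailed
pooled = (x1 + x2) / (n1 + n2)
z = (x1 / n1 - x2 / n2) / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
p_z = math.erfc(abs(z) / math.sqrt(2))          # two-tailed normal area

# Chi-square test on the same data arranged as a 2x2 table
observed = [[x1, n1 - x1], [x2, n2 - x2]]
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand_total = sum(row_totals)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
p_chi2 = math.erfc(math.sqrt(chi2 / 2))         # right tail, valid for df = 1

print(f"z={z:.4f}, z squared={z * z:.4f}, chi-square={chi2:.4f}")
print(f"p from z test={p_z:.4f}, p from chi-square test={p_chi2:.4f}")
```

The χ² statistic is the square of z, which is why the two p-values match.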
12B2. Second Problem: The “Monday Effect”
It’s a persistent idea that cars manufactured on Monday are of lower quality because the workers are recovering from wild weekends. But is it true?
A quality analyst randomly chose 100 records from each weekday over the past year and obtained the following results:
Mon Tue Wed Thu Fri
Defective 15 10 5 5 10
Non-defective 85 90 95 95 90
At the 0.05 significance level, are the proportions of defective cars different on different days?
Comment: The way I phrased it, this is a problem of homogeneity, five populations (Monday cars, Tuesday cars, and so on) with one attribute
(defectiveness). But the question could just as well be asked whether likelihood of a defect depends on the day of the week when the car was
manufactured. In those terms, it’s a test of independence: one population (cars) with two attributes (day manufactured, and defectiveness).
This is a good illustration of what I said in the summary: many situations can be treated equally well as tests of independence or tests of
homogeneity. Fortunately, the procedure is identical; you just need to state your hypotheses and your conclusions in terms of what you were asked
to test.
Solution
(1) H0: Proportions of defective cars are the same on all weekdays.
H1: Proportions of defective cars are different on different weekdays.
(2) α = 0.05
(3–4) χ²-Test or MATH200A/Two-way table
df=4, χ²=8.55, p=0.0735
(RC) Random sample.
We don’t know production figures, but we can be confident they’re far higher than 10×500 = 5000 during the year.
All E’s ≥ 5.
(6) At the 0.05 significance level, it’s impossible to say whether the proportions of defective cars are the same or different on different days.
Or,
We can’t determine whether the proportions of defective cars are the same or different on different days (p = 0.0735).
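For readers who want to reproduce the test statistic and p-value away from the calculator, here is a Python sketch of the whole two-way computation for the “Monday effect” data. The function names are invented for illustration. The p-value helper uses a closed form that works only for even df, which happens to fit here (df = 4), so treat it as a shortcut and not as a general replacement for χ²cdf.

```python
import math

def chi2_right_tail_even_df(x, df):
    """Right-tail chi-square area; this closed form holds ONLY for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this shortcut needs a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

def two_way_chi2(observed):
    """Return (chi-square, df, expected counts) for a table WITHOUT totals."""
    row_totals = [sum(r) for r in observed]
    col_totals = [sum(c) for c in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected

#            Mon Tue Wed Thu Fri
observed = [[15, 10,  5,  5, 10],    # defective
            [85, 90, 95, 95, 90]]    # non-defective

chi2, df, expected = two_way_chi2(observed)
p_value = chi2_right_tail_even_df(chi2, df)
alpha = 0.05

print(f"df={df}, chi-square={chi2:.2f}, p={p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```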
12B3. Third Problem: Tobacco Smoke and Tumors
“The cancer-producing potential of pyrobenzene (a major constituent of cigarette smoke) was tested. Eighty mice were used as a control group with
no exposure to pyrobenzene, eighty more were exposed to a low dose, … and another seventy were given a high dose.”
No tumors One tumor Two or more tumors Total
Control 74 5 1 80
Low dose 63 12 5 80
High dose 45 15 10 70
Total 182 32 16 230
Can you prove that pyrobenzene dosage affects production of tumors, at the 0.01 significance level, or “could the above apparent difference be due to
random chance?”
Source: Dabes and Janik (1999, 53–54) [see “Sources Used” at end of book]
Please stop here and write out your complete hypothesis test on paper, then check your solution against mine.
Solution
(1) H0: Pyrobenzene dosage does not affect tumor production.

H1: Pyrobenzene dosage affects tumor production.
Caution! Students sometimes write their null hypothesis as something like “random chance can account for the observed difference.” Yes, if your p-
value is high, it means that random chance could explain the observed sample, but that doesn’t mean that it is the explanation. There’s always the
possibility that dosage does affect tumors but this sample just happened not to show it. So write your H0 and H1 as contrasting statements about
tumors and dosage, as I’ve done here.
(2) α = 0.01
(3–4) χ²-Test or MATH200A/Two-way table
df=4, χ²=19.25, p=7.012755E-4 or about 0.0007
Caution! You have a 3×3 table, not a 4×4 table. Never enter total rows or total columns in your matrix. (If you got df=9, you made that mistake.)
(RC) Random sample? In a lab setup with mice, we can assume so. (Most likely the mice were genetically identical, purchased from a supply
house.)
Sample size less than 10% of population? Yes, the population of mice is unlimited (unfortunately).
All E’s ≥ 5? Look at matrix B, and you see that one of the nine expected counts is below 5 — specifically, 4.8696.
That’s just a smidge below 5, and given the very low p-value it’s not a problem.

(6) At the 0.01 significance level, we can conclude that pyrobenzene dosage level does affect production of tumors.
Or,
Pyrobenzene dosage level does affect production of tumors (p = 0.0007).
Comment: Why can we make a statement of cause here, rather than the weaker “is associated with”? Answer: This was an experiment, and the mice
were either genetically identical or randomly assigned to the three groups, or both.
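The expected-count check in this problem is worth seeing in detail, so here is a Python sketch of it. The variable names are illustrative. It applies both the requirement used in this course (all E’s at least 5) and the looser rule mentioned earlier that some authors use (none below 1, and no more than 20% below 5). Notice that the totals are left out of the table.

```python
# Observed counts only: never include the total row or total column.
observed = [
    [74,  5,  1],   # control
    [63, 12,  5],   # low dose
    [45, 15, 10],   # high dose
]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand_total = sum(row_totals)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

all_expected = [e for row in expected for e in row]
below_5 = [e for e in all_expected if e < 5]
below_1 = [e for e in all_expected if e < 1]

strict_ok = not below_5                     # this course: all E's at least 5
looser_ok = not below_1 and len(below_5) <= 0.20 * len(all_expected)

df = (len(observed) - 1) * (len(observed[0]) - 1)

print("df =", df)
print("expected counts below 5:", [round(e, 4) for e in below_5])
print("strict rule met:", strict_ok, "| looser rule met:", looser_ok)
```

One expected count out of nine falls short of 5, so the strict rule fails by a hair while the looser rule is met, which fits the judgment in the requirements check above.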
Every course has to draw the line somewhere and leave out lots of interesting things, and this one is no exception. Inferential Statistics Cases [URL:
https://BrownMath.com/stat/cases.htm] lists quite a few cases that we don’t have time to study in this course. For those who are interested, here are
handouts for some of the cases that I had to leave out:
In this chapter, you studied only the hypothesis test for goodness of fit. It’s possible to do a confidence interval as well, but it’s tricky
because every category can vary simultaneously.
See Confidence Intervals for Goodness of Fit [URL: https://BrownMath.com/stat/gof_ci.htm], which includes Excel workbooks to do the
calculations.
As you know, the sampling distribution for means is the t distribution. But standard deviations of samples vary according to a χ²
distribution.
You can do Inferences about One-Population Standard Deviation [URL: https://BrownMath.com/stat/stdev1.htm]. That page includes an
Excel workbook for the calculations, or you can use MATH200B Program part 5.
In Chapter 10 you tested the mean of one population, and in Chapter 11 you tested the means of two populations.
It’s possible to test the means of three or more populations, and your calculator already contains the needed command. See One-Way
ANOVA [URL: https://BrownMath.com/stat/anova1.htm].
Back in Chapter 4, you computed the correlation coefficient of your sample and the best fitting regression line for your sample. You also did
some primitive inference about the correlation coefficient by using a table of decision points.
You can actually do a hypothesis test and confidence interval about the correlation coefficient of a population, as explained in Inferences
about Linear Correlation [URL: https://BrownMath.com/stat/correl.htm]. That page includes an Excel workbook for the calculations
(including where the decision point numbers come from), or you could use MATH200B Program part 6.
The slope and intercept you computed for your regression line are actually random variables. If you had a different sample, you would
have come up with a different regression line. For hypothesis test and confidence interval about the regression line, see Inferences about
Linear Regression [URL: https://BrownMath.com/stat/infregr.htm]. There’s an Excel workbook for calculations, or you can use MATH200B
Program part 7.

With non-numeric data involving three or more responses to a variable, multiple populations, or both, each bucket contains some
number of data points. The expected count for each bucket is the number you would expect to see in that bucket if H0 is true.
(Although actual counts are whole numbers, the expected counts typically aren’t.)
The observed counts are usually different from the expected counts. The χ² statistic measures how different, as one number for
the whole model. The p-value says how likely this difference (or a greater difference) is if H0 is true.
When you have one population with one categorical variable, you test goodness of fit to a model, Case 6.
H0 is that the model is consistent with the data, and H1 is that it’s not consistent. Use MATH200A Program part 6.
When you have one population with two categorical variables, you test independence of the two attributes. When you have two or
more populations with one categorical variable, you test homogeneity. The line between independence and homogeneity is very
blurry, but fortunately both are computed exactly the same way, in Case 7.
H0 is that the variables are independent or the population proportions are all equal; H1 is that the variables are not independent
or some population proportions are different from others. Use MATH200A Program part 7 or χ²-Test.
Requirements for both χ² tests are a random sample, sample size less than 10% of population, and all expected counts at least 5.
These cases have only hypothesis tests. Although confidence intervals exist, we don’t study them in this course.

do it after all.
1. Explain the difference between a model of 25% to 40% to 35% and a model of 25 to 40 to 35.
2. You think that people’s ice cream favorites are 25% each vanilla and chocolate, 20% strawberry, 15% butter pecan, 8% rocky road, and 7% other
or no preference. You randomly survey 1000 people and find preferences are 220 vanilla, 255 chocolate, 190 strawberry, 170 butter pecan, 95
rocky road, and 70 other or no preference. Using α=0.05, was your idea right or wrong?
3. Democrats and Republicans (random sample) were surveyed for their opinions on gun
control, and the results are shown in the table below. Based on this sample, does a person’s
opinion on gun control depend on party affiliation, at the .05 level of significance?
favor oppose unsure total
Democrat 440 400 120 960
Republican 320 480 100 900
total 760 880 220 1860

4. In a random sample, 425 first graders were surveyed about what they want to be when they grow up, out of a choice of five professions. The
results were Teacher 80, Doctor 105, Lawyer 70, Police officer 70, Firefighter 100.
Obviously these particular children preferred some occupations over others. Test whether their preferences reflect a real difference among all
first graders, at the 0.05 significance level.
5. A random sample of women were observed for their consumption of eggs and their age at menarche:
Egg Consumption
Age at Menarche Never Once a week 2–4 times a week
Low 5 13 8
Medium 4 20 14
High 11 18 15
Data were adapted from Kuzma and Bohnenblust (2005, 224) [see “Sources Used” at end of book].
Test, at the 0.01 level, whether age at menarche is independent of egg consumption.
6. In Jury Selection, George Michailides (n.d.a) [see “Sources Used” at end of book] quotes a study of the age
breakdown of grand jurors in Alameda County, California, in 1973.
Age County Jurors
21–40 42 % 5
41–50 23 % 9
51–60 16 % 19
≥ 61 19 % 33
Total 100 % 66
It’s pretty obvious that this sample doesn’t match the age distribution of the county, but is the discrepancy too great
to be random chance? Choose an appropriate significance level.
(This isn’t a random sample, but that’s okay because you’re not asked to generalize to a population. For the same
reason, the “≤10% of population” requirement doesn’t apply.)
7. Do men choose the size town to live in based on the size town they grew up in? 500 men (a simple random sample) were surveyed. Use α = 0.05.
Now residing in
Raised in < 10,000 10,000–49,999 ≥ 50,000 Total
< 10,000 24 45 45 114
10,000–49,999 18 64 70 152
≥ 50,000 21 54 159 234
Total 63 163 274 500
8. Is Echinacea effective against the common cold? The New England Journal of Medicine
(Turner 2005 [see “Sources Used” at end of book]) reported a study on 437 volunteers who
were randomly assigned to receive a CO2 extract of Echinacea, a 60% alcohol extract, a 20%
alcohol extract, or a placebo for seven days before and 5 days after exposure to rhinovirus type
39.
Some withdrew or were excluded from the study for another reason. The final results are
shown below. The common cold is annoying but rarely fatal, so use α = 0.01 to determine
whether Echinacea makes a difference in the likelihood of catching a cold.
Day −7 to 0    Day 0 to 5     Cold   No Cold   Totals
CO2 extract    CO2 extract    40     5         45
60% extract    60% extract    42     10        52
20% extract    20% extract    48     4         52
Placebo        CO2 extract    43     5         48
Placebo        60% extract    44     4         48
Placebo        20% extract    44     7         51
Placebo        Placebo        88     15        103
Totals                        349    50        399
Review
Updated 9 Dec 2015
Summary: Even if you’ve been doing all the work and keeping up with the course, the mass of material you need to know for the exam can be
overwhelming. This page helps you identify what’s most important in preparing for the exam. (If you’re an independent learner, it
points you to the most important things you should have learned from your study.)
Contents: What’s Important?

Do This for Every Chapter
Finish with Overall Course Review
Review Problems
Problem Set 1: Short Answers
Problem Set 2: Calculations
What’s Important?
Here are your guidelines for reviewing the subject matter.
Do This for Every Chapter
Read the Summary at the beginning, when there is one.

Notice when a section is marked optional. When you’re doing your final studying, spend your time on the core concepts, not the optional
extras.
Scroll through the chapter and look at the definitions. Do you understand the meaning of each term and how to use it?
Scroll through the chapter again, and this time look at section heads and key concepts, which are marked in bold. Make notes for your cheat
sheet of anything important that you think you might forget.
Pay attention also to calculator procedures and formulas. Know when and how to carry out each calculator procedure, and when and
how to use the very few formulas that aren’t built into your calculator or the MATH200A program [URL:
https://BrownMath.com/ti83/math200a.htm]. There’s at least one example for each calculator procedure and each formula, so work through
it if you need to refresh your memory. Again, make notes for your cheat sheet of anything you’re likely to forget.
The “What Have You Learned?” section at the end of the chapter lists the most important concepts. (Links to “What Have You Learned?”
are in the online version of this document [URL: h ps://BrownMath.com/swt/pfswt.htm.htm#c99_LearnedLinks].)
If you’ve actually learned everything listed there, you should be in good shape for the exam. If you haven’t, review that section of the
text and work the examples.
Glance over the chapter exercises. If you had trouble with any of them before, make sure you thoroughly understand it now.
Finish with Overall Course Review
So much for the trees. Now it’s time to think “forest”.

Go through the cheat sheets you just made, and boil them down to one sheet, front and back. Making a one-sheet cheat sheet is always
useful, even if the exam will be open book or if the instructor doesn’t allow any notes at all. Writing your summary of the course helps you
make sense of the course material, to see it as a whole instead of an unrelated jumble of facts to memorize.
Practice with the review problems below. If you can’t work a problem, go back and learn what you’re missing. If something is missing from
your cheat sheet, add it.
Get a good night’s rest the night before the exam. Sleep deprivation makes people make stupid mistakes, so protect yourself from that.

Histogram Versus Bar Graph
Inferential Statistics: Basic Cases
Interactive: Triage: Which Inferential Stats Case Should I Use?
Top 10 Mistakes of Hypothesis Tests
Because this textbook helps you, please donate at BrownMath.com/donate.
Review Problems
Here are practice problems to help you test your knowledge and prepare for the final exam. Solutions are provided [URL:
https://BrownMath.com/swt/pfswt.htm.htm#hwa_hwrk99_root], but make a genuine effort to work any given problem on your own before you turn
to the solution.
How to use: Don’t necessarily make it your goal to work every problem. But do at least look at every one and make sure that you can set it up
correctly. Your success on the final exam hinges on your ability to identify which type of problem you are facing.
Don’t panic! This problem set is much longer than the exam will be, and some problems are harder than the problems you will meet on the exam.
Problem Set 1: Short Answers
Write your answer to each question. There’s no work to be shown. Don’t bother with a complete sentence if you can answer with a word, number, or
phrase.
Two events A and B are disjoint. Is it possible for those same events to be independent as well? Give an example, or explain why it’s impossible.
1
Yummo candy bars are supposed to have an average weight of 87.5 grams (about three ounces). To test this, a team of students bought one
2 Yummo bar from each of the six stores in the village of Carlyle and weighed it.
(a) The data would best be analyzed as an example of

A. one population proportion
B. two populations, difference in proportions
C. one population mean
D. two populations, difference in means, paired data
E. two populations, difference in means, unpaired data
F. goodness of fit
G. contingency table
(b) Which two tests must you perform on your sample data before doing the analysis mentioned above? (In other words, how would you make sure
that the sample meets the requirements?)
The two main types of data are qualitative and quantitative. What other names can you give for each? Give an example of each.
3
The probability of rolling a 6 on an honest die is 1/6. If you roll an honest die ten times and none of the rolls comes up 6, is the probability of
4 rolling a 6 on the next roll less than 1/6, equal to 1/6, or greater than 1/6? Explain why.
In a large elementary school, you select two age-matched groups of students. Group 1 follows the normal schedule. Group 2 (with parents’
5 permission) spends 30 minutes a day learning to play a musical instrument. You want to show that learning a musical instrument makes a
student less likely to get into trouble. You consider a student in trouble if s/he was sent to the principal’s office at any time during the year.
(a) Write your hypotheses, in symbols.
(b) Identify either the case number or the specific TI-83 test you would use.
Imagine rolling five standard dice. You compute the probability of rolling no 3s, one 3, and so on up to five 3s. Is this a binomial probability
6 distribution? With reference to the definition of a binomial PD, why or why not?
Over the course of many statistical experiments, which one of these values for the significance level would enable you to prove the most results?
7 A. 5% B. 1% C. 0.1% D. Significance level has no effect on how likely you are to prove a hypothesis.
A key step in hypothesis testing is computing a p-value and comparing it to your preselected α. After you do that, which of the following
8 conclusions would be possible, depending on the specific values of p and α? (Write the letter of each correct answer; there may be more than
one.)
A. Accept H0, reject H1
B. Reject H0, accept H1
C. Fail to accept H0, no conclusion
D. Fail to reject H0, no conclusion
Distinguish disjoint events, mutually exclusive events, and complementary events. Give an example of each.
9
When is a histogram an appropriate graphical method of presentation?
10
For what type of events does P(A or B) = P(A) + P(B)? Give an example.
11
In a χ² goodness-of-fit test, which of the following is/are true?
12 (A question with this many technical alternatives should not be on an exam. Just use it to test your own understanding of χ².)
A. The hypotheses are stated in words rather than relating some population parameter to a number.
B. The null hypothesis is always some variation on “the observed sample matches the model.”
C. The alternative hypothesis is always some variation on “our model is good.”
D. Instead of a p-value, we compare the value of χ² to α to draw a conclusion.
E. Degrees of freedom equals the number of cells in our model.
F. If the difference between our observed results and our expected results could likely have occurred by random chance, we reject the null hypothesis.
What are the two types of numeric data called? Explain the difference, and give an example of each.
13
Suppose the null hypothesis is that a machine is producing the allowed 1% proportion of defectives (H0: p = 0.01). Your experiment could end
14 in one of several conclusions, depending on your sample data. List the letters of all possible conclusions from those below. (The actual
conclusion would depend on the choice of H1, the choice of α, and the calculated p-value. Not all possible conclusions are listed below.)
A. The machine is producing exactly the acceptable proportion of defectives.
B. The machine is producing no more defectives than acceptable.
C. The machine is producing too many defectives.
D. Unable to prove anything either way.
How can you avoid making a Type I error in a hypothesis test?

15
You want to find what proportion of churchgoers believe that evolution should be taught in public schools, so you take a systematic survey at
16 a local mall. You collect 487 survey forms. Of those, 321 identify as churchgoers, and 227 of those 321 say that evolution should be taught in
public schools.
(a) What is the population?
(b) What is the population size?
(c) What is the sample size?
(d) Is limiting the sample to churchgoers a bias source?
You’re doing a hypothesis test to try to show that Drug A is more effective than Drug B, and your p-value is 0.0678. Your roommate, who has
17 not taken statistics, asks, “So there’s a 6.78% chance that the drugs are equally effective, right?” Explain what the p-value actually means.
Eight percent of the 2×4s from a lumber yard have cracks longer than an inch. Assume that the defectives are randomly distributed. Do you
18 use a binomial or a geometric distribution to compute each of the following, and why? (You don’t actually need to compute the probabilities;
just identify the distributions.)
(a) The probability that no more than five 2×4s in a random sample of 100 have cracks longer than an inch.
(b) The probability that exactly five 2×4s in a random sample of 100 have cracks longer than an inch.
(c) The probability that, pulling 2×4s at random, the first four don’t have cracks longer than an inch but the fifth one does.
Data are gathered and a computation is done to answer the question “As near as we can tell, how much does the average high-school student
19 spend on lunch?” This computation would be part of
A. hypothesis test
B. sample size
C. confidence interval
D. none of the above
Linear correlation coefficients must lie between what two values? What value indicates “no linear correlation”? Does this mean no correlation
20 at all?
“Four out of five dentists surveyed recommend Trident sugarless gum for their patients who chew gum.” Which of these is the correct symbol
21 for “four out of five dentists surveyed”?
µ π σ p p̂ po x x̅ s
A poll concludes that 26.9% of TC3 students are satisfied with the food service. What is the type of the original data gathered?
22
For what sort of data might you use a pie chart? Why?
23
The mean is usually the best measure of center of numerical data. But under certain circumstances the mean is not representative and you
24 prefer a different measure of center. Which circumstances, and which measure of center?
Usually you make what you want to prove the alternative hypothesis, not the null hypothesis. Why?
25
A company wishes to claim, “People who eat our shredded wheat for breakfast every day for a month lose more than ten points on their
26 cholesterol.” One or more of the following state the null and alternative hypotheses correctly. Which one(s)?
A. H0 > 10      H1 ≤ 10        E. H0 = 10      H1 > 10        I. H0 ≤ 10      H1 > 10
B. H0: x̅ > 10   H1: x̅ ≤ 10     F. H0: x̅ = 10   H1: x̅ > 10     J. H0: x̅ ≤ 10   H1: x̅ > 10
C. H0: µ > 10   H1: µ ≤ 10     G. H0: µ = 10   H1: µ > 10     K. H0: µ ≤ 10   H1: µ > 10
D. H0: x > 10   H1: x ≤ 10     H. H0: x = 10   H1: x > 10     L. H0: x ≤ 10   H1: x > 10
Which of the following is a Type I error?
27 A. failing to reject the null hypothesis when it is true
B. failing to reject the null hypothesis when it is false
C. rejecting the null hypothesis when it is true
D. rejecting the null hypothesis when it is false
Compare an experiment and an observational study.

28
Our symbol for level of confidence in a confidence interval is
29 α α/2 1–α z(α/2) E
(If none of these, supply the correct symbol.)
You gather a random sample of selling prices of 2006 Honda Civics. Which selection on your TI-83 would be used to test the claim “In the US,
30 2006 Honda Civics sell, on average, for more than $2,000”?
A. Z-Test B. T-Test C. 1-PropZTest D. 1-PropTTest E. χ²-Test F. none of these
Compare descriptive and inferential statistics, and give an example of each.

31
You find that your maximum error of estimate (margin of error) is ±3.3 at a confidence level of 95%. At 90% confidence, what would be the
32 maximum error of estimate?
A. more than 3.3 B. 3.3 C. less than 3.3 D. can’t say without more information.
Compare “sample” and “population”; give an example.

33
You take a random sample of Lamborghini owners and a random sample of Subaru owners. Which selection on your TI-83 would be used to
34 answer the question “How much more do Lamborghini owners spend per year on maintenance than Subaru owners?”
A. ZInterval B. TInterval C. 2-SampZInt D. 2-SampTInt E. 2-PropZInt F. none of these
You believe that more than 25% of high-school students experienced strong peer pressure to have sex. To test this belief, you survey 500
35 randomly selected graduating seniors nationwide and find that 150 of them say that they did feel such pressure.
(a) The data would best be analyzed as an example of

A. one population proportion
B. two populations, difference in proportions
C. one population mean
D. two populations, difference in means, paired data
E. two populations, difference in means, unpaired data
F. goodness of fit
G. contingency table
(b) Which tests must you perform on your sample data before doing the analysis mentioned above? (In other words, how would you make sure that
the sample meets the requirements?)
Problem Set 2: Calculations
Show your work for all problems. Round probabilities to four decimal places and test statistics (t, z, χ²) to two. For hypothesis tests, check
requirements and show all six numbered steps.
You are testing the assertion, “Judge Judy is more friendly to plaintiffs than Judge Wapner was.” Since it would be tedious to tabulate the
36 hundreds or thousands of decisions each judge has handed down, you randomly select 32 of each judge’s decisions. Judge Judy’s average
award to plaintiffs was $650 (standard deviation = $250) and Judge Wapner’s was $580 (standard deviation = $260). Assume that the amounts are
normally distributed without outliers. Using a significance level of 0.05, can you conclude that Judge Judy does indeed give higher awards on
average?
Weights of frozen turkeys at one large market were normally distributed with a mean of 14.8 pounds and a standard deviation of 2.1 pounds.
37 If there were 10,000 turkeys in the market, how many choices would a shopper have who wanted a bird 20.5 pounds or larger? (Hint: begin by
figuring the percentage or proportion of turkeys in that weight range.)
(from Johnson and Kuby 2003 [see “Sources Used” at end of book], problem 9.26) “The addition of a new accelerator is claimed to decrease
38 the drying time of latex paint by more than 4%. Several test samples were conducted with the following percentage decrease in drying time:
“5.2 6.4 3.8 6.3 4.1 2.8 3.2 4.7
“If we assume that the percentage decrease in drying time is normally distributed”
(a) Test the claim, at the .05 level.
(b) “Find the 95% confidence interval for the true mean decrease in the drying time based on this sample.”
28% of a certain breed of rabbits are born with long hair. Assume that the distribution is random, and consider a litter of five rabbits.
39
(a) What is the probability that none of the rabbits in the litter have long hair?
(b) What is the probability that one or more in a litter have long hair?
(c) What is the probability that four or five of them have long hair?
(d) What is the average number (mean) of long-haired rabbits you expect in a litter of five?
A survey asked a number of professionals, “Which of the following is your most common choice for breakfast?” Using the following data
40 from a random survey, determine whether doctors choose breakfasts in different proportions from other self-employed professionals, to a .05
significance level.
          Cereal  Pastry  Eggs  Other  No bfst  Total
Doctors       85      22    47     60       17    231
Others       185      90   160    135       35    605
Total        270     112   207    195       52    836
Suppose that the mean adult male height is 5′10″ (70″) and the standard deviation is 2.4″.
41 (a) If a particular man’s z-score is −1.2, what is his actual height to the nearest 0.1″?
(b) Using the Empirical Rule, what percentile is a height of 67.6″?
(c) By the Empirical Rule, what proportion of adult men are shorter than 74.8″?
The length of life of a random sample of incandescent light bulbs was obtained, and the results are in the table below.
42 (a) Plot a histogram of the data.
(b) What is the size of the sample, with its proper symbol?
(c) What are the mean and standard deviation? (Use the proper symbols and round to one decimal place.)
(d) What is the relative frequency of the 1100–1250 class?
life, hr    count
500–650         6
650–800        18
800–950        60
950–1100       89
1100–1250      29
1250–1400      17
One way to set speed limits is to observe a random sample of drivers and set the speed limit at the 85th percentile. What speed corresponds to
43 that 85th percentile, assuming drivers’ speeds are normally distributed with µ = 57.6 and σ = 5.2 mph?
You’re planning a survey to see what fraction of people who live in Virgil would take the bus if the county added a route between Greek Peak
44 and downtown Cortland via routes 392 and 215.
(a) You think the answer is only about 20% of them. If you need 90% confidence in an answer to within ±4%, how many people will you need to
survey?
(b) What if you have no idea of the answer? How many would you need to survey then?
Some popular fast-food items were compared for calories and fat, and the results are shown below:
45
Calories (x) 270 420 210 450 130 310 290 450 446 640 233
Fat (y) 9 20 10 22 6 25 7 20 20 38 11
(a) Make a scatterplot on your TI-83. Do you expect a positive, negative, or zero correlation? Why?
(b) Find the correlation coefficient and the equation of the line of best fit and write them down. Round to four decimal places and use proper
symbols.
(c) Give the value of the y intercept and interpret its meaning.
(d) Using the regression equation or your TI-83 graph, how many grams of fat would you predict for an item of 310 calories? Explain why this is
different from the actual data point (310 calories, 25 grams).
(e) What is the value of the residual for the data point (310,25)?
(f) What is the value of the coefficient of determination in this regression? What does it mean?
(g) The decision point for n = 11 is 0.602. What if anything can you say about the correlation for all fast foods?
Aluminum plates produced by a company are normally distributed with a mean thickness of 2.0 mm and a standard deviation of 0.1 mm. If
46 6% of the plates are too thick, what is the cutoff point between “too thick” and “acceptable?”
Many people took a physical fitness course. Seven of them were randomly selected and were tested for how many sit-ups they could do. The
47 same seven were re-tested after the course. From the data below, can you conclude that improvement took place among the general run of
people who took the course? Use α = 0.01.
Anne Bill Chance Deb Ed Frank Grace

Before 29 22 25 29 26 24 31
After 30 26 25 35 33 36 32
Your average morning commute time is 27 minutes, with SD 4 minutes. Your morning commute times are ND.
48 (a) How likely is a morning commute under 24 minutes?
(b) You pick a week (five mornings) at random. How likely is an average commute time under 24 minutes?
(adapted from Johnson and Kuby 2003 [see “Sources Used” at end of book] problem 11.15) A
49 survey was taken nationally to see what size vacation home people preferred. A separate
survey was taken in Nebraska. Both were random samples. Do the Nebraska results differ
significantly (0.05 level) from the national results?
Unit size          Entire US  Nebraska
Studio/efficiency      18.2%        75
1 bedroom              18.2%        60
2 bedrooms             40.4%       105
3 bedrooms             18.2%        45
Over 3 bedrooms         5.0%        15
Total                 100.0%       300

An experiment was designed to test the effectiveness of a short course that teaches diabetic self-care. Fifty diabetic patients were enrolled in
50 the course, and fifty others served as a control group. (Patients were randomly assigned between the two groups.) Six months after the course,
blood sugar levels were tested and results obtained as follows:
Diabetic course group: mean = 6.5, standard deviation = 0.7
Control group: mean = 7.1, standard deviation = 0.9
(a) At a significance level of 0.01, does the diabetic course succeed in lowering patients’ blood sugar?
(b) Obviously diabetic patients are not all the same. In this experiment, the largish sample sizes and randomization mean that confounding variables
are probably balanced out in the two groups.
But suppose you had money only for a smaller study, with a total of 30 patients. Suggest an experimental design that would control for most
lurking variables. What problem can you see with that design?
(adapted from Johnson and Kuby 2003 [see “Sources Used” at end of book] problem 9.36) “A study in the journal PAIN, October 1994,
51 reported on six patients with chronic myofascial pain syndrome. The mean duration of pain had been 3.0 years for the 6 patients and the
standard deviation had been 0.5 year. Test the hypothesis that the mean pain duration of all patients who might have been selected for this study
[meaning, of all persons who suffer from this condition] was greater than 2.5 years.” Use α = 0.05. Assume that the sample is a random sample,
normally distributed with no outliers.
In a survey of working parents, 200 men and 200 women were randomly selected and asked, “Have you refused a promotion because it
52 would mean less time with your family?” Of the men, 60 said yes; 48 of the women said yes.
(a) Obviously more men in the sample refused promotions. But can you conclude at the 0.05 significance level that a higher percentage of all working
men have refused promotions, versus the percentage of all working women?
(b) In an English sentence, state a 95% confidence interval for the difference in percentages of men and women who refuse promotions.
Ten thousand students take a test, and their scores are normally distributed. If the middle 95% of them score between 70 and 130, what are the
53 mean and standard deviation?
An insurance company advertises that 75% of its claims are settled within two months of being filed. The state insurance commission thinks
54 the percentage is less than 75, and sets out to prove it. First a small study is done. For this preliminary study, the commissioner can live with a
5% chance of making a Type I error. The commission staff randomly selects 65 claims, and finds out that 40 were settled within two months. Based on
this study, can you say that less than 75% of claims are settled within two months?
Work this problem only if you studied the optional extras in the Probability chapter.
55 A shoe store gets its shoes from just two companies, 40% from A and 60% from B. 2.5% of pairs from Brand A are mislabeled, and 1.5% of
pairs from Brand B are mislabeled. Find the probability that a randomly selected pair of shoes in the store is mislabeled.
Ten randomly selected men compared two brands of razors. Each man shaved one side of his face with brand A and the other side with brand
56 B. (They flipped coins to decide which razor to use on which side.) Each tester assigned a “smoothness score” of 1 to 10 to each side after
shaving. The scores are as shown below. Determine whether there is a difference in smoothness performance between the two razors, using α = 0.10.
Man 1 2 3 4 5 6 7 8 9 10
A score 7 8 3 5 4 4 9 8 7 4
B score 5 6 3 4 6 5 6 7 3 4
In August 2009, the National Geographic News Web site reported that 90% of US currency was tainted with cocaine.
57 (a) If you drew a random sample of two bills, what is the chance that exactly one of them is tainted with cocaine?
(b) You have ten bills, and you’ve been told that 90% of these ten bills are tainted with cocaine. If you draw two of the ten bills at random, what is the
chance that exactly one of your two is tainted with cocaine?
Fifteen farms were randomly selected from a large agricultural region. Each farm’s yield of wheat per acre was measured. For the 15 farms,
58 the mean yield per acre was 85.5 bushels and the standard deviation was 10.0 bushels. Find a 90% confidence interval for the mean yield per
acre for all farms in this region, assuming yield per acre is normally distributed and there were no outliers in the sample.
You draw five cards from a deck, without replacement, and record the number of aces you drew. Then you replace the five cards and shuffle
59 the deck thoroughly. If you repeat this experiment many times, is the number of aces in five cards drawn a binomial distribution? Why or
why not?
In a survey of 300 people from Tompkins County, 128 of them preferred to rent or stream a movie on Saturday night rather than watch
60 broadcast or cable TV. In Cortland County, 135 of 400 people surveyed preferred a movie. You’re interested in the difference of proportion in
movie renters for Tompkins County over Cortland County. Both surveys were random samples.
(a) What is the point estimate for that difference?
(b) Find the 98% confidence interval for the difference in the two proportions for all residents of the counties.
(c) What is the maximum error of estimate, at the 98% confidence level?
Two batches of seeds were randomly drawn from the same lot, and one batch was given a special
61 treatment. Consider the data for germination shown below. At significance level 0.05, does the
treatment make any difference in how likely seeds are to germinate?
            Germinated  Didn’t
Untreated           80      20
Treated            135      15
Now check yourself on the solutions page [URL:
https://BrownMath.com/swt/pfswt.htm.htm#hwa_hwrk99_root].
Solutions to All Exercises

Updated 17 Nov 2020

Sampling error [URL: https://BrownMath.com/swt/pfswt.htm.htm#c01_ErrorsSampling] is another name for sample variability, the fact that each
1 sample is different from the next because no sample perfectly represents the population it was drawn from. Nonsampling errors [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c01_ErrorsNonsampling] are problems in setting up or carrying out the data collection, such as poorly
worded survey questions and failure to randomize.
Nothing can eliminate sampling error, but you can reduce it by increasing your sample size. (Most nonsampling errors can be avoided by proper
experimental design and technique.)
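If you'd like to see that claim about sample size in action, here is a minimal Python simulation. It is not from the book: the population (10,000 roughly normal values with mean 100 and SD 15), the sample sizes 25 and 400, and the helper name spread_of_sample_means are all made up for illustration.

```python
import random
import statistics

def spread_of_sample_means(population, n, repeats=500, seed=1):
    """Standard deviation of the means of `repeats` random samples of size n."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

# Hypothetical population: 10,000 values, roughly normal (mean 100, SD 15).
pop_rng = random.Random(0)
population = [pop_rng.gauss(100, 15) for _ in range(10_000)]

small = spread_of_sample_means(population, n=25)
large = spread_of_sample_means(population, n=400)
print(f"n=25:  sample means vary with SD about {small:.2f}")
print(f"n=400: sample means vary with SD about {large:.2f}")
```

The second number should come out noticeably smaller than the first, but never zero: bigger samples wobble less from one to the next, yet some sampling error always remains.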
(a) systematic sample

2 (b) It is probably a good sample of that gynecologist’s patients, since there’s no reason to think that one month is different from another. But it’s a
bad sample of pregnant women in general, because it suffers from selection bias. This gynecologist’s patients may use prenatal vitamins differently
from pregnant women who see other gynecologists or who don’t have a regular gynecologist.
(c) observational study
(a) completely randomized

3 (b) the plant food administered
(c) no food, Gro-Mor, Magi-Grow
(d) 13 heights at the end of the 13 weeks (You could also make a case for growth rate.)
(e) the 150 bulbs
(f) selection of plant food
(g) the group that gets no plant food
Each family answered the question “How many children do you have?”
4 (a) The variable is number of children.
(b) It is a discrete variable.
(c) It summarizes population data, and therefore it is a parameter.
Although “numeric” or “quantitative” is correct, it’s not an adequate answer because it is not as specific as possible. Discrete and continuous
data are treated differently in descriptive statistics, so it matters which type you have.
Students are sometimes fooled by the decimal. Always ask yourself what was the original question asked or the original measurement taken from
each member of the sample.
(a) The sample is the 80 people in your focus group. (It is not the drinks. It’s also not the people’s preferences: Their preferences are the data or
5 sample data.)
(b) The sample size is 80, because that’s the number of people you took data from. It’s not 55: That’s just the number who gave one particular
response.
(c) The population is not stated explicitly, but you can infer that it’s cola drinkers in general, or Whoopsie Cola drinkers in general.
(d) You don’t know how many cola drinkers (or Whoopsie Cola drinkers) there are. You can’t know, since people change their soft-drink habits all the
time. You can say that the population is indefinitely large, or you can say that it’s infinite [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c01_DefSampSize]. (You can say that the population is uncountable, but don’t say that the population size
is uncountable.)
Common mistake: Students sometimes answer “80” for population size, but this is not correct. You took data from 80 people, so those 80 people are
your sample and 80 is your sample size.
(a) sampling error (or sample variability) (b) increase sample size
6
You’re asking people to admit to socially disapproved behavior. People tend to shade their answers toward socially acceptable behavior.
7 What can be done to reduce response bias? Interviewers should be trained to be absolutely neutral in voice and facial expression, which is
how the Kinsey team gathered data on sexual behavior. Or the question can be asked on a written questionnaire, so that the subject isn’t looking
another person in the face when answering. The question can also be made less threatening: “Have you ever left an infant alone in the house, even for
just a minute?”
Random sample: get a list of the resident students. On your calculator, do randInt(1,2000) 50 times, not counting duplicates, and
8 interview the students who came up in those positions.
Systematic sample: You can’t station yourself in the cafeteria because that would exclude all students who don’t use it. Instead, station
yourself at the main entrance to the dorm complex (or station yourself and confederates at the main entrance to each dorm) and interview
every 20th person. Why k=20 and not 2000/50 = 40? Because whenever you’re there, you’re bound to miss a sizable proportion of students.
To select the first person to survey, use randInt(1,20). Remember that a systematic survey begins with a randomly selected person
from 1 to k, not 1 to 50 (sample size) or 1 to 2000 (population size).
Notice that I didn’t suggest a time frame. What do you think would be a good time to do this?
An alternative procedure might be to walk through the dorms (assuming you can get in) and interview the students in every 20th room.
You may get better coverage that way than if you wait for them to come to you.
Cluster sample: Randomly select 25 rooms, and interview both of the students in those rooms. (This is a single-stage cluster.)
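The three selection procedures above can be sketched in Python. Treat this as an illustration only, not the book's method (the book uses randInt on the calculator). The function names, the count of 200 passersby, and the figure of 1000 double rooms are assumptions made up for the sketch; only 2000 students, a sample of 50, k = 20, and 25 rooms come from the solution.

```python
import random

POPULATION = 2000   # resident students, numbered 1..2000 on the list
SAMPLE_SIZE = 50

def simple_random_sample(rng):
    """Keep drawing positions from 1..2000 until there are 50 distinct ones
    (the same idea as repeating randInt(1,2000) and skipping duplicates)."""
    chosen = set()
    while len(chosen) < SAMPLE_SIZE:
        chosen.add(rng.randint(1, POPULATION))
    return sorted(chosen)

def systematic_positions(rng, k=20, passersby=200):
    """Random start from 1..k, then every k-th person who passes.
    `passersby` is a made-up count of people walking by."""
    start = rng.randint(1, k)
    return list(range(start, passersby + 1, k))

def cluster_sample(rng, rooms=1000, clusters=25):
    """Pick 25 rooms at random; everyone in a chosen room is interviewed.
    Assumes 1000 double rooms, which the problem does not state."""
    return sorted(rng.sample(range(1, rooms + 1), clusters))

rng = random.Random(2021)  # fixed seed only so the run is repeatable
print(simple_random_sample(rng)[:5])   # first few list positions to interview
print(systematic_positions(rng))       # which passersby to stop
print(cluster_sample(rng)[:5])         # first few room numbers
```

Notice that the systematic start is drawn from 1 to k, not from 1 to 50 or 1 to 2000, matching the reminder above.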
Best balance? Probably the cluster sample. The true random sample is a lot of work for a sample of 50, because after selecting the names you have to
track the students down. The systematic sample, no matter how you do it, is going to miss a lot of students, and you have that time-period problem.
With the cluster sample, you can time it for when students are likely to be home, and you can go back to follow up on those you missed.
But nothing is perfect, in this life where we are born to trouble as the sparks fly upward. The cluster sample works if the students were randomly
assigned to rooms. When students pick their own roommates, they tend to pick people with similar attitudes, interests, and activities. That means
those two are more similar to each other than other students, and there’s no way you can treat that cluster sample as a random sample. The cluster
would probably be safe for freshmen, where the great majority would be randomly assigned, but less so for students in later years.
No, you can’t reach that conclusion, because you can never conclude causation from an observational study. You would have to do an
9 experiment, where people were randomly assigned to watch Fox News or to watch no news at all, and then see if there was a difference in how
much they knew about the world.
Students often answer questions like this with hand-waving arguments, either coming up with reasons why it’s a plausible conclusion or
coming up with reasons why it isn’t. This is statistics, and we have to follow the facts. Whatever you may think about Fox News, the fact is that
observational studies can’t prove causation.
(a) It excludes people who don’t use the bus. This means that people who are dissatisfied with the bus are systematically under-represented.
10 Your survey will probably show that willingness to pay is higher than it actually is.
(b) sampling bias
“Random” doesn’t mean unplanned; it takes planning. This is a bogus sample. If you want a more formal statistical word, call it a
11 convenience sample, an opportunity sample or a non-probability sample.
(a) This is attribute data or qualitative data or non-numeric data. Don’t be fooled by the number 42: the original question asked was “Do
12 you have at least one streaming device?” and that’s a yes/no question.
Alternative: the more specific answer binomial data, which you may have heard in the lecture though it’s not in the book till Chapter 6.
(b) This is descriptive statistics because it’s reporting data actually measured: 42% of the sample. If it said “42% of Americans”, then it would be
inferential because you know not every American was asked, so the investigators must have extrapolated from a sample to the population.
(c) It is a statistic because it is a number that summarizes data from a sample.
13. The first people who present themselves are chosen. You should randomly select from among all volunteers. (Better still would be to
randomly select from among all patients, and ask the selected individuals to volunteer.)
Participants are not randomly assigned to control and experimental groups. This is always bad, but it’s especially bad when you accept a
block of volunteers in order.
The experiment is not double blind, only single blind. When doctors know who is getting a placebo and who is getting medicine, they may
treat the two groups differently, consciously or unconsciously.
All of these are nonsampling errors.
14. 2.145E-4 is 0.0002145, and 0.0004 is larger than that.
15. It’s spurious precision. (That much precision could be appropriate if you had surveyed a few hundred thousand households.)
To fix it, round to one decimal place: 1.9. (Don’t make the common mistake of “rounding” to 1.8.)
16. (a) Non-numeric. (It has the form of a number, but think about the average area code in a group and you’ll realize an area code is not a number.)
(b) Continuous.
(c) Discrete.
(d) Non-numeric.
(e) Non-numeric.
(f) Discrete. (or continuous if you allow answers like 6.3)
17. (a) was done for you.
(b) Measurement: Amount of each dinner check. Continuous.
(c) Question: “Did you experience bloating and stomach pain?” Non-numeric.
(d) Measurement: Number of people in each party. Discrete.

2. There’s no scale to interpret the quantities. And if one fruit in each row is supposed to represent a given quantity, then banana and apple have
the same frequency, yet banana looks like its frequency is much greater.
3. 90% of 15 is 13.5, 80% is 12, 70% is 10.5, and 60% is 9.

Score       Grade   Tallies   Frequency
13.5–15     A       ||        2
12–13.4     B       |         1
10.5–11.9   C       |||||     5
9–10.4      D       |||       3
0–8.9       F       ||||      4
Alternatives: Instead of a title below the category axis, you could have a title above the graph. You could order the grades from worst to best (F
through A) instead of alphabetically as I did here. And you could list the class boundaries as 13.5–15, 12–13.5, 10.5–12, and so on, with the
understanding that a score of 12 goes into the 12–13.5 class, not the 10.5–12 class. (Data points “on the cusp” always go into the higher class.)
4. (a) The variable is discrete, “number of deaths in a corps in a given year”.
(b) [Graph not reproduced.]
Alternatives: Some authors would draw a histogram (bars touching) or even a pie chart. Those are okay but not the best choice.
5. Commuting Distance
0 | 5 9 8 1
1 | 5 2 2 1 9 6 2 8 7 6 5 7
2 | 3 2 6 1 6 4 0
3 | 1
4 | 5
Key: 2 | 3 = 23 km
6. Relative frequency is f/n. f = 25, and n = 35+10+25+45+20 = 135. Dividing 25/135 gives 0.185185... ≈ 0.19 or 19%.
7. (a) Bar graph, histogram, stemplot. A bar graph or histogram can be used for any ungrouped discrete data. (Some authors use one, some use
the other. I like the bar graph for ungrouped discrete data.) A stemplot, or stem-and-leaf diagram, can be used when you have a moderate data
range without too many data points.
(b) Histogram.
(c) Bar graph, pie chart.
8. skewed right
9. (a) Group the data when you have a lot of different values.
(b) The classes must all be the same width, and there must be no gaps.
10. (a) See the histogram at right. Important features:
The bars are labeled at their edges, not their centers, because this is a grouped histogram.
Both axes are titled.
The horizontal axis has a real-world title. (Sometimes you also need an overall title for the graph, but
here the axis title says all that needs to be said.)
(b) 480.0−470.0 = 10.0 or just plain “10”.
Don’t make the common mistake of subtracting 479.9−470.0. Subtract consecutive lower bounds, always.
(c) skewed left
1. When the data set is skewed, the median is better. Outliers tend to skew a data set, so usually the median is a better choice when you have
outliers.
2. 15% of people have cholesterol equal to or less than yours, so yours is on the low end. Though you might not really celebrate by eating
high-cholesterol foods, there is no cause for concern.
3. (a) It uses only the two most extreme values.
(b) It uses only two values, but they are not the most extreme, so it is resistant.
(c) It uses all the numbers in the data set.
(d) Any two of: It is in the same units as the original data, it can be used in comparing z-scores from different data sets, you can predict what
percentage of the data set will be within a certain number of SD from the mean.
4. (a) s is standard deviation of a sample; σ is standard deviation of a population.
(b) µ is mean of a population; x̅ is mean of a sample.
(c) N is population size or number of members of the population; n is sample size or number of members of the sample.
5. You were 1.87 standard deviations above average. This is excellent performance. 1.87 is almost 2, and in a normal distribution, z = +2 would be
better than 95+2.5 = 97.5% of the students. 1.87 is not quite up there, but close. (In Chapter 7, you’ll learn how to compute that a z-score of 1.87 is
better than 96.9% of the population.)
6. Since the weights are normally distributed, 99.7% (“almost all”) of them will be within three SD above and below the mean. 3σ above and below
is a total range of 6σ. The actual range of “almost all” the apples was 8.50−4.50 = 4.00 ounces. 6σ = 4.00; therefore σ = 0.67 ounces.
Alternative solution: In a normal distribution, the mean is half way between the given extremes: µ = (4.50+8.50)/2 = 6.50. Then the distance from the
mean to 8.50 must be three SD: 8.50−6.50 = 2.00 = 3σ; σ = 0.67 ounces.
7. (a) This is a grouped distribution, so you need the class midpoints, as shown below. Enter the midpoints in L1 and the frequencies in L2.

Ages      Midpoint (L1)   Frequency (L2)
20 – 29   25              34
30 – 39   35              58
40 – 49   45              76
50 – 59   55              187
60 – 69   65              254
70 – 79   75              241
80 – 89   85              147

Caution! The midpoints are not midway between lower and upper bounds, such as (20+29)/2 = 24.5. They are midway between successive lower bounds, such as (20+30)/2 = 25.
1-VarStats L1,L2 (Check n first!)
x̅ = 63.85656971 → x̅ = 63.86
s = 15.43533244 → s = 15.44
n = 997
Common mistake: People tend to run 1-VarStats L1, leaving off the L2, which just gives statistics of the seven numbers 25, 35, …, 85. Always check n first. If you check n and see that n = 7, you realize that can’t possibly be right since the frequencies obviously add up to more than 7. You fix your mistake and all is well.
(b) You need the original data to make a boxplot, and here you have only the grouped data. A boxplot of a grouped distribution doesn’t show the
shape of the data set accurately, because only class midpoints are taken into account. The class midpoints are good enough for approximating the
mean and SD of the data, but not the five-number summary that is pictured in the boxplot.
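If you want to check the calculator's grouped-data results without a TI-83, the following is a minimal Python sketch of what 1-VarStats L1,L2 computes for the mean and sample SD. The function name `grouped_stats` is my own, not anything from the book; the midpoints and frequencies are the ones in the table for part (a).

```python
from math import sqrt

# Class midpoints (the book's L1) and frequencies (the book's L2) from part (a).
midpoints = [25, 35, 45, 55, 65, 75, 85]
freqs = [34, 58, 76, 187, 254, 241, 147]


def grouped_stats(xs, fs):
    """Return (n, mean, sample SD) for values xs weighted by frequencies fs.

    Hypothetical helper: it mirrors the frequency-weighted mean and the
    sample standard deviation (divisor n - 1).
    """
    n = sum(fs)
    mean = sum(x * f for x, f in zip(xs, fs)) / n
    sum_sq = sum(f * (x - mean) ** 2 for x, f in zip(xs, fs))
    return n, mean, sqrt(sum_sq / (n - 1))


n, mean, s = grouped_stats(midpoints, freqs)
print(n, round(mean, 2), round(s, 2))
```

As on the calculator, check n first: if it comes out as 7, the frequencies were left out.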
8. You need the weighted average, so put the quality points in L1 and the credits in L2.
(No, you can’t do it the other way around. The quality points are the numeric forms of your grades, and you have to give them weights according to the number of credits in each course.)

Course           Credits (L2)   Grade   Quality Points (L1)
Statistics       3              A       4.0
Calculus         4              B+      3.3
Microsoft Word   1              C−      1.7
Microbiology     3              B−      2.7
English Comp     3              C       2.0

1-VarStats L1,L2
n = 14 (This is the number of credits attempted. If you get 5, you forgot to include L2 in the command.)
x̅ = 2.93
9. You don’t have the individual quiz scores, but remember what the average means: it’s the total divided by the number of data points. If your
quiz average is 86%, then on 10 quizzes you must have a total of 86×10 = 860 percentage points. If you need an 87% average on 11 quizzes, you
need 11×87 = 957 percentage points. 957−860 = 97; you can still skip the final exam if you get a 97 on the last quiz.
10. (a)
Commute Distance, km   Frequency
0-9                    4
10-19                  12
20-29                  7
30-39                  1
40-49                  1
Total                  25

(b) The class width is 10 (not 9). The class midpoints are 5, 15, 25, 35, 45 (not 4.5, 14.5, etc.).
(c) Class midpoints in one list such as L2 and frequencies in another list such as L3. This is a sample, so symbols are x̅, s, n, not µ, σ, N.
1-VarStats L2,L3
x̅ = 18.2 km
s = 9.5 km
n = 25
(d) Data in a list such as L1. 1-VarStats L1 gives x̅ = 17.6 km, Median = 17, s = 9.0 km, n = 25
(e) [Boxplot not reproduced.]
(f) Mean , because the data are nearly symmetric. Or, median, because there is an outlier.
Comment: The stemplot made the data look skewed, but that was just an artifact of the choice of classes. The boxplot shows that the data are nearly
symmetric, except for that outlier. This is why the mean and median are close together. This is a good illustration that sometimes there is no uniquely
correct answer. It’s why your justification or explanation is an important part of your answer.
(g) The five-number summary, from MATH200A part 2 [TRACE], is 1, 12, 17, 22.5, 45. There is one outlier, 45.
(The five-number summary includes the actual min and max, whether they are outliers or not.)
11. Since 500 equals the mean, its z-score is 0. For 700, compute the z-score as z = (700−500)/100 = 2.
So you need the probability of data falling between the mean and two SD above the mean.
Make a sketch and shade this area.
Draw an auxiliary line at z = −2. You know that the area between z = −2 and z = +2 is 95%, so the
area between z = 0 and z = 2 is half that, 47.5% or 0.475.
12. To compare apples and oranges, compute their z-scores:
z_J = (2070−1500)/300 = 570/300 = 1.90
z_M = (129−100)/15 = 29/15 = about 1.93
Because she has the higher z-score, according to the tests Maria is more intelligent.
Remark: The difference is very slight. Quite possibly, on another day Jacinto might do slightly better and Maria slightly worse, reversing their
ranking.
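The z-score comparison is easy to reproduce in code. This is just an illustrative Python sketch; the helper name `z_score` and the variable names are mine, and the means and SDs are the ones used in the computation above.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Scores, means, and SDs as used in the worked answer above.
z_jacinto = z_score(2070, 1500, 300)
z_maria = z_score(129, 100, 15)

print(round(z_jacinto, 2), round(z_maria, 2))
print("Maria" if z_maria > z_jacinto else "Jacinto", "has the higher z-score")
```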
13. Start with the class marks or midpoints, as shown below. (Class midpoints are halfway between successive lower bounds: (470+480)/2 = 475. You can’t calculate them between lower and upper bounds, (470+479.9)/2 = 474.95.)
Put class midpoints in a list, such as L1, and frequencies go in another list, such as L2. (Either label the columns with the lists you use, as I did here, or state them explicitly: “class marks in L1, frequencies in L2”.)

Test Scores    Frequencies, f (L2)   Class Midpoints, x (L1)
470.0–479.9    15                    475.0
480.0–489.9    22                    485.0
490.0–499.9    29                    495.0
500.0–509.9    50                    505.0
510.0–519.9    38                    515.0

1-VarStats L1,L2 (Always write down the command that you used.)
(a) n = 154
(b) x̅ = 499.81 (before rounding, 499.8051948)
(c) s = 12.74 (before rounding, 12.74284519)
Be careful with symbols. Use the correct one for sample or population, whichever you have.
Common mistake: The SD is 12.74 (Sx), not 12.70 (σ), because this is a sample and not the population.
14. The mean is much greater than the median. This usually means that the distribution is skewed right, like incomes at a corporation.

1. 64% of the variation in salary is associated with variation in age.
Common mistake: Don’t use any form of the word “correlation” in your answer. Your friend wouldn’t understand it, but it’s wrong
anyway. Correlation is the interpretation of r, not R². Yes, r is related to R², but R² as such is not about correlation.
Common mistake: R² tells you how much of the variation in y is associated with variation in x, not the other way around. It’s not accurate to say
64% of variation in age is associated with variation in salary.
Common mistake: Don’t say “explained by” to non-technical people. The regression shows an association, but it does not show that growing
older causes salary increases.
2. (a) We know that power boats kill manatees, so the boat registrations must be the explanatory variable (x) and the manatee power-boat kills
must be the response variable (y). (Although this is an observational study, the cause of death is recorded, so we do know that the boats cause
these manatee deaths.)
(b) Yes
(c) The results of LinReg(ax+b) L1,L2,Y1 are shown at right. The correlation coefficient is r = 0.91
(d) ŷ = 0.1127x − 35.1786

Note: ŷ, not y. Note: −35.1786, not +−35.1786.
(e) The slope is 0.1127. An increase of 1000 power-boat registrations is associated with an increase of about 0.11 manatee
deaths, on average.
It’s every 1000 boats, not every boat, because the original table is in thousands. Always be specific: “increase”, not just “change”.
Remark: Although this is mathematically accurate, people may not respond well to 0.11 as a number of deaths, which obviously is a discrete variable.
You might multiply by 100 and say that 100,000 extra registrations are associated with 11 more manatee deaths on average; or multiply by 10 and
round a bit to say that 10,000 extra registrations are associated with about one more manatee death on average.
(f) The y intercept is −35.1786. Mathematically, if there were no power boats there would be about minus 35 manatees killed by power boats. But this
is not applicable because x=0 (no boats) is far outside the range of x in the data set.
(g) R² = 0.83. About 83% of variation in manatee deaths from power boats is associated with the variation in registrations of power boats.
It’s R², not r². And don’t use any form of the word “correlate” in your answer.
100% of manatee power-boat deaths come from power boats, so why isn’t the association 100%? The other 17% is lurking variables plus natural
variability. For instance, maybe the weather was different in some years, so owners were more or less likely to use their boats. Maybe a campaign of
awareness in some years caused some owners to lower their speeds in known manatee areas.
(h) ŷ = 27.8
(i) y−ŷ = 34−27.8 = 6.2
(j) Remember that x is in thousands, so a million boats is x = 1000. But x=1000 is far outside the data range, so the regression can’t be used to make a
prediction.
3. The decision point for n=10 is 0.632, and |r| = 0.57. |r| < d.p., and therefore you can’t reach a conclusion. From the sample data, it’s impossible
to say whether there is any association between TV watching and GPA for TC3 students in general.
Note: Always state the decision point and show the comparison to r.
4. (a) Yes
The point (0,6) is hard to see behind the y axis, but it’s there.
(b) The results of LinReg(ax+b) L3,L4,Y2 are shown at right. ŷ = −3.5175x+6.4561
(c) The slope is −3.5175. Increasing the dial setting by one unit decreases temperature by about 3.5°.
Again, state whether y increases or decreases with increasing x.
(d) The y intercept is 6.4561. A dial setting of 0 corresponds to about 6.5°.
(e) r = −0.99
(f) R² = 0.98. About 98% of variation in temperature is associated with variation in dial setting.
This seems almost too good to be true, as though the data were just made up. ☺ But it’s hard to think of many lurking variables. Maybe it
happened that some measurements were taken just after the compressor shut off, and others were taken just before the compressor was ready to
switch on again in response to a temperature rise.
(g) ŷ = 2.9°
5. For n = 12, the decision point is 0.576. |r| = 0.85 is greater than that, so there is an association. Increased study time is associated with increased
exam score for statistics students in general.
6. No. There’s a lurking variable here: age. Older pupils tend to have larger feet and also tend to have increased reading ability.
7. r, the linear correlation coefficient, would be roughly zero. Taking the plot as a whole, as x increases, y is about equally likely to increase or
decrease. A straight line would be a terrible model for the data.
Clearly there is a strong correlation, but it is not a linear correlation. Probably a good model for this data set would be a quadratic regression,
ŷ = ax²+bx+c. Though we study only linear regressions, your calculator can perform quadratic and many other types.
8. The coefficient of determination, R², answers this question. For linear correlations, R² is indeed the square of the correlation coefficient r.
r = 0.30 ⇒ R² = 0.09. Therefore 9% of the variation in IQ is associated with variation in income.
Remark: Don’t say “caused by” variation in family income. Correlation is not causation. You can think of some reasons why it might be plausible that
wealthier families are more likely to produce smarter children, or at least children who do better on standardized tests, but you can’t be sure without
a controlled experiment.
Remark: Though it’s an interesting fact, the correlation in twins’ IQ scores is not needed for this problem. In real life, an important part of solving
problems and making decisions is focusing on just the relevant information and not getting distracted.
Problem Set 1
1. (a) There are three coins, and each has two possible outcomes, so the sample space will have 2³ = 8 entries.
(b) S = { HHH, HHT, HTH, HTT, THH, THT, TTH, TTT }
(c) Three events out of eight equally likely events: P(2H) = 3/8
Common mistake: Sometimes students write the sample space correctly but miss one of the combinations of 2 heads. I wish I could offer some
“magic bullet” for counting correctly, but the only advice I have is just to be really careful.
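One way to be "really careful" is to let a program list the outcomes for you. The following is a small Python sketch, not something from the book: it enumerates the sample space for three coins and counts the outcomes with exactly two heads. The variable names are my own.

```python
from fractions import Fraction
from itertools import product

# Every ordered outcome of three coin flips: HHH, HHT, ..., TTT.
sample_space = ["".join(flips) for flips in product("HT", repeat=3)]

# The outcomes that contain exactly two heads.
two_heads = [outcome for outcome in sample_space if outcome.count("H") == 2]

# All outcomes are equally likely, so probability = favorable / total.
p_two_heads = Fraction(len(two_heads), len(sample_space))

print(sample_space)
print(two_heads, p_two_heads)
```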
2. (a) In a probability model, the probabilities must add to 1 (= 100%). The given probabilities add to 62.6%. What is the missing 37.4%? They’ve accounted for cell and landline, cell only, and nothing; the remaining possibility is landline only. The model is shown below.

Service type        Prob.
Landline and cell   58.2%
Landline only       37.4%
Cell only           2.8%
No phone            1.6%
Total               100.0%

(b) P(Landline) = P(Landline only) + P(Landline and cell)
P(Landline) = 37.4% + 58.2% = 95.6%
Remark: “Landline” and “cell” are not disjoint events, because a given household could have both. But “landline only” and “landline and cell” are disjoint, because a given house can’t both have a cell phone with landline and have no cell phone with landline.
3. No, because the events are not disjoint. The figures are for being struck or attacked, not killed. You’d have to be pretty unlucky to be struck by
lightning and attacked by a shark in the same year, but it could happen. If the question were about being killed by lightning or by a shark, then
the events would be disjoint and you could add the probabilities.
4. (a) P(not A) = 1−P(A) = 1−0.7 → P(not A) = 0.3
(b) That A and B are complementary means that one or the other must happen, but not both. Therefore P(B) = P(not A) → P(B) = 0.3
(c) Since the events are complementary, they can’t both happen: P(A and B) = 0
Common mistake: Many students get (c) wrong, giving an answer of 1. If events are complementary, they can’t both happen at the same time. That
means P(A and B) must be 0, the probability of something impossible.
Maybe those students were thinking of P(A or B). If A and B are complementary, then one or the other must happen, so P(A or B) = P(A) + P(B) =
1. But part (c) was about probability and, not probability or.
5. Yes, because the events are disjoint or mutually exclusive: a person might have both cancer and heart disease, but the death certificate will list
one cause of death. (1/5 + 1/7 ≈ 34%.)
6. P(divorced | man) is the probability that a randomly selected man is divorced, or the proportion of men who are divorced. P(man | divorced)
is the probability that a randomly selected divorced person is a man, or the proportion of divorced persons that are men.
7. If the probability of a future event is zero, then that event is impossible. If the probability of a past event is zero, that just means that it didn’t
happen in the cases that were studied, not that it couldn’t have happened.
This is the difference between theoretical and empirical probability. A truly impossible event has a theoretical probability of zero. But the 0 out of
412 figure is an empirical probability (based on past experience). Empirical probabilities are just estimates of the “real” theoretical probability. From
the empirical 0/412, you can tell that the theoretical probability is very low, but not necessarily zero. In plain language, an unresolved complaint is
unlikely, but just because it hasn’t happened yet doesn’t mean it can’t happen.
8. 13/52 or 1/4
Common mistake: Students often try some sort of complicated calculation here. You would have to do that if conditions were stated on all five of
those cards, but they weren’t. Think about it: any card has a 1/4 chance of being a spade.
9. S = { HH, HT, TH, TT }
(a) Three outcomes (HH, HT, TH) have at least one head. One of the three has both coins heads. Therefore the probability is 1/3.
(b) Two outcomes (HH, HT) have heads on the first coin. One of the two has both coins heads. Therefore the probability is 1/2 .
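The reasoning in (a) and (b) is "restrict the sample space to the given condition, then count." Here is a hedged Python sketch of that idea; the helper `conditional_probability` is a name I made up for illustration, and it assumes all outcomes in S are equally likely.

```python
from fractions import Fraction
from itertools import product

# S = {HH, HT, TH, TT}, all equally likely.
S = ["".join(flips) for flips in product("HT", repeat=2)]


def conditional_probability(event, given, space):
    """P(event | given): count only the outcomes where `given` is true."""
    reduced = [outcome for outcome in space if given(outcome)]
    favorable = [outcome for outcome in reduced if event(outcome)]
    return Fraction(len(favorable), len(reduced))


def both_heads(outcome):
    return outcome == "HH"


# (a) given at least one head; (b) given heads on the first coin.
p_a = conditional_probability(both_heads, lambda o: "H" in o, S)
p_b = conditional_probability(both_heads, lambda o: o[0] == "H", S)

print(p_a, p_b)
```

The two conditions sound similar but leave different reduced sample spaces, which is why the answers differ.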
10. (a) 0.0171 × 0.0171 = 0.0003
(b) The events are not independent. When a married couple are at home together or out together, any attack that involves one of them will
involve the other also.
11. (a) P(divorced) = 22.8/219.7 ≈ 0.1038
(b) About 10.38% of American adults in 2006 were divorced. If you randomly selected an American adult in 2006, there was a 0.1038
probability that he or she was divorced.
(c) Empirical or experimental
(d) P(divorced^C) = 1−P(divorced) = 1−22.8/219.7 ≈ 0.8962
About 89.62% of American adults in 2006 were not divorced (or, had a marital status other than divorced).
(e) P(man and married) = 63.6/219.7 ≈ 0.2895 (You can’t use a formula on this one.)
(f) Add up P(man) and P(not man but married):

P(man or married) = 106.2/219.7 + 64.1/219.7 ≈ 0.7751
Alternative solution: By formula:

P(man or married) = P(man) + P(married) − P(man and married)
P(man or married) = 106.2/219.7 + 127.7/219.7 − 63.6/219.7 = 0.7751
Remember, math “or” means one or the other or both.
(g) What proportion of males were never married ? 30.3/106.2 = 28.53% .
(h) P(man | married) uses the sub-subgroup of men within the subgroup of married persons.
P(man | married) = 63.6/127.7 = 0.4980
49.80% of married persons were men.
Remark: You might be surprised that it’s under 50%. Isn’t polygamy illegal in the US? Yes, it is. But the table considers only resident adults.
Women tend to marry slightly earlier than men, so fewer grooms than brides are under 18. Also, soldiers deployed abroad are more likely to be male.
(i) P(married | man) uses the sub-subgroup of married persons within the subgroup of men.
P(married | man) = 63.6/106.2 = 0.5989
59.89% of men were married.
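To see how parts (f), (h), and (i) all come from the same few table entries, here is a short Python sketch. It uses only the counts quoted in the answers above (in millions); the variable names are mine, and this is a check on the arithmetic rather than a new method.

```python
# Counts in millions, taken from the worked answers above.
total = 219.7
men = 106.2
married = 127.7
married_men = 63.6

# (f) "or" formula: P(man or married) = P(man) + P(married) - P(man and married)
p_man_or_married = (men + married - married_men) / total

# (h) and (i): the same numerator, but a different subgroup as the denominator.
p_man_given_married = married_men / married
p_married_given_man = married_men / men

print(round(p_man_or_married, 4))
print(round(p_man_given_married, 4), round(p_married_given_man, 4))
```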
12. P(five cards, all diamonds) = (13/52) × (12/51) × (11/50) × (10/49) × (9/48) ≈ 0.0005
(I was surprised that the probability is that high, about once every 2000 hands. And the probability of being dealt a five-card flush of any suit is four times
that, about once in every 500 hands.)
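As a cross-check on the sequence of fractions, the sketch below computes the same probability two ways in Python: card by card, and by counting five-card hands with combinations. The combinations approach is my addition and is not the method the book uses here.

```python
from fractions import Fraction
from math import comb

# Card-by-card: 13/52 x 12/51 x 11/50 x 10/49 x 9/48 (dealing without replacement).
p_sequence = Fraction(1)
for i in range(5):
    p_sequence *= Fraction(13 - i, 52 - i)

# Counting hands: (ways to choose 5 of the 13 diamonds) / (ways to choose 5 of 52 cards).
p_counting = Fraction(comb(13, 5), comb(52, 5))

print(p_sequence, float(p_sequence))
print(float(4 * p_sequence))  # any of the four suits
```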
13. (a) 3 of 20 M&Ms are yellow, so 17 are not yellow. You want the probability of three non-yellows in a row:
(17/20)×(16/19)×(15/18) ≈ 0.5965
(b) The probability is zero , since there are only two reds to start with.
14. You’re being asked about all three possibilities: two fail, one fails, none fail. Therefore the three probabilities must add up to 1, and you need
to compute only two of them. It’s also important to note that the companies are independent: whether one fails has nothing to do with
whether the other fails. (Without knowing that the companies are independent, you could not compute the probability that both fail.)
(a) Since the companies are independent, you can use the simple multiplication rule:
P(A bankrupt and W bankrupt) = P(A bankrupt) × P(W bankrupt)
P(A bankrupt and W bankrupt) = .9 × .8 = 0.72
At this point you could compute (b), but it’s a little messy because you need the probability that A fails and W is okay, plus the probability that A is
okay and W fails. (c) looks easier, so do that first.
(c) “Neither bankrupt” means both are okay. Again, the events are independent so you can use the simple multiplication rule.
P(neither bankrupt) = P(A okay and W okay)
P(A okay) = 1−.9 = 0.1; P(W okay) = 1−.8 = 0.2
P(neither bankrupt) = .1 × .2 = 0.02
(b) is now a piece of cake.

P(only one bankrupt) = 1 − P(both bankrupt) − P(none bankrupt)
P(only one bankrupt) = 1 − .72 − .02 = 0.26
Remark: If you have time, it’s always good to check your work and work out (b) the long way. You have only independent events (whether A is okay
or fails, whether W is okay or fails) and disjoint events (A fails and W okay, A okay and W fails). The “okay” probabilities were computed in part (c).
P(only one bankrupt) = (A bankrupt and W okay) or (A okay and W bankrupt)
P(only one bankrupt) = (.9 × .2) + (.1 × .8) = 0.26
Common mistake: When working this out the long way, students often solve only half the problem. But when you have probability of exactly one
out of two, you have to consider both A-and-not-W and W-and-not-A.
You can’t use the “or” formula here, even if you studied it. That computes the probability of one or the other or both, but you need the
probability of one or the other but not both.
Remark: If you computed all three probabilities the long way, pause a moment to check your work by adding them to make sure you get 1.
Whenever possible, check your work with a second type of computation.
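Here is that "second type of computation" as a minimal Python sketch. It assumes, as the problem does, that the two companies fail independently with probabilities 0.9 and 0.8; the variable names are my own.

```python
# Failure probabilities from the problem; independence is assumed throughout.
p_a_fails = 0.9
p_w_fails = 0.8

# (a) and (c): simple multiplication rule for independent events.
p_both = p_a_fails * p_w_fails
p_neither = (1 - p_a_fails) * (1 - p_w_fails)

# (b) the short way (complement) and the long way (two disjoint sequences).
p_one_short = 1 - p_both - p_neither
p_one_long = p_a_fails * (1 - p_w_fails) + (1 - p_a_fails) * p_w_fails

print(round(p_both, 2), round(p_one_short, 2), round(p_neither, 2))
print(round(p_both + p_one_long + p_neither, 10))  # should be 1
```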
15. (a) (You can assume independence because it’s a small sample from a large population.) P(red1 and red2 and red3) = 0.13×0.13×0.13 = 0.0022
(b) P(red) = 0.13; P(red^C) = 1−0.13 = 0.87.
P(red1^C and red2^C and red3^C) = 0.87×0.87×0.87 or 0.87³ = 0.6585
Common mistake: Students sometimes compute 1−.13³. But .13³ is the probability that all three are red, so 1−.13³ is the probability that fewer than three
(0, 1, or 2) are red. You need the probability that zero are red, not the probability that 0, 1, or 2 are red. Think carefully about where your “not”
condition must be applied!
(c) The complement is your friend with “at least” problems. The complement of “at least one is green” is “none of them is green”, which is the same
as “every one is something other than green.”
P(green) = 0.16, P(non-green) = 1−0.16 = 0.84.
P(≥1 green of 3) = 1 − P(0 green of 3) = 1 − P(3 non-green of 3) = 1−0.84³ ≈ 0.4073
(d) (Sequences are the most practical way to solve this one.)
(A) G1 and G2^C and G3^C; (B) G1^C and G2 and G3^C; (C) G1^C and G2^C and G3
.16×(1−.16)×(1−.16) + (1−.16)×.16×(1−.16) + (1−.16)×(1−.16)×.16 ≈ 0.3387
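The "sequences" idea in parts (c) and (d) can be spelled out by listing every green/non-green sequence of three candies and adding up the ones you want. This is an illustrative Python sketch under the same independence assumption as part (a); the names `P_GREEN` and `prob_of_sequence` are mine.

```python
from itertools import product

P_GREEN = 0.16  # probability that a single M&M is green, from the problem


def prob_of_sequence(seq):
    """Probability of one specific sequence; True means green, False means not green."""
    p = 1.0
    for is_green in seq:
        p *= P_GREEN if is_green else 1 - P_GREEN
    return p


# All 2^3 = 8 sequences such as (True, False, False) = G1 and G2^C and G3^C.
sequences = list(product([True, False], repeat=3))

p_exactly_one = sum(prob_of_sequence(s) for s in sequences if sum(s) == 1)
p_at_least_one = sum(prob_of_sequence(s) for s in sequences if sum(s) >= 1)

print(round(p_exactly_one, 4), round(p_at_least_one, 4))
```

Adding up seven sequences for "at least one" gives the same result as the one-line complement 1−0.84³, which is why the complement is the shortcut of choice.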
16. In “at least” and “no more than” probability problems, the complement is often your friend. The complement of “at least one had not
attended” is “all had attended”. If the fans are randomly selected, their attendance is independent and you can use the simple multiplication
rule.
P(all 5 attended) = 0.45^5 = 0.0185
P(at least 1 had not attended) = 1 − 0.0185 = 0.9815
17. Sequences are the way to go here:
(cherry1 and orange2) or (orange1 and cherry2)
Common mistake: There are two ways to get one of each: cherry followed by orange and orange followed by cherry. You have to consider both
probabilities.
There are 11+9 = 20 sourballs in all, and Grace is choosing the sourballs without replacement (one would hope!), so the probabilities are:
(11/20)×(9/19) + (9/20)×(11/19) = 99/190 or about 0.5211
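Exact fractions make it easy to confirm the 99/190. The sketch below is my own Python check, assuming 11 of one flavor and 9 of the other drawn without replacement as stated above; which count belongs to cherry doesn't affect the answer.

```python
from fractions import Fraction

first_flavor, second_flavor = 11, 9
total = first_flavor + second_flavor

# Two disjoint sequences: (first flavor, then second) or (second flavor, then first).
p_first_then_second = Fraction(first_flavor, total) * Fraction(second_flavor, total - 1)
p_second_then_first = Fraction(second_flavor, total) * Fraction(first_flavor, total - 1)

p_one_of_each = p_first_then_second + p_second_then_first
print(p_one_of_each, round(float(p_one_of_each), 4))
```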
18. The complement is your friend, and the complement of “win at least once in 5 years” is “win 0 times in 5 years” or “lose 5 times in 5 years”.
P(win ≥1) = 1−P(win 0) = 1−P(lose 5).
P(lose) = 1−P(win) = 1−(1/500) = 499/500
P(lose 5) = [P(lose)]^5 = (499/500)^5 = 0.9900
P(win ≥1) = 1−P(lose 5) = 1−0.9900 = 0.0100 or 1.00%
Common mistake: If you compute 1−(499/500)^5 in one step and get 0.00996008, be careful with your rounding! 0.00996... rounds to 0.0100 or 1%, not
0.0010 or 0.1%.
Common mistake: 1/500 + 1/500 + ... is wrong. You can add probabilities only when events are disjoint, and wins in the various years are not disjoint
events. It is possible (however unlikely) to win more than once; otherwise it would make no sense for the problem to talk about winning “at least
once”.
Common mistake: You can’t multiply 5 by anything. Take an analogy: the probability of heads in one coin flip is 50%. Does that mean that the
probability of heads in four flips is 4×50% = 200%? Obviously not! Any process that leads to a probability >1 must be incorrect.
Common mistake: 1−(1/500)^5 is wrong. (1/500)^5 is the probability of winning five years in a row, so 1−(1/500)^5 is the probability of winning 0 to 4
times. What the problem asks is the probability of winning 1 to 5 times.
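To see how far apart the right answer and the common mistakes are, here is a short Python sketch. It assumes a 1/500 chance of winning each year and independence between years, as in the solution above; the variable names are mine.

```python
p_win = 1 / 500
years = 5

# Correct: complement of "lose all five years".
p_at_least_once = 1 - (1 - p_win) ** years

# Common mistakes discussed above, computed only for comparison.
mistake_adding = years * p_win              # adds events that are not disjoint
mistake_complement = 1 - p_win ** years     # this is P(win 0 to 4 times)

print(p_at_least_once)
print(mistake_adding, mistake_complement)
```

The "adding" mistake happens to land very close to the right answer here because the yearly probability is tiny, but as the coin-flip analogy shows, it breaks down badly for larger probabilities.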
19. (a), (b), and (c) are all the possibilities there are, so the probabilities must total 1. You can compute two of them and then subtract from 1 to get
the third.
(a) P(not first and not second) = P(not first) × P(not second) = (1−.7)×(1−.6) = 0.12
(c) P(first and second) = P(first) × P(second) = .7×.6 = 0.42
(b) 1−.12−.42 = 0.46

Alternative: You could compute (b) directly too, using sequences:
P(exactly one copy recorded) =
P(first and not second) + P(second and not first) =
P(first)×(1−P(second)) + P(second)×(1−P(first)) =
.7×(1−.6) + .6×(1−.7) = 0.46
A very common mistake on problems like this is writing down only one of the sequences. When you have exactly one success (or exactly any definite
number), almost always there are multiple ways to get to that outcome.
You can’t use the “or” formula here, even if you studied it. That computes the probability of one or the other or both, but you need the
probability of one or the other but not both.
Problem Set 2
20. (a) P(ticket on route A) = P(taking route A) × P(speed trap on route A) = 0.2×0.4 = 0.08. In the same way, the probabilities of getting a ticket on
routes B, C, D are 0.1×0.3 = 0.03, 0.5×0.2 = 0.10, and 0.2×0.3 = 0.06. He can’t take more than one route to work on a given day, so those are
disjoint events. The probability that he gets a ticket on any one morning is therefore 0.08+0.03+0.10+0.06 = 0.27.
(b) The probability of not getting a ticket on a given morning is 1−0.27 = 0.73. The probability of getting no tickets on five mornings in a row is
therefore 0.73^5 ≈ 0.2073 or about 21%.
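The route-by-route arithmetic can be laid out as a small table in code. This Python sketch is only a restatement of the computation above; the dictionary layout and names are my own, and the five mornings are assumed independent, as in part (b).

```python
# route: (probability he takes it, probability of a speed trap on it)
routes = {
    "A": (0.2, 0.4),
    "B": (0.1, 0.3),
    "C": (0.5, 0.2),
    "D": (0.2, 0.3),
}

# (a) Routes are disjoint, so add P(route) x P(trap on that route).
p_ticket = sum(p_route * p_trap for p_route, p_trap in routes.values())

# (b) No ticket on five independent mornings in a row.
p_no_ticket_five = (1 - p_ticket) ** 5

print(round(p_ticket, 2), round(p_no_ticket_five, 4))
```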
21. Two events A and B are independent if P(A|B) = P(A).
P(man) = 106.2/219.7 ≈ 0.4834
P(man|divorced) = 9.7/22.8 ≈ 0.4254
Since P(man|divorced) ≠ P(man), the events are not independent.
Alternative solution: You could equally well show that P(divorced|man) ≠ P(divorced):
P(divorced|man) = 9.7/106.2 ≈ 0.0913
P(divorced) = 22.8/219.7 ≈ 0.1038
22. What’s the probability of ten of the same flip in a row? In other words, given either result, what’s the probability that the next nine will be the
same? That must be (1/2)^9 = 1/512. You therefore expect this to happen about once in about every 500 flips, or about twice in every thousand.
23. P(open door) = P(unlocked) + P(locked)×P(right key)
P(open door) = 0.5 + 0.5×(2/5) = 0.7

1. (a) 0, 1, 2, 3, 4, 5
(b) There are five trials, each die is either a two or not a two, and the dice are independent. This fits the binomial model.
2. (a) The probability model is shown below. (I computed the probability of losing $5 as 1−[1/10000000+1/125+1/20].)

x ($)       P(x)
9,999,995   1/10,000,000
95          1/125
5           1/20
−5          .9419999

(b) $ in L1, probabilities in L2. 1-VarStats L1,L2 yields µ = −2.70. The expected value of a ticket is −$2.70. This is a bad deal for you. (It’s a very good deal for the lottery company. They’ll make $2.70 per ticket, on average.)
Common mistakes: Students sometimes give hand-waving arguments such as the top prize being very unlikely, or the lottery company always getting to keep the ticket price, but these are not relevant. The only thing that determines whether it’s a good or bad deal for the player is the expected value µ.
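The expected value that 1-VarStats reports is just the sum of each outcome times its probability. As a hedged illustration (my own Python sketch, using exact fractions so that rounding doesn't cloud the result), the probability model above works out like this:

```python
from fractions import Fraction

# Net winnings in dollars and their probabilities, from the model above.
prizes = {
    9_999_995: Fraction(1, 10_000_000),
    95: Fraction(1, 125),
    5: Fraction(1, 20),
}

# Losing the $5 ticket price is whatever probability is left over.
p_lose = 1 - sum(prizes.values())
model = dict(prizes)
model[-5] = p_lose

# Expected value: sum of x * P(x) over every outcome in the model.
expected_value = sum(x * p for x, p in model.items())

print(float(p_lose), float(expected_value))
```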
3
(a) This is a geometric model: repeated failures until a success, with p = 0.066.
µ = 1/p = 1/.066 ≈ 15.2
Over the course of her undead existence, taking each night’s hunt as a separate experience, the average of all nights has her first getting an O negative
drink from her fifteenth victim.
(b) geometcdf(.066,10) = .4947936946 ≈ 0.4948. Velma has almost a 50% chance of getting O negative blood within her first ten victims.
(You could also do this as a binomial, n = 10, p = 0.066, x = 1 to 10.)
(c) This is a binomial model with n = 10, p = 0.066, and x = 2. Use MATH200A part 3 or binompdf(10,.066,2) = .1135207874 ≈ 0.1135. Velma has just
over an 11% chance of getting exactly two O negative victims within her first ten.
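For readers without a TI-83/84, the three parts can be sketched in Python. The helper names `geometric_cdf` and `binomial_pmf` are hypothetical stand-ins that I wrote to mirror geometcdf and binompdf; they are not calculator or library functions.

```python
from math import comb

p = 0.066  # probability that any one victim is O negative


def geometric_cdf(p, k):
    """P(first success happens on or before trial k)."""
    return 1 - (1 - p) ** k


def binomial_pmf(n, p, x):
    """P(exactly x successes in n independent trials)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


mean_trials = 1 / p                   # part (a)
within_ten = geometric_cdf(p, 10)     # part (b)
exactly_two = binomial_pmf(10, p, 2)  # part (c)

# Part (b) done as a binomial: at least one success in ten trials.
at_least_one = 1 - binomial_pmf(10, p, 0)

print(round(mean_trials, 1), round(within_ten, 4), round(exactly_two, 4))
```

The last line of the computation shows why the binomial alternative in part (b) gives the same answer: "first success within ten trials" and "at least one success in ten trials" are the same event.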
4
This is a geometric distribution. You’re looking for someone who is opposed to universal background checks, so p = 1−.92 = 0.08.
(a) geometpdf(.08, 3) = .067712 → 0.0677
(b) geometcdf(.08, 3) = .221312 → 0.2213
(You could also do this as a binomial with n = 3, p = 0.08, x = 1 to 3.)
5
(a) This is a binomial distribution: each student passes or not, whether one student passes has nothing to do with whether anyone else passes,
and there are a fixed seven trials.
µ = np = 7*0.8 ⇒ µ = 5.6 people
σ = √[npq] = √[7*0.8*(1−0.8)] = 1.058300524 ⇒ σ = 1.1 people
(b) Binomial again, n = 7, p = 0.8, x = 4 to 6. Use binompdf-sum or MATH200A part 3 to find P(4 ≤ x ≤ 6) = 0.7569 .
(c) Geometric model: p = 0.8, x = 3. geometpdf(.8,3) = 0.0320
(d) geometcdf(.8,2) = 0.9600

Alternative solution: Binomial probability with n = 2, p = 0.8, x = 1 to 2 gives the same answer.
6
This is binomial data, p = .49. For a sample of 40, expected value is µ = np = 40×.49 = 19.6. 13 is less than 19.6, so asking whether 13 is surprising is
really asking whether 0 to 13 is surprising; see Surprise! [URL: https://BrownMath.com/swt/pfswt.htm.htm#c06_Surprise]
binomcdf(40,.49,13) or MATH200A Program part 3 with n=40, p=.49, x=0 to 13 gives .0259693307 → 0.0260 , less than 5%, so you would be
surprised though maybe not flabbergasted. ☺
7
(a) Probability of one equals proportion of all, and therefore a randomly selected 22-year-old male has a 0.1304% chance of dying in the next year.
That’s the only “prize”, so multiply it by its probability to find fair price: 100000×0.001304 = $130.40
(b) The company’s gross profit is $180.00−130.40 = $49.60, about 28%. But it could very well cost the company that much to sell the policy, pay the
agent’s commission, and enter the policy in the computer. Also, all policies must bear part of the company’s general overhead costs. The price is not
necessarily unfair in the plain English sense.
8
(a) x’s in L1, P’s in L2. 1-VarStats L1,L2 yields µ = 2 (exactly) and σ = 1.095353824 or σ ≈ 1.1. Interpretation: In the long run, on average you
expect to get two heads per group of five flips. You expect most groups of five flips will yield between µ−σ = 1 head and µ+σ = 3 heads.
(b) (I wouldn’t use this part as a regular quiz question.) The long-term average is 2 heads out of 5 flips, which is p = 2/5 = 40%. Obviously coin flips
are independent, so the probability of heads must be the same every time. Therefore you have a binomial model with n = 5 and p = 0.4 .
9
(a) Binomial probability with n = 5, p = 0.7, x = 3 to 5. MATH200A part 3 5, .7, 3, 5 yields .83692 or P(x ≥ 3) = 0.8369. Or, binompdf(5,.7)→L6 and
then sum(L6,4,6) to get the same answer. Or, use the complement: 1−binomcdf(5,.7,2).
(b) You need the mean of the binomial distribution:

µ = np = 10×0.7 = 7
(c) 5 is less than the expected number, so you compute P(x≤5):

MATH200A part 3 10, .7, 0, 5 yields 0.1503, or
binomcdf(10,.7,5) = 0.1503, not surprising
Common mistake: Don’t just compute P(x=5), which is 0.1029. When you want to know whether a result is unusual or surprising, you have to find
the probability of that result or one even further from the expected value.
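The difference between P(x=5) and P(x≤5) in part (c) is easy to see in a short Python sketch. The helpers `binomial_pmf` and `binomial_cdf` are names I made up to play the roles of binompdf and binomcdf; the 5% cutoff for "surprising" is the one this book uses.

```python
from math import comb


def binomial_pmf(n, p, x):
    """P(exactly x successes in n trials)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(n, p, x):
    """P(0 through x successes in n trials)."""
    return sum(binomial_pmf(n, p, k) for k in range(x + 1))


n, p = 10, 0.7
expected = n * p

p_exactly_5 = binomial_pmf(n, p, 5)  # the common mistake
p_at_most_5 = binomial_cdf(n, p, 5)  # that result or one further from the mean

surprising = p_at_most_5 < 0.05

print(round(p_exactly_5, 4), round(p_at_most_5, 4), surprising)
```

Since 5 is below the expected 7, "further from the expected value" means fewer than 5, which is why the sum runs from 0 through 5.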
10
(a) Geometric model, p = 0.34. µ = 1/.34 ≈ 2.94. About three
(b) binompdf(5,.34,0) = .1252332576, about a 12.5% chance
11
Your words will vary, but you should have the idea that the binomial model is a fixed number of trials with varying number of successes,
whereas the geometric model is a varying number of trials that ends with the first success.
12
Your words will vary, but you should have the idea that a pdf is the probability of a specific outcome, and the cdf is the cumulative
probability of all outcomes 0 through a specified number. I’m not so concerned that you know what pdf and cdf stand for, as long as you
understand what they mean and when to use each.

1
On any given trip, there’s a 9% chance that Chantal’s commute will be less than 17 minutes.
9% of Chantal’s commutes are shorter than 17 minutes.
2
P(x ≥ 76.5) = normalcdf(76.5, 10^99, 69.3, 2.92) = .0068362782.
Here are the two interpretations, from Interpreting Probability Statements in Chapter 5:
The probability that a randomly selected man is 76.5″ or taller is 0.0068 or 0.68%.
Only 0.68% of men are 76.5″ tall or taller.
3
“Have boundaries, find probability.”
P(64 ≤ x ≤ 67) = normalcdf(64, 67, 64.1, 2.75) = 0.3686871988 → 0.3687
36.87% of women are 64″ to 67″ tall.
4
5% probability in the two tails means 2.5% or 0.025 in each tail.
x1 = invNorm(.025, 69.3, 2.92) = 63.57690516
x2 = invNorm(1−.025, 69.3, 2.92) = 75.02309484
Heights under 63.6″ or over 75.0″ would be considered unusual.
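Here is a hedged Python equivalent of the two invNorm calls, using the standard library's `statistics.NormalDist`, whose `inv_cdf` method takes an area to the left just as invNorm does. The variable names are mine.

```python
from statistics import NormalDist

# Men's heights in inches: mean 69.3, SD 2.92, as given in the problem.
heights = NormalDist(mu=69.3, sigma=2.92)

tail = 0.05 / 2  # 5% split between the two tails

low = heights.inv_cdf(tail)       # boundary with 2.5% of area to its left
high = heights.inv_cdf(1 - tail)  # boundary with 97.5% of area to its left

print(round(low, 1), round(high, 1))
```

Notice that the upper boundary needs 1 − 0.025 because the function, like invNorm, always works with area to the left.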
5
The area to left is given as 15% or 0.15, and you need the boundary.
P15 = invNorm(.15, 69.3, 2.92) = 66.27361453
You must be at least 66″ or 5′6″ tall. Also acceptable: at least 66¼ inches, or at least 66.3 inches.
6
(a) By the definition of percentile, the number of the desired percentile is also the area to left.
P25 = invNorm(.25, 64.1, 2.75) = 62.24515319 → P25 = 62.2″
P75 = invNorm(.75, 64.1, 2.75) = 65.95484681 → P75 = 66.0″
(b) Q3 is P75 and Q1 is P25, so the IQR is P75−P25 = 65.95484681−62.24515319 = 3.70969362 → IQR = 3.7″.
(c) 1.35σ = 1.35×2.75 = 3.7125 → 3.7″, matching the IQR as expected. (The match isn’t perfect, because
1.35 is a rounded number.)
7
Use MATH200A Program part 4. The screens are shown at right. The points fall reasonably close to a line.
r = 0.9595 and crit = 0.9383. r > crit, and therefore you can say that the normal model is a good fit to the data.
8
The percentile is the percent of the population that scored ≤735.
P(x ≤ 735) = normalcdf(−10^99, 735, 500, 100) = 0.9906.
A score of 735 is at the 99th percentile.
9
2% or 0.02 is area to right, but invNorm needs area to left, so you subtract from 1.
x1 = invNorm(1−.02, 1500, 300) = 2116.124673
You must score at least 2117. (If you round to 2116, you get a number that is a bit less than the computed
minimum. While rounding usually makes sense, there are situations where you have to round up, or round
down, instead of following the usual rule.)
10
z0.01 = invNorm(1−0.01, 0, 1) = 2.326347877 → z0.01 = 2.33
11
P(x < 60) = normalcdf(−10^99, 60, 69.3, 2.92) = 7.240062385E−4
P(x < 60) = 7.24×10^−4 or (better) 0.0007
Common mistake: The probability is not 7.24! That’s not just wrong, it’s very wrong: probabilities are
never greater than 1. “E−4” on your calculator comes at the end of the number, but it’s critical info. It means
“times 10 to the minus 4th power”, so the probability is 7×10^−4 or 0.0007.
The probability that a randomly selected man is under 60″ tall is 0.0007 or 0.07%.
0.07% of men are under 60″ tall.
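The same trap with scientific notation shows up in software, not just on the calculator. This Python sketch (names are mine) computes the probability and prints it in both forms so you can see that they are the same number.

```python
from statistics import NormalDist

# Men's heights: mean 69.3 inches, SD 2.92 inches.
p = NormalDist(mu=69.3, sigma=2.92).cdf(60)

scientific = f"{p:.2e}"  # the "E-4" style of display
decimal = f"{p:.4f}"     # the friendlier decimal form

print(scientific)
print(decimal)
```

The `e-04` at the end plays the same role as the calculator's E−4: ignore it and you'd report a "probability" greater than 1.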
12
The plot is pretty clearly not a straight line: there’s a sharp bend around the second and third data points.
The numbers confirm this: r = .8363, crit = .9121, r < crit, and therefore the normal model is not a good fit for
this data set.
13
The middle 90% leaves 10% in the two tails, or 5% in each tail.
xm1 = invNorm(.05, 69.3, 2.92) = 64.49702741
xm2 = invNorm(1−.05, 69.3, 2.92) = 74.10297259
xf1 = invNorm(.05, 64.1, 2.75) = 59.57665253
xf2 = invNorm(1−.05, 64.1, 2.75) = 68.62334747
Men must be 64.5 to 74.1 inches tall; women must be 59.6 to 68.6 inches tall.

1
This is numeric data. You have a random sample, and it’s less than 10% of the households in a country. Despite the skew, with sample size so far
above 30 you can be sure that the shape of the sampling distribution is approximately normal. The mean of the sampling distribution is µx̅ =
µ = $48,000. The SD of the sampling distribution of the mean, a/k/a standard error of the mean, is σx̅ = σ/√n = $2000/√64 → σx̅ = $250.
2
(a) First, describe the distribution and sketch the situation. For the population, you’re given µ = 800, σ = 50, n = 100.
Center: The mean of the sampling distribution is the same as the mean of the population, 800 hours.
Spread: The standard error of the mean is σx̅ = σ/√n = 50/√100 = 5 hours.
Shape: You have a random sample, 10n = 10×100 = 1000 is certainly less than the total number of light bulbs, and your sample size is comfortably
larger than 30. Therefore you can use the normal model for the sampling distribution.
Sample means are ND with mean 800 hours and SD 5 hours. The sketch is at right.
Common mistake: The correct standard deviation is 5 hours, not 50. You’re not sketching the
population of light bulbs. Rather, you’re now interested in the distribution of average lifetimes in
samples of 100 bulbs. (The axis is the x̅ axis, not the x axis.)
780 hours, the sample mean that the problem asks about, is 20 hours below the population
mean of 800. 20/5 = 4 standard errors, so you should have marked 780 hours at four standard
deviations below the mean.
A sample mean of 780 is less than the population mean of 800 hours. Therefore you compute the
probability of a sample mean of 780 hours or less. It will be surprising (unusual, unexpected) if
the probability is under 5%.
P(x̅ ≤ 780) = normalcdf(−10^99, 780, 800, 50/√(100)) = 3.1686E-5 → P(x̅ ≤ 780) = 0.00003
(You can also give the probability as <0.0001.) Yes, this is surprising.
Common mistake: Don’t give the probability as 3.1686. Probabilities are never greater than 1.
(b) If the manufacturer’s claim is true, there are only three chances in a hundred thousand of getting a sample mean this low. It’s very unlikely that
the manufacturer’s claim is true.
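The light-bulb calculation can be sketched in Python as well. The numbers are from the problem; the variable names are mine. One caution: software and the calculator use different numerical methods far out in the tail, so the later digits may not match the calculator screen exactly, though both round to 0.00003.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 50, 100  # population mean and SD in hours, sample size

# Standard error of the mean: the SD of the sampling distribution.
standard_error = sigma / sqrt(n)

# How many standard errors below the mean is a sample mean of 780?
z = (780 - mu) / standard_error

# Sampling distribution of the sample mean, then the left-tail probability.
sample_means = NormalDist(mu, standard_error)
p = sample_means.cdf(780)

surprising = p < 0.05

print(standard_error, z, f"{p:.5f}", surprising)
```

Using 5 rather than 50 as the SD is the whole point: the sketch models sample means, not individual bulbs.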
3
(a) “Describe the distribution” means shape, center and spread. You can always get center and
spread, but if the test for normal approximation fails then you can’t say anything about the shape.
µp̂ = p = 0.72
σp̂ or SEP = √(pq/n) = √(.72×(1−.72)/500) = 0.0200798406
Expected “yes” per sample: np = 500×.72 = 360; expected “no” = 500−360 = 140; both are well above 10.
You have a random sample, and 10×500 = 5000 is far less than the American population. Therefore the
normal approximation is valid.
Answer: normally distributed with mean = 0.72, standard deviation (standard error) = 0.020
Common mistake: Don’t write n≥30 when testing the normal approximation. The n≥30 test applies to
numeric data, but in this problem you have binomial data.
(b) 350/500 = 0.70 exactly, and 370/500 = 0.74 exactly. In a sample of 500, finding 350 to 370 successes is
the same as finding 70% to 74% successes.
If you stored the computed SEP in part (a), then your screen will look like the one on the left. Otherwise,
it will look like the one on the right:
[two calculator screens omitted]
Answer: P(70% ≤ p̂ ≤ 74%) = 0.6808 .
Remark: Always check for reasonableness. 70% and 74% are one standard error below and above the mean, so you know from the Empirical Rule
that about 68% of the data should be within that region.
Remark: The problem wanted you to use the normal approximation, but it’s always good to check answers by a different method if possible.
70%×500 = 350; 74%×500 = 370. MATH200A part 3 with n=500, p=.72, from 350 to 370, gives a probability of 0.7044, pretty good agreement.
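Both methods in part (b), the normal approximation and the exact binomial check, can be sketched side by side in Python. The values of n and p are from the problem; everything else (names, the use of `math.comb` for the exact sum) is my own illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 500, 0.72

# Standard error of the proportion.
sep = sqrt(p * (1 - p) / n)

# Normal approximation: area between 70% and 74%.
approx = NormalDist(p, sep)
p_normal = approx.cdf(0.74) - approx.cdf(0.70)

# Exact binomial: add up the probabilities of 350 through 370 successes.
p_exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(350, 371))

print(round(sep, 4), round(p_normal, 4), round(p_exact, 4))
```

The exact answer comes out a little higher than the approximation because the binomial sum includes the whole bars at 350 and 370, while the normal curve is cut off right at 0.70 and 0.74.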
4
The sampling distribution of x̅ is ND because the sample size of 1000 is greater than 30 and the
random sample is smaller than 10% of the population (10% of 100,000 households is 10,000
households). The SEM is σx̅ = 19000/√1000 ≈ $601.
P(x̅ ≤ $31,000) = normalcdf(−10^99, 31000, 32400, 19000/√(1000)) = 0.0099, almost exactly 1%. That would
be pretty unlikely if the population mean was still $32,400, so the city manager is most likely correct.
Remark: This problem was adapted from Freedman, Pisani, Purves (2007, 415) [see “Sources Used” at end
of book].
5
(a) The model is shown below. You could list green and black separately, but since they have the same outcome
there’s no need to do that. It’s important to have the probabilities as exact fractions, not approximate
decimals.
Outcome           x      P(x)
Red               +10    18/38
Black or Green    −10    20/38
Total             n/a    38/38 = 1
(b) x’s in L1, P’s in L2. 1-VarStats L1,L2 gives µ = −$0.53, σ = $9.99 . Interpretation: In the long run, a player who bets $10 on red will lose an
average of 53¢ per bet.
Remark: Notice that the SD is about 20 times the mean. This is why gambling is so exciting for the player: there’s a lot
of variability from one bet to the next.
(c) With n = 10,000, the sampling distribution of x̅ is normally distributed . (10n = 10×10,000 = 100,000, less than the total number of bets while the
casino is in business. The bets placed in a given day are not random, but they are representative of all possible bets and therefore effectively random.)
The mean of the sampling distribution is the mean of the population: µx̅ = +$0.53 . (Whatever players lose, the casino wins, so the mean is the
opposite of a player’s mean.) The standard error of the mean is σ/√n = 9.986139979/√10000; σx̅ ≈ $0.10 .
Remark: This is why gambling is predictable for the operators: the SD is small compared to the mean.
(d) 10,000×$.5263157895 = $5,263.16
(e) To lose money, the casino has to make less than $0.00. Zero is more than five standard errors
below the mean (has a z-score below −5), so you know right off that it would be unusual for the
casino to lose money. normalcdf confirms that:
P(lose on 10,000 bets) = 6.8×10^−8. The casino has essentially no risk (7 chances in 100 million) of
losing money on 10,000 bets.
(f) Remember the elevator example. A total of $2000 on 10,000 bets is an average of 2000/10,000 = $0.20 per
bet. Use normalcdf to compute the probability of doing that well or better:
P(make ≥$2000) = 0.9995. Not only is the casino virtually certain not to lose money, it’s almost certain to
make a handsome profit, as long as people come in to place bets.
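Parts (c) through (f) hang together nicely in one short Python sketch, written from the casino's point of view. The probability model comes from part (a) with the signs flipped; the variable names are my own, and the tail probability may differ from the calculator in its later digits.

```python
from fractions import Fraction
from math import sqrt
from statistics import NormalDist

# Casino's result per $10 bet on red: it wins $10 on black or green, loses $10 on red.
model = {10: Fraction(20, 38), -10: Fraction(18, 38)}

mean = sum(x * p for x, p in model.items())
variance = sum((x - mean) ** 2 * p for x, p in model.items())
sigma = sqrt(variance)

n = 10_000
# Sampling distribution of the mean result per bet over n bets.
per_bet = NormalDist(float(mean), sigma / sqrt(n))

expected_total = n * float(mean)         # part (d)
p_lose_money = per_bet.cdf(0)            # part (e): mean per bet below $0
p_make_2000 = 1 - per_bet.cdf(2000 / n)  # part (f): mean per bet of $0.20 or more

print(round(expected_total, 2), p_lose_money, round(p_make_2000, 4))
```

Converting the $2000 total to a $0.20 average per bet is the same move as in the elevator example: a question about a total becomes a question about a sample mean.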
6
Given: µ = 5.00, σ = 0.05, n = 15. Needed: P(∑x>75.6). A sample weighing 75.6 lb total will have a
sample mean of 75.6/15 = 5.04 lb, so this is really just another problem in finding the probability
of turning up a sample mean in a given range.
µx̅ = µ = 5.00 lb
The SEM is σx̅ = 0.05/√15 ≈ 0.013 lb.
The sample means are normally distributed, even for this small sample, because the original
population is normally distributed.
P(∑x > 75.6) = P(x̅ > 5.04) = normalcdf(75.6/15, 10^99, 5.00, 0.05/√(15)) = 9.7295E-4 ≈ 0.0010 , about one
chance in a thousand.
7
(a) This part is a standard Chapter 7 problem about individuals, not samples, so the axis is x rather than x̅.
Answer: P(x > 43.0) = 0.1634
(b) The sampling distribution of x̅ is ND, even for this small sample, because the population is ND. The standard error is σx̅ = 5.1/√14 ≈ 1.4.
P(x̅ > 43.0) = normalcdf(43, 10^99, 38, 5.1/√(14)) = 1.2212E-4 → P(x̅>43.0) = 0.0001 or 0.01%
Remark: This sketch is not very well proportioned, because it makes the probability look much larger than it actually is.
8
12,778 KW shared among 1000 households is 12778/1000 = 12.778 KW per household on average.
“Fail to supply enough power” means that the households are using more power than that. You
need P(x̅ > 12.778) for n = 1000.
The standard error of the mean is σx̅ = 3.5/√1000, about 0.11. The sampling distribution of the mean is
normal because data are numeric and n =1000, greater than 30. (Treat the sample as random because it’s a
“typical neighborhood”. And a thousand households is less than 10% of all the households that there
are.)
P(x̅ > 12.778) = normalcdf(12.778, 10^99, 12.5, 3.5/√(1000)) = 0.0060
9
p = 0.0171, n = 11,037, and you want to find P(p̂ ≤ 0.0094). First check that the sampling distribution
of p̂ is a ND:
The doctors were randomized between treatment and placebo groups.
10×11,037 = 110,370. There are more adult males than that.
np = 11037×.0171 = about 189; nq = 11037−189 = 10848. Both are well above 10.
Therefore the sampling distribution can be approximated by a normal distribution.
The standard error of the proportion or SEP is σp̂ = √(pq/n) = √(.0171(1−.0171)/11037) ≈ 0.0012
If you use my shortcut [URL: https://BrownMath.com/swt/pfswt.htm.htm#c08_SEshortcut], your screen
will look like the one at the left; if not, it will look like the one at the right.
[two calculator screens omitted]
Either way, the probability is 2.2013×10^−10, or 0.000 000 000 2. There are only two chances in ten billion of getting a sample proportion of 0.94% or
less with sample size 11,037, if the true population proportion is 1.71%. That’s pretty darn unlikely, so based on this experiment you can rule out
coincidence and decide that aspirin does reduce the chance of a heart attack among adult males.
10
Heights are ND, so the sampling distribution is also. By the Empirical Rule or 68–95–99.7 Rule, 95% of a ND falls within 2 SD of the mean. The
distribution that concerns you in this problem is the sampling distribution of x̅, not the original distribution of individual men’s heights.
Therefore, the SD that concerns you is the standard error of the mean, not the SD of men’s heights.
The standard error of the mean or SEM is σx̅ = σ/√n = 2.92/√16 = 0.73″.
µx̅ ± 2σx̅ = 69.3 ± 2×.73 = 67.84 to 70.76.
Sample means between those values would not be surprising, and therefore a sample mean would be surprising if it is under 67.84″ or over
70.76″ .
Alternative solution: That back-of-the-envelope calculation is good enough, but you could also get a more precise answer:
L = invNorm(0.025, 69.3, 2.92/√(16)) = 67.87
H = invNorm(1−0.025, 69.3, 2.92/√(16)) = 70.73
11
This is like the Swain v. Alabama example. You have to convert the sample counts into a proportion: p̂ = 737/1504 ≈ 49%. The problem is really
asking you for P(p̂ ≥ 49%) in a sample of 1504 with population proportion of 45%.
What does the sampling distribution look like? The center is µp̂ = p = 0.45. The standard error is σp̂ =
√(0.45×(1−.45)/1504) ≈ 0.013. Check requirements to make sure that a normal model can be used for the sampling
distribution:
Random sample? Yes, given.
Sample less than 10% of population? 10×1504 = 15,040, compared to millions of American adults, OK.
Sample large enough? Yes, 0.45×1504 ≈ 677 successes and 1504−677 ≈ 827 failures expected, both above
10.
P(x ≥ 737) = P(p̂ ≥ 49%) = normalcdf(737/1504, 10^99, .45, √(.45*(1−.45)/1504)) ≈ 9E-4 or 0.0009 .
Can you draw a conclusion? Yes, you can. In a population with 45% unfavorable rating of the Tea Party, there are only 9 chances in 10,000 of
getting a sample as unfavorable as this one (or more unfavorable). That’s pretty unlikely, so you conclude that the true unfavorable rating in October
was most likely more than 45% of all Americans. (In Chapter 9, you’ll learn how to estimate that proportion from a sample.)
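As a cross-check on the calculator work, here is a small Python sketch of the same steps: compute the sample proportion, the standard error, the "large enough" check, and the right-tail probability. The figures are from the problem; the names are my own.

```python
from math import sqrt
from statistics import NormalDist

p, n, x = 0.45, 1504, 737  # population proportion, sample size, sample count

p_hat = x / n
sep = sqrt(p * (1 - p) / n)  # standard error of the proportion

# Sample large enough? Expect at least 10 successes and 10 failures.
expected_yes = n * p
expected_no = n - expected_yes
normal_ok = expected_yes >= 10 and expected_no >= 10

# Probability of a sample proportion this high or higher.
p_upper_tail = 1 - NormalDist(p, sep).cdf(p_hat)

print(round(p_hat, 4), round(sep, 3), normal_ok, round(p_upper_tail, 4))
```

The sketch covers only the arithmetic requirement; the random-sample and 10% checks still have to be reasoned out from the problem statement.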

1
You make probability statements about things that can change if you repeat the experiment. There’s a 1/6 chance of rolling doubles, because
you’ll get doubles about 1/6 of the times that you roll two dice. But the mean of the population is one definite number. It doesn’t change from
one experiment to the next. Your estimate changes, because it’s based on your sample and no sample is perfect. But the thing you’re trying to
estimate, mean or proportion, is what it is even though you don’t know it exactly.
(Statisticians would say, “the population mean or proportion is not a random variable.” By that, they mean just what I said in less technical
language.)
2
Answer: A confidence interval for numeric data is an estimate of the average, and tells you nothing about individuals. Correct his conclusion to:
I’m 90% confident that the average food expense for all TC3 students is between $45.20 and $60.14 per week.
Remark: Use all or a similar word to show that you’re estimating the mean for the population, not just the sample of 40 students. There’s no need
to estimate the mean of the sample, because you know the exact sample mean x̅ for your sample.
Remark: Be clear in your mind that you’re estimating the average spending per student at $45–60 a week. Some individual students will quite
likely spend outside that range, so your interpretation shouldn’t say anything about individual student spending.
3
Answer: It’s the use of the word average. When you collect data points that are all yes/no or success/failure, you have a sample proportion p̂,
equal to the number of successes divided by sample size, and you can estimate a population proportion. There is no “average” with non-numeric
data.
Your 90% confidence estimate is simply that 27% to 40% usually or always prepare their own food.
4
This is a confidence interval about a mean, Case 1 in Inferential Statistics: Basic Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top].
Requirements: random sample, OK. 10n = 10×40 = 400 is less than total number of batteries made; OK. n = 40 >30, OK.
TInterval 1756, 142, 40, .95
(1710.6, 1801.4)
Neveready is 95% confident that the average Neveready A cell, operating a wireless mouse, lasts 1711 to 1801 minutes (28½ to 30 hours).
Common mistake: Don’t make any statement about 95% of the batteries! Your CI is about your estimate of one number, the average life of all batteries.
Your CI has a margin of error of ±45 minutes; the 95% range for all batteries would be about 4 to 5 hours.
5
(a) p̂ = 5067/10000 = 0.5067
Don’t make the term “point estimate” harder than it is! The point estimate for the population mean (or proportion, standard deviation, etc.)
is just the sample mean (or proportion, standard deviation, etc.).
(b) The sample is his actual data, the 10,000 flips. Therefore the sample size is n = 10,000. The population is what he wants to know about, all
possible flips. The population size is infinite or “indefinitely large”.
6
This is sample size for a confidence interval about a proportion, Case 2 in Inferential Statistics: Basic Cases. Since you have no prior estimate, use
0.5 for p̂.
MATH200A/sample size/binomial, p̂ = .5, E = .035, C-Level = .95: sample size is at least 784.
The formula is n = p̂(1−p̂)×(z/E)², where z = zα/2.
1−α = .95 ⇒ α/2 = 0.025.
z0.025 = invNorm(1−.025, 0, 1)
Divide by .035, square the result, and multiply by .5*(1−.5).
Answer: at least 784. Remember: you’re not rounding, you’re going up to a whole number.
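The steps just described (find z, divide by E, square, multiply by p̂(1−p̂), then go up to a whole number) translate directly into Python. The function name `sample_size_for_proportion` is hypothetical, something I wrote for this sketch rather than a calculator or library routine.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p_hat, margin, confidence):
    """Smallest whole n giving the requested margin of error for a proportion."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical z for the two-tailed level
    # ceil, not round: the sample size must go UP to a whole number.
    return ceil(p_hat * (1 - p_hat) * (z / margin) ** 2)


n_needed = sample_size_for_proportion(0.5, 0.035, 0.95)
print(n_needed)
```

Using `ceil` is the code version of "you're not rounding, you're going up": even a result just barely above a whole number needs one more person in the sample.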
7
This is a confidence interval about a proportion, Case 2 in Inferential Statistics: Basic Cases.
Requirements:
Random sample, OK.
10n = 10×100 = 1000 < 68,917, OK.
40 successes, 100−40 = 60 failures, both > 10, OK.
Common mistake: Don’t say “n > 30” or “n ≥ 30”. That’s true, but it doesn’t help you with binomial data. For computing a confidence interval about a
proportion from binomial data, the “sample size large enough” condition is at least 10 successes and at least 10 failures, not sample size at least 30.
1-PropZInt 40, 100, .9 → (.31942, .48058), p̂ = .4

31.9% to 48.1% of all claims at that office have been open for more than a year (90% confidence).
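If you're curious what 1-PropZInt is doing behind the screen, the following Python sketch reproduces it under the usual assumption that the interval is p̂ ± z×√(p̂(1−p̂)/n). The function name `one_prop_z_interval` is my own invention for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_interval(x, n, confidence):
    """Confidence interval for a population proportion from x successes in n trials."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)  # margin of error
    return p_hat - margin, p_hat + margin


low, high = one_prop_z_interval(40, 100, 0.90)
print(round(low, 5), round(high, 5))
```

The sketch does the arithmetic only. The requirements check above (random sample, 10n less than the population, at least 10 successes and 10 failures) still has to come first.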
8
This is a confidence interval about a mean, Case 1 in Inferential Statistics: Basic Cases.
Requirements check:
Random sample, OK.
10×40 = 400, less than the number of times she could commute (past, present, and future), OK.
Sample size 40 > 30, OK.
TInterval 17.7, 1.8, 40, .95 → (17.124, 18.276)
She’s 95% confident that the average of all her commutes is 17.1 to 18.3 minutes.
9
Requirements check:
Random sample, OK.
10×15 = 150 is less than total number of women in their 20s.
MATH200A/Normality: r=.9667, CRIT=.9383, r>CRIT, OK.
MATH200A/Box-whisker: no outliers, OK.
TInterval L6, 1, .95 → (62.918, 65.016), x̅=63.96666667, s=1.894226818, n=15
The average height of women aged 20–29 is 62.9 to 65.0 inches (95% confidence) .
Remark: Since adult women’s heights are known to be normally distributed, you could get away without checking for normality and outliers in this
sample. But it does no harm to check every time.
10
Requirements check:
Random sample, OK.
10×18 = 180. There are far more than 180 male students; OK.
MATH200A/box-whisker: no outliers, OK.
MATH200A/Normality check: r=.9787, CRIT=.9461, r>CRIT, OK.
TInterval L5, 1, .9 → (97.757, 98.343), x̅ = 98.05, s =.7155828558 → 0.72, n = 18.
(a) Fred is 90% confident that the average body temperature of healthy male students is 97.8 to 98.3 °F.
(b) He’s 90% confident that the average body temperature is not more than 98.3°, so 98.6° as normal (average) temperature is inconsistent with his
data.
(c) E = 98.343−98.05 = 0.3°, or E = 98.05−97.757 = 0.3°, or (98.343−97.757)/2 = 0.3°.
(d) MATH200A/Sample size/Num unknown σ: s=.7155828558, E=.1, C-Level=.95, n≥202. He will need
at least 202 in his sample.
(d) Alternative solution: Confidence level = 1−α = 0.95 ⇒ α = 0.05 ⇒ α/2 = 0.025.
z0.025 = invNorm(1−.025)
Multiply by s, divide by E, and square the result. This gives
197. But the t distribution is more spread out than the normal
(z) distribution, so you probably want to bump that number
up a bit, say to 200 or so.
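The z-based shortcut for part (d) is short enough to sketch in Python. Note that this reproduces only the by-hand estimate of 197; it does not attempt the t-based iteration that gives the 202 from the MATH200A program. The variable names are mine.

```python
from math import ceil
from statistics import NormalDist

s = 0.7155828558  # sample standard deviation from the TInterval output
E = 0.1           # desired margin of error, in degrees F
confidence = 0.95

alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)

# Multiply by s, divide by E, square the result, then go up to a whole number.
n_z = ceil((z * s / E) ** 2)
print(n_z)
```

Because z critical values are a bit smaller than t critical values, this estimate runs slightly low, which is why bumping it up is sensible.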
11
This problem is about a confidence interval about a proportion, Case 2 in Inferential Statistics: Basic Cases.
(a) Requirements check:
Random sample, OK.
10×500 = 5000. A city of 6.4 million must have more than 5000 in that age range. OK.
219 successes, 500−219 = 281 failures. Both > 10, OK.
1-PropZInt, 219, 500, .9 → (.4015, .4745), p̂ = .438
You’re 90% confident that 40.2% to 47.5% of Metropolis adults aged 50–75 have had a colonoscopy in the past ten years.
(b) MATH200A/sample size/binomial, p̂ = .438, E = .02, C-Level = .9 → at least 1665
12
Requirements check:
Random sample, OK.
10×20 = 200, less than the total number of cash deposits, OK.
MATH200A/normality check, r=.9864, CRIT=.9503, r>CRIT, OK.
MATH200A/box-whisker, no outliers, OK.
TInterval L4, 1, .95 → (179.86, 198.93), x̅ = 189.40, s = 20.37, n = 20
You’re 95% confident that the average of all cash deposits is between $179.86 and $198.93.
Common mistake: Don’t say that 95% of deposits are between those values. If you look at the sample you’ll see that’s pretty unlikely. You’re
estimating the average, not the individual deposits in the population.
13
This is a confidence interval about a proportion, Case 2 in Inferential Statistics: Basic Cases.
Requirements check:
Systematic sample, OK.
Sample 10n = 10×1000 = 10,000, less than the number of voters; OK.
520 successes and 1000−520 = 480 failures, OK.
1-PropZInt 520, 1000, .95 → (.48904, .55096), p̂ = .52
With 95% confidence, 48.9% to 55.1% of voters voted Snake. At the 95% confidence level, we can’t tell whether more or less than 50% of voters
voted for Abe Snake.
Problem Set 1
1
1. Hypotheses. 2. Significance level. RC. Requirements check. 3–4. Test statistic and p-value. 5. Decision rule (or, conclusion in statistics
language). 6. Conclusion (in English).
2
It keeps you honest. If you could select a significance level after computing the value, you could always get the result you want, regardless of
evidence.
3
Answers will vary here. But you should get in the key idea that if H0 is true, the p-value is the chance of getting the sample you got, or a sample
even further from H0, purely by random chance. For more correct statements, and common incorrect statements, see What Does the p-Value
Mean? [URL: https://BrownMath.com/swt/pfswt.htm.htm#c10_pvalue_root]
4
(a) It’s too wishy-washy. When p<α, you can reach a conclusion. Correction: The accelerant makes a difference, at the 0.05 significance level.
(b) You can never prove the null hypothesis of “no difference”. You can’t even say “The accelerant may make no difference,” because that’s only
part of the truth: it equally well may make a difference. You must say something like, “At the 0.05 significance level it’s impossible to say whether the
accelerant makes a difference or not.”
5
(a) A Type I error is rejecting the null hypothesis when it’s actually true. In this case, a Type I error would be concluding “the accelerant makes
paint dry faster” when actually it makes no difference. This would lead you to launch the product and expose yourself to a lot of warranty
claims.
(b) A Type II error is failing to reject the null hypothesis when it’s actually false. In this case, a Type II error would be concluding “the accelerant
doesn’t make paint dry faster” when actually it does. This would lead you to keep the product off the market even though it could add to your
sales and would perform as promised.
6
They are not necessarily mistakes. Type I and II errors are an unavoidable part of sample variability [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c01_ErrorsSampling]. Nothing can prevent them entirely. The only way to make them both less
likely at the same time is to use a larger sample size.
That said, if you make mistakes in data collection or analysis you definitely make Type I or Type II errors (or both of them) more likely.
7
Make your significance level α smaller. The side effect is making a Type II error more likely.
8
Your own words will vary from mine, but the main difference is that when p > α you can’t reach a conclusion. Accepting H0 is wrong because it
reaches the conclusion that H0 is true. Failing to reject H0 is correct because it leaves both possibilities open.
It’s like a jury verdict of “not guilty beyond a reasonable doubt.” The jury is not saying the defendant didn’t do it. They are saying that either he
didn’t do it or he did it but the prosecution didn’t present enough evidence to convince them.
A hypothesis test can end up rejecting H0 or failing to reject it, but the result can never be to accept H0.
9
H0: µ = 500
H1: µ ≠ 500
Remark: It must be ≠, not > or <, because the claim is that the mean is 500 minutes, and a difference in either direction would destroy the claim.
10
(a) p > α; fail to reject H0. At the 0.01 significance level, we can’t determine whether the directors are stealing from the company or not.
(b) p < α; reject H0 and accept H1. At the 0.01 level of significance, we find that the directors are stealing from the company.
11
α is the probability of a Type I error that you can tolerate. A Type I error in this case is determining that the defendant is guilty (calling H0
false) when actually he’s innocent (H0 is really true), and the consequence would be putting an innocent man to death. You specify a low α to
make it less likely this will happen. Of the given choices, 0.001 is best.
This is binomial data, a Case 2 test of proportion in Inferential Statistics: Basic Cases [URL:
(1) H0: p = .1, 10% of TC3 students driving alcohol impaired

H1: p > .1, more than 10% of TC3 students driving alcohol impaired
(2) α = 0.05
(RC) Systematic sample (counts as random), OK.
npo = 120×.10 = 12 successes and n−npo = 120−12 = 108 failures expected, OK.
10n = 10×120 = 1200, and there are many more students than that at TC3, OK.
(3/4) 1-PropZTest: .1, 18, 120, >po
results: z=1.825741858 → z = 1.83, p=.0339445194 → p = 0.0339 , p̂ = .15
(6) At the 0.05 significance level, more than 10% of TC3 students were alcohol impaired on the most recent Friday or Saturday night when they
drove.
Or,
More than 10% of TC3 students were alcohol impaired on the most recent Friday or Saturday night when they drove (p = 0.0339).
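Steps 3 and 4 of this test can be reproduced in Python to see where z and the p-value come from. This is a sketch assuming the standard one-proportion z statistic, z = (p̂ − po)/√(po(1−po)/n); the function name `one_prop_z_test` and its `alternative` argument are hypothetical, chosen to echo the calculator's 1-PropZTest menu.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(p0, x, n, alternative):
    """Return (z, p_value, p_hat) for a one-proportion z test.

    alternative is "greater", "less", or "two-sided" (names chosen for this sketch).
    """
    p_hat = x / n
    # The standard error uses p0, because the test assumes H0 is true.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    std = NormalDist()
    if alternative == "greater":
        p_value = 1 - std.cdf(z)
    elif alternative == "less":
        p_value = std.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value, p_hat


alpha = 0.05
z, p_value, p_hat = one_prop_z_test(0.1, 18, 120, "greater")
reject_h0 = p_value < alpha

print(round(z, 2), round(p_value, 4), p_hat, reject_h0)
```

Note that α is fixed before the p-value is computed, just as in step 2 of the written solution; the code only automates the arithmetic, not the requirements check.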
This is binomial data (against or not against): a Case 2 test of population proportion in Inferential Statistics: Basic Cases [URL:
Requirements check: Random sample? NO, this is a self-selected sample, consisting only of those who returned the poll. (That could be overcome by
following up on those who did not return the poll, but nobody did that.)
The 10n≤N requirement also fails. 10n = 10×380 = 3800, much larger than the 1366 population size.
Answer: No, you cannot do any inferential procedure because the requirements are not met.
14
(a) The sample size is 325. Why not the 500 she talked to? Because she was studying the habits of the primary grocery shoppers. The 325
were members of that population and could therefore be part of her sample; the rest of the 500 were not.
(b) The population is all persons who do the primary grocery shopping in their households. We don’t know the precise number, but it is surely
in the millions since there are millions of households. We can say that it is indefinitely large.
(c) The number 182 is x, the number of successes in the sample.
(d) She wanted to know whether the true proportion is greater than 40%, so her alternative hypothesis is H1: p > 0.4 and po is 0.4.
(e) No. The researcher is interested in the habits of the primary grocery shoppers in households; therefore she must sample only people who
are primary grocery shoppers in their households. If you even thought about saying Yes, please go back to Chapter 1 and review what bias actually
means.
(a) This is inference about the proportion in one population, Case 2 in Inferential Statistics: Basic Cases [URL:
(1) H0: p = 2/3, the chance of winning is 2/3 if you switch doors.
H1: p ≠ 2/3, the chance of winning is different from 2/3 if you switch doors.
Remark: You need to test for ≠, not <. You’re asked whether the claim of 2/3 is correct, and if it’s wrong it could be wrong in either direction.
It doesn’t matter that the sample data happen to show a smaller proportion than 2/3.
(2) α = 0.05
(RC) Random sample? Effectively, yes.
Sample less than 10% of population? Yes, 10×30 = 300, and in the long run there would be far more than 300 contestants who switched
doors.
Sample large enough? In a sample of 30, if H0 is true you expect 30×(2/3) = 20 successes and 30−20 = 10 failures, so the answer is yes.
Common mistake: Don’t say “n ≥ 30.” That’s true, but it’s irrelevant for tests of proportions. The sample size of at least 30 is useful
when you’re testing the mean of numeric data.
(3/4) 1-PropZTest, 2/3, 18, 30, ≠
results: z = −.77, p-value = 0.4386 , p̂ = 0.6
(6) We can’t determine whether the claim “switching doors gives a 2/3 chance of winning” is true or false (p = 0.4386).
Or,
At the 0.05 significance level, we can’t determine whether the probability of winning after switching doors is equal to 2/3 or different from
2/3.
Remark: It’s true that you can’t disprove the claim, but it’s also true that you can’t prove it. This is where a confidence interval gives useful
information.
(b) Requirements have already been checked.

1-PropZInt 18, 30, .95. Results: (.4247, .7753), p̂ = .6.
We’re 95% confident that the true probability of winning if you switch doors is between 42.5% and 77.5%.
(c) It’s possible that the true probability of winning if you switch doors is 1/3 (33.3%) or even worse, but it’s very unlikely. Why? You’re 95% confident
that it’s at least 42.5%. Therefore you’re better than 95% confident that the true probability if you switch is better than the 1/3 probability if you don’t
switch doors. Switching is extremely likely to be the good strategy.
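The confidence interval in part (b) can be reproduced with a short Python sketch. This assumes the usual formula p̂ ± z·√(p̂(1−p̂)/n), which is what 1-PropZInt computes; the function name one_prop_z_interval is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(x, n, c_level):
    """Return (low, high) of a one-proportion z interval.

    Hypothetical helper: x successes in n trials, c_level such as 0.95.
    """
    p_hat = x / n
    z_star = NormalDist().inv_cdf((1 + c_level) / 2)   # critical z, about 1.96 for 95%
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)    # margin of error
    return p_hat - margin, p_hat + margin

low, high = one_prop_z_interval(18, 30, 0.95)
print(round(low, 4), round(high, 4))
# 1/3 lies well below the lower endpoint, which is the point of part (c).
print(1 / 3 < low)
```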
16 The null hypothesis is always “no effect”, “nothin’ goin’ on here.” In this case “no effect” is “not spam”, so H0 is “This piece of mail is not
spam.”
(a) A Type I error is rejecting the null hypothesis when it’s actually true. Here, a Type I error means deciding a piece of mail is spam when it’s
actually not, so if Heather’s spam filter makes a Type I error then it will delete a piece of real mail. A Type II error is failing to reject H0 when it’s
actually false, treating a piece of spam as real mail, so a Type II error would let a piece of spam mail into Heather’s in-box.
(b) Most people would rather see a piece of spam (Type II) than miss a piece of real mail (Type I), so a Type I error is more serious in this
situation. Lower significance levels make Type I errors less likely (and Type II errors more likely), so a lower α is appropriate here .
This is a test of one population proportion, Case 2 in Inferential Statistics: Basic Cases [URL:
(1) H0: p = .304

H1: p < .304, less than 30.4% of Ithaca households own cats.
(2) α = 0.05
(RC) Random sample? Systematic, OK.
Sample too large? 10×215 = 2150. Without knowing how many households are in Ithaca, we can be sure it’s more than 2150.
Sample large enough? In a sample of 215, according to H0 you expect 215×.304 ≈ 65 successes and 215−65 = 150 failures, OK.
(3/4) 1-PropZTest .304, 54, 215, <
results: z = −1.68, p-value = 0.0461 , p̂ = 0.2512
(6) At the 0.05 significance level, fewer than 30.4% of Ithaca households own cats.
Or,
Fewer than 30.4% of Ithaca households own cats (p = 0.0461).
Problem Set 2
18 (a) The population parameter is missing. It should be either µ or p, but since a proportion can’t be greater than 1 it must be µ.
Correction: H0: µ = 14.2; H1: µ > 14.2
(b) H0 must have an = sign. Correction: H0: µ = 25; H1: µ > 25
(c) You used sample data in your hypotheses. Correction: H0:µ=750; H1:µ>750
(d) You were supposed to test “makes a difference”, not “is faster than”. Never do a one-tailed test (> or <) unless the other direction is impossible
or of no interest at all. It’s possible that your “accelerant” could actually increase drying time, and if it does you’d definitely want to know.
Correction: H0: µ = 4.3 hr; H1: µ ≠ 4.3 hr
19 This is numeric data, and you don’t know the standard deviation (SD) of the population. In Inferential Statistics: Basic Cases [URL:
https://BrownMath.com/swt/pfswt.htm.htm#cas_top] this is Case 1, a test of population mean.
(1) H0: µ = 3.8, the mean pollution this year is no different from last year
H1: µ < 3.8, the mean pollution this year is lower than last year
(2) α = 0.01
(RC) Random sample? Yes.
10n ≤ N? 10×10 = 100, and if there were daily readings the population size is greater than that.
Sample size ≥30? NO!
Normally distributed? MATH200A part 4 yields r=.9784, crit=.9179. r>crit, therefore normal.
Outliers? MATH200A part 2 shows none.
(3/4) T-Test: 3.8, L1, 1, <µo

results: t = −4.749218419 → t = −4.75, p = 5.2266779E−4 → p = 0.0005 , x̅ = 3.21, s = .3928528138 → s = 0.39, n = 10
Common mistake: Don’t write “p = 5.2267” or anything equally silly. A p-value is a probability, and probabilities are never greater than 1.
(6) At the 0.01 level of significance, the mean pollution is lower this year than last year.
Or,
The mean pollution this year is lower than last year (p = 0.0005).
This is numeric data with unknown SD of the population, Case 1 (test of population mean) in Inferential Statistics: Basic Cases [URL:
(1) H0: µ = 32.0, quarts are being properly filled

H1: µ < 32.0, Dairylea is shorting the public
Remark: Your H1 uses <, not ≠, because the problem asks if Dairylea has a legal problem. Yes, they might be overfilling, but that would not
be a legal problem.
(2) α = 0.05. This is just a business situation, not a matter of life and death. (You could justify a lower α if you can show serious consequences
from making a mistake, such as a multimillion-dollar libel suit brought by the company against the investigator.)
(RC) random sample
10×10 = 100 quarts much smaller than total number produced
population is normally distributed, so small sample size is OK
(3/4) T-Test: 32, 31.8, .6, 10, <µo
results: t=−1.054092553 → t = −1.05, p=.159657788 → p = 0.1597
(6) At the 0.05 level of significance, we can’t determine whether Dairylea is giving short volume or not.
Or,
We can’t determine from this sample whether Dairylea is giving short volume or not (p = 0.1597).
Remark: You never accept the null hypothesis. But in many cases you may proceed as though it’s true. Here, since you can’t prove a case
against the dairy, you don’t file charges, make a press release, organize a boycott, etc. You behave exactly as you would behave if you had
proof the dairy was honest.
But you don’t conclude that Dairylea is giving full measure, either. All your hypothesis test tells you is that it could go either way.
This is numeric data with unknown SD of population. You’re testing a population mean, Case 1 in Inferential Statistics: Basic Cases [URL:
(1) H0: µ = 870, no difference in strength

H1: µ ≠ 870, new glue’s average strength is different
Remark: You’re testing different here, not better. It’s possible that the new glue bonds more poorly, and that would be interesting information,
either guiding further research or perhaps leading to a new product (think Post-It Notes).
(2) α = 0.05
(RC) Random sample.
n = 30.
10×30 = 300, and in principle you could make more than 300 trials.
(3/4) T-Test: 870, 892.2, 56.0, 30, µ≠µo
results: t=2.17132871 → t = 2.17, p=.038229895 → p = 0.0382
(6) At the 0.05 level of significance, new glue has a different mean strength from the company’s best seller. In fact, it is stronger.
Or,
New glue has a different mean strength from the company’s best seller (p = 0.0382). In fact, it is stronger.
Remark: When you are testing ≠, and p<α, you give the two-tailed interpretation “different from”, and then continue with a one-tailed
interpretation. See p < α in Two-Tailed Test: What Does It Tell You? [URL: https://BrownMath.com/swt/pfswt.htm.htm#c10_h ails_Interp]
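The two summary-statistics T-Tests above (the Dairylea quarts and the new glue) can be checked with the Python sketch below. Python’s standard library has no t distribution, so this sketch integrates the t density numerically with Simpson’s rule; that is an assumption of this illustration, not how the TI-83/84 does it, and the helper names t_pdf, t_cdf, and t_stat are invented here.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), integrating the density from 0 to |t| by Simpson's rule."""
    a = abs(t)
    h = a / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                # area under the curve between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def t_stat(mu0, x_bar, s, n):
    """Test statistic for a one-sample t test from summary statistics."""
    return (x_bar - mu0) / (s / sqrt(n))

# Dairylea: H1 is µ < 32.0, so the p-value is the left-tail area
t_milk = t_stat(32, 31.8, 0.6, 10)
p_milk = t_cdf(t_milk, df=10 - 1)

# Glue: H1 is µ ≠ 870, so double the tail area beyond |t|
t_glue = t_stat(870, 892.2, 56.0, 30)
p_glue = 2 * (1 - t_cdf(abs(t_glue), df=30 - 1))

print(round(t_milk, 2), round(p_milk, 4))
print(round(t_glue, 2), round(p_glue, 4))
```

The printed values should land very close to the calculator results quoted above (t = −1.05, p = 0.1597 and t = 2.17, p = 0.0382). The gamma function overflows for very large df, so treat this as a sketch for modest sample sizes.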
22 This is binomial data (each person either has a bachelor’s or doesn’t) for a one-population test of proportion: Case 2 in Inferential Statistics:
Basic Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top]
(a) Requirements:
Random sample, yes.
10n = 10×120 = 1200, and there are many more than 1200 residents of Tompkins County aged 25+.
Sample has 52 successes and 120−52 = 68 failures, both ≥ 10, check.
1-PropZInt: x=52, n=120, C-Level=.95

Results: (.34467, .52199); p̂=.4333333333 → p̂ = .4333
We’re 95% confident that 34.5 to 52.2% of Tompkins County residents aged 25+ have at least a bachelor’s degree.
(b) Requirements have already been checked. A two-tailed test at the 0.05 level is equivalent to a confidence interval at the 95% level. The statewide
proportion of 32.8% is outside the 95% CI for Tompkins County, and therefore at the 0.05 significance level, the proportion of bachelor’s degrees
among Tompkins County residents aged 25+ is different from the statewide proportion of 32.8%. In fact, Tompkins County’s proportion is higher.
This is numeric data, with population SD unknown: test of population mean, Case 1 in Inferential Statistics: Basic Cases [URL:
(1) H0: µ = 625, no difference in strength

H1: µ > 625, Whizzo stronger than Stretchie
Remark: Here you test for >, not ≠. Even though Whizzo might be less strong, you don’t care unless it’s stronger.
(2) α = 0.01
(RC) Random sample.
10n = 10×8 = 80, less than Whizzo’s production of bungee cords.
n<30, so test for normality. MATH200A part 4 gives r=.9569, crit=.9054. r>crit, therefore ND.
MATH200A part 2 shows no outliers.
(3/4) T-Test: 625, L1, 1, >µo

results: t=3.232782217 → t = 3.23, p=.0071980854 → p = 0.0072 , x̅ = 675, s=43.74602023 → s = 43.7, n = 8
(6) At the 0.01 level of significance, Whizzo is stronger on average than Stretchie.
Or,
Whizzo is stronger on average than Stretchie (p = 0.0072).
This is numeric data, with σ unknown: test of population mean, Case 1 in Inferential Statistics: Basic Cases [URL:
(1) H0: µ = 6
H1: µ > 6
(2) α = 0.05
(RC) Systematic sample.
n =100 >30.
10n = 10×100 = 1000, less than the number of TC3 students.
(3/4) T-Test: 6, 6.75, 3.3, 100, >µo
results: t=2.272727273 → t = 2.27, p=.0126021499 → p = 0.0126
(6) TC3 students do average more than six hours a week in volunteer work, at the 0.05 level of significance.
Or,
TC3 students do average more than six hours a week in volunteer work (p = 0.0126).
25 Binomial data (head or tail) implies Case 2, test of population proportion in Inferential Statistics: Basic Cases [URL:
https://BrownMath.com/swt/pfswt.htm.htm#cas_top]. A fair coin has heads 50% likely, or p = 0.5.
(1) H0: p = 0.5, the coin is fair

H1: p ≠ 0.5, the coin is biased
Common mistake: You must test ≠, not >. An unfair coin would produce more or less than 50% heads, not necessarily more than 50%. Yes, this
time he got more than 50% heads, but your hypotheses are never based on your sample data.
(2) α = 0.05
(RC) Random sample? Yes, it’s coin flips.
npo = 10000×.5 = 5000 successes and 10000−5000 = 5000 failures expected.
10n = 10×10,000 = 100,000. It would be possible to flip the coin more than 100,000 times.
(3/4) 1-PropZTest, .5, 5067, 10000, prop≠po
results: z = 1.34, p = .1802454677 → p-value = 0.1802 , p̂ = .5067
(6) At the 0.05 level of significance, we can’t tell whether the coin is fair or biased.
Or,
We can’t determine from this experiment whether the coin is fair or biased (p = 0.1802).
Common mistake: You can’t say that the coin is fair, because that would be accepting H0. You can’t say “there is insufficient evidence to
show that the coin is biased”, because there is also insufficient evidence to show that it’s fair.
Remark: “Fail to reject H0” situations are often emotionally unsatisfying. You want to reach some sort of conclusion, but when p>α you
can’t. What you can do is compute a confidence interval:
1-PropZInt: 5067, 10000, .95
results: (.4969,.5165)
You’re 95% confident that the true proportion of heads for this coin (in the infinity of all possible flips) is 49.69% to 51.65%. So if the coin is
biased at all, it’s not biased by much.
26 You have numeric data, and you don’t know the SD of the population, so this is a Case 1 test of population mean in Inferential Statistics: Basic
Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top].
(a) Check requirements: random sample, n = 45 > 30, and there are more than 10×45 = 450 people with headaches.
TInterval: x̅=18, s=8, n=45, C-Level=.95
Results: (15.597, 20.403)
We’re 95% confident that the average time to relief for all headache sufferers using PainX is 15.6 to 20.4 minutes.
(b) Requirements have already been checked. A two-tailed test (a test for “different”) at the 0.05 level is equivalent to a confidence interval at the
1−0.05 = .95 = 95% confidence level. Since the 95% CI includes 20, the mean time for aspirin, we cannot determine, at the 0.05 significance level,
whether PainX offers headache relief to the average person in a different time than aspirin or not.

1 (a) Use MATH200A part 5 and select 2-pop binomial. You have no prior estimates, so enter 0.5 for p̂1 and p̂2. E is 0.03, and C-Level is 0.95.
Answer: you need at least 2135 per sample, 2135 people under 30 and 2135 people aged 30 and older. Here’s what it looks like, using
MATH200A part 5:
Caution! Even if you don’t identify the groups, at least you must say “per sample”. Plain “2135” makes it look like you need only that many people
in the two groups combined, or around 1068 per group, and that is very wrong.
Caution! You must compute this as a two-population case. If you compute a sample size for just one group or the other, you get 1068, which is just
about half of the correct value.
If you don’t have the program, you have to use the formula: [p̂1(1−p̂1)+p̂2(1−p̂2)]·(zα/2/E)². You don’t have any prior
estimates, so p̂1 and p̂2 are both equal to 0.5. Add p̂1 × (1−p̂1) + p̂2 × (1−p̂2) to get .5.
Next, 1−α = 0.95, so α = 0.05 and α/2 = 0.025. zα/2 = z0.025 = invNorm(1−0.025). Divide that by E (.03), square, and
multiply by the result of the computation with the p̂’s.
(b) Using MATH200A Program part 5 with .3, .45, .03, .95 gives 1953 per sample .
Alternative solution: Using the formula, .3(1−.3)+.45(1−.45) = .4575. Multiply by (invNorm(1−.05/2)/.03)² as before to get 1952.74157 → 1953 per
sample.
Again, you must do this as two-population binomial. If you do the under-30 group and the 30+ group separately, you get sample sizes of 897 and
1057, which are way too small. If your samples are that size, the margins of error for under-30 and 30+ will each be 3%, but the margin of error for the
difference, which is what you care about, will be around 4.2%, and that’s greater than the desired 3%.
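The sample-size formula quoted above, [p̂1(1−p̂1)+p̂2(1−p̂2)]·(zα/2/E)², is easy to evaluate in Python. The following is only a sketch of that formula; the function name two_pop_sample_size is an illustrative choice and has nothing to do with the MATH200A program’s internals.

```python
from math import ceil
from statistics import NormalDist

def two_pop_sample_size(p1, p2, margin, c_level):
    """Required size of EACH sample for a two-population binomial case.

    p1, p2 are prior estimates (use 0.5 if you have none),
    margin is E, and c_level is the confidence level, e.g. 0.95.
    """
    z_star = NormalDist().inv_cdf(1 - (1 - c_level) / 2)   # z(α/2)
    n = (p1 * (1 - p1) + p2 * (1 - p2)) * (z_star / margin) ** 2
    return ceil(n)                                         # always round up

print(two_pop_sample_size(0.5, 0.5, 0.03, 0.95))    # part (a): no prior estimates
print(two_pop_sample_size(0.3, 0.45, 0.03, 0.95))   # part (b): prior estimates 30% and 45%
```

Rounding up with ceil matters: a sample size of 1952.74 means you need 1953 people per sample, not 1952.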
2 (a) You have numeric data in two independent samples. You’re testing the difference between the means of two populations, Case 4 in Inferential
Statistics: Basic Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top]. (The data aren’t paired because you have no reason to
associate any particular Englishman with any particular Scot.)
(1) Population 1 = English; population 2 = Scots.

H0: µ1 = µ2 (or µ1−µ2 = 0)
H1: µ1 > µ2 (or µ1−µ2 > 0)
(2) α = 0.05
(RC) The problem states that samples were random. For English, r=.9734 and crit=.9054; for Scots, r=.9772 and crit=.9054. Both r’s are greater than
crit, so both are nearly normally distributed. The stacked boxplot shows no outliers. And obviously the samples of 8 are far less than 10% of
the populations of England and Scotland.
(3/4) English numbers in L1, Scottish numbers in L2.

2-SampTTest with Data; L1, L2, 1, 1, µ1>µ2, Pooled:No
Outputs: t=1.57049305 → t = 1.58, p=.0689957991 → p = 0.0690 , df=13.4634, x̅1=6.54, x̅2=4.85, s1=1.91, s2=2.34, n1=8, n2=8
(6) At the 0.05 level of significance, we can’t say whether English or Scots have a stronger liking for soccer.
Or,
We can’t say whether English or Scots have a stronger liking for soccer (p = 0.0690).
(b) Requirements are already covered.

2-SampTInt, C-Level=.90
Results: (−.2025, 3.5775)
We’re 90% confident that, on a scale from 1=hate to 10=love, the average Englishman likes soccer between 0.2 points less and 3.6 points more than the
average Scot.
(a) This is the difference of proportions in two populations, Case 5 in Inferential Statistics: Basic Cases [URL:
(1) Population 1 = English, population 2 = Scots.

H0: p1 = p2 (or p1−p2 = 0)
H1: p1 ≠ p2 (or p1−p2 ≠ 0)
(2) α = 0.05
(RC) Populations of England and Scotland are greater than 10×150 = 1500 and 10×200 = 2000.
England: 105 successes, 150−105 = 45 failures, both ≥ 10.
Scotland: 160 successes, 200−160 = 40 failures, both ≥ 10.
The samples were stated to be random.
(3/4) 2-PropZTest x1=105, n1=150, x2=160, n2=200, p1≠p2
results: z=−2.159047761 → z = −2.16, p=.030846351 → p = 0.0308, p̂1 = 0.70, p̂2 = 0.80, p̂ = 0.7571428571
(6) The English and Scots are not equally likely to be soccer fans, at the 0.05 level of significance; in fact the English are less likely to be soccer
fans.
Or,
The English and Scots are not equally likely to be soccer fans (p = .0308); in fact the English are less likely to be soccer fans.
(b) Requirements already checked.

2-PropZInt with C-Level = .95 → (−.1919, −.0081)
That’s the estimate for p1−p2, English minus Scots. Since that’s negative, English like soccer less than Scots do. With 95% confidence, Scots are more
likely than English to be soccer fans, by 0.8 to 19.2 percentage points.
(c) [(−.0081) − (−.1919)] / 2 = 0.0919, a little over 9 percentage points.
(d) MATH200A part 5, 2-pop binomial, p̂1=.7, p̂2=.8, E=.04, C-Level .95 gives 889 per sample
By formula, zα/2 = z0.025 = invNorm(1−0.025) = 1.96.
n1 = n2 = [.7(1−.7)+.8(1−.8)]×(1.96/.04)² = 888.37 → 889 per sample
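Parts (a) through (c) can be reproduced with the Python sketch below. It assumes the standard two-proportion procedures: a pooled (blended) proportion for the test’s standard error and the unpooled standard error for the interval. The function names are made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Two-tailed two-proportion z test; returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                  # blended proportion under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * NormalDist().cdf(-abs(z))

def two_prop_z_interval(x1, n1, x2, n2, c_level):
    """Confidence interval for p1 − p2; returns (low, high)."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf((1 + c_level) / 2)
    margin = z_star * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - margin, (p1 - p2) + margin

# Population 1 = English (105 of 150), population 2 = Scots (160 of 200)
z, p = two_prop_z_test(105, 150, 160, 200)
low, high = two_prop_z_interval(105, 150, 160, 200, 0.95)
print(round(z, 2), round(p, 4))
print(round(low, 4), round(high, 4))
print(round((high - low) / 2, 4))   # margin of error, as in part (c)
```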
4 (a) This is before-and-after paired data, Case 3 in Inferential Statistics: Basic Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top].
You’re testing the mean difference.
(1) d = After−Before
H0: µd = 0, running makes no difference in HDL
H1: µd > 0, running increases HDL
Remark: If this was a research study, they would probably test for a difference in HDL, not just an increase. Maybe this study was done by a
fitness center or a running-shoe company. They would want to find an increase, and HDL decreasing or staying the same would be equally
uninteresting to them.
(2) α = 0.05
(RC) Before in L1, After in L2, L3=L2−L1
Random sample.
Five women is obviously less than 10% of all women.
Box-whisker (L3) shows no outliers.
Normality check (L3): r(.9131)>crit(.8804).
(3/4) T-Test 0, L3, 1, µ>0

results: t=3.059874484 → t = 3.06, p=.0188315555 → p = 0.0188 , d̅=4.6, s=3.36, n=5
(6) At the 0.05 level of significance, running 4 miles daily for six months raises HDL level.
Or,
Running 4 miles daily for six months raises HDL level (p = 0.0188).
(b) TInterval with C-Level .9 gives (1.3951, 7.8049).

Interpretation: You are 90% confident that running an average of four miles a day for six months will raise HDL by 1.4 to 7.8 points for the
average woman.
Caution! Don’t write something like “I’m 90% confident that HDL will be 1.4 to 7.8”. The confidence interval is not about the HDL level, it’s
about the change in HDL level.
Remark: Notice the correspondence between hypothesis test and confidence interval. The one-tailed HT at α = 0.05 is equivalent to a two-tailed HT at
α = 0.10, and the complement of that is a CI at 1−α = 0.90 or a 90% confidence level. Since the HT did find a statistically significant effect, you know
that the CI will not include 0. If the HT had failed to find a significant effect, then the CI would have included 0. See Confidence Interval and
Hypothesis Test.
5 (a) Each participant either had a heart attack or didn’t, and the doctors were all independent in that respect. This is binomial data. You’re testing
the difference in proportions between two populations, Case 5 in Inferential Statistics: Basic Cases [URL:
(1) Population 1: Aspirin takers; population 2: non-aspirin takers.

H0: p1 = p2, taking aspirin makes no difference
H1: p1 ≠ p2, taking aspirin makes a difference
(2) α = 0.001
(RC) SRS.
10n1 = 10×11,037 = 110,370. According to A Census of Actively Licensed Physicians in the United States, 2010 (Young (2011) [see “Sources
Used” at end of book]), in that year there were 850,085 actively licensed physicians in the US. Even if we assume half were women and
there were fewer doctors in 1982 when the study began, still 10n1 is lower. 10n2 = 10×11,034 = 110,340, also within the limit.
Treatment group: 139 successes, 11037−139 = 10898 failures, both ≥ 10.
Placebo group: 239 successes, 11034−239 = 10795 failures, both ≥ 10.
(3/4) 2-PropZTest: x1=139, n1=11037, x2=239, n2=11034, p1≠p2
results: z=−5.19, p-value = 2×10⁻⁷, p̂1 = .0126, p̂2 = .0217, p̂ = .0171

(6) At the 0.001 level of significance, aspirin does make a difference to the likelihood of heart attack. In fact it reduces it.
Or,
Aspirin makes a difference to the likelihood of heart attack (p < 0.0001). In fact, aspirin reduces the risk.
Remark: The study was conducted from 1982 to 1988 and was stopped early because the results were so dramatic. For a non-technical summary, see
Physicians’ Health Study (2009) [see “Sources Used” at end of book]. More details are in the original article from the New England Journal of Medicine
(Steering Committee 1989 [see “Sources Used” at end of book]).
(b) 2-PropZInt with C-Level .95 gives (−.0125, −.0056). We’re 95% confident that 325 mg of aspirin every other day reduces the chance of heart attack
by 0.56 to 1.25 percentage points.
Caution! You’re estimating the change in heart-attack risk, not the risk of heart attack. Saying something like “with aspirin, the risk of heart attack
is 0.56 to 1.25%” would be very wrong.
6 (a) You’re estimating the difference in means between two populations. This is Case 4 in Inferential Statistics: Basic Cases [URL:
https://BrownMath.com/swt/pfswt.htm.htm#cas_top]. Requirements:
Random samples (given).
Sample sizes both >30.
10×30 = 300 and 10×32 = 320 are less than the numbers of houses in the two counties.
Population 1 = Cortland County houses, population 2 = Broome County houses.

2-SampTInt, 134296, 44800, 30, 127139, 61200, 32, .95, No
results: (−20004, 34318)
June is 95% confident that the average house in Cortland County costs $20,004 less to $34,318 more than the average house in Broome County.
(b) A 95% confidence interval is the complement of a significance test for ≠ at α = 0.05. Since 0 is in the interval, you know the p-value would be >0.05
and therefore June can’t tell, at the 0.05 significance level, whether there is any difference in average house price in the two counties or not.
If both ends of the interval were positive, that would indicate a difference in averages at the 0.05 level, and you could say Cortland’s average is
higher than Broome’s. Similarly, if both ends were negative you could say Cortland’s average is lower than Broome’s. But as it is, nada.
Remark: Obviously Broome County is cheaper in the sample. But the difference is not great enough to be statistically significant. Maybe the true
mean in Broome really is less than in Cortland; maybe they’re equal; maybe Broome is more expensive. You simply can’t tell from these samples.
7 The immediate answer is that those are proportions in the sample, not the proportions among all voters. This is two-population binomial data,
Case 5 in Inferential Statistics: Basic Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top].
Requirements check:
Random samples, OK.
Each sample 10n = 10×1000 = 10,000. There are far more than 10,000 voters nationally; OK.
The two samples were independent, OK.
Red: 520 successes and 1000−520 = 480 failures, OK.
Blue: 480 successes and 1000−480 = 520 failures, OK.
Population 1 = Red voters, population 2 = Blue voters.

2-PropZInt 520, 1000, 480, 1000, .95
Results: (−.0038, .08379), p̂1=.52, p̂2=.48
With 95% confidence, the Red candidate is somewhere between 0.4 percentage points behind Blue and 8.4 ahead of Blue. The confidence interval
contains 0, and so it’s impossible to say whether either one is leading.
Remark: Newspapers often report the sample proportions p̂1 and p̂2 as though they were population proportions, but now you know that they
aren’t. A different poll might have similar results, or it might have samples going the other way and showing Blue ahead of Red.
8 (a) For a confidence interval, each sample must have at least 10 successes and at least 10 failures. Sample 1 has only 7 successes. Requirements are
not met, and you cannot compute a confidence interval with 2-PropZInt.
(b) For a hypothesis test, we often use “at least 10 successes and 10 failures in each sample” as a shortcut requirements test, but the real requirement
is at least 10 successes and 10 failures expected in each sample, using the blended proportion p̂. If the shortcut procedure fails, you must check the real
requirement. In this problem, the blended proportion is
p̂ = (x1+x2)/(n1+n2) = (7+18)/(28+32) =25/60, about 42%.
For sample 1, with n1 = 28, you would expect 28×25/60 ≈ 11.7 successes and 28−11.7 = 16.3 failures. For sample 2, with n2 = 32, you would expect
32×25/60 ≈ 13.3 successes and 32−13.3 = 18.7 failures. Because all four of these expected numbers are at least 10, it’s valid to compute a p-value using
2-PropZTest.
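The “real requirement” check in part (b) is just arithmetic with the blended proportion, and a small Python sketch makes the four expected counts explicit. The function name expected_successes_failures is hypothetical, chosen only for this example.

```python
def expected_successes_failures(x1, n1, x2, n2):
    """Expected (successes, failures) in each sample, using the blended p̂."""
    p_blend = (x1 + x2) / (n1 + n2)    # p̂ = (x1+x2)/(n1+n2)
    return [(n * p_blend, n * (1 - p_blend)) for n in (n1, n2)]

counts = expected_successes_failures(7, 28, 18, 32)
for successes, failures in counts:
    print(round(successes, 1), round(failures, 1))

# The hypothesis-test requirement: every expected count is at least 10.
print(all(c >= 10 for pair in counts for c in pair))
```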

1 There is no difference. What matters in a model is the relative sizes of the predictions for the categories. 40% is 1.6 times 25%, just as 40 is 1.6
times 25.
2 This is attribute data, one population, more than two possible responses: Case 6, goodness-of-fit, in Inferential Statistics: Basic Cases [URL:
https://BrownMath.com/swt/pfswt.htm.htm#cas_top]. There are 6 categories, therefore 5 degrees of freedom.
(1) H0: The 25:25:20:15:8:7 model for ice cream preference is good.
H1: The 25:25:20:15:8:7 model for ice cream preference is bad.
(2) α = 0.05
(3–4) Use MATH200A part 6. df=5, χ²=9.68, p-value = 0.0849
Here are the input and output data screens:
(If you have MATH200A V6, you’ll see the p-value, degrees of freedom, and χ² test statistic on the same screen as the graph.)
Common mistake: When a model is given in percentages, some students like to convert the observed numbers to percentages. Never do this!
The observed numbers are always actual counts and their total is always the actual sample size.
Remark: You could give the model as decimals, .25, .20, .15 and so on. But for the model, all that matters is the relative size of each category
to the others, so it’s simpler to use whole-number ratios.
Common mistake: If you do convert the percentages to decimals, remember that 8% and 7% are 0.08 and 0.07, not 0.8 and 0.7.
(RC) L3 shows the expected counts, and the lowest is 70, so all are ≥5.
The problem says that the 1000 people were a random sample.
There are millions of ice cream lovers, so the sample of 1000 is less than 10% of population.

(6) At the 0.05 level of significance, you can’t say whether the model is good or bad.
Or,
It’s impossible to determine from this sample whether the model is good or bad (p = 0.0849).
Remark: For Case 6 only, you could write your non-conclusion as something like “the model is not inconsistent with the data” or “the data
don’t disprove the model.”
Remark: The χ² test keeps you from jumping to false conclusions. Eyeballing the observed and expected numbers (L2 and L3), you might
think they’re fairly far off and the model must be wrong. Yet the test gives a largish p-value.
Remark: If it had gone the other way — if p was less than α — you would say something like “At the .05 level of significance, the model is
inconsistent with the data” or “the data disprove the model” or simply “the model is wrong”.
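To see why the form of the model doesn’t matter, here is a small Python sketch that turns a model into expected counts (the L3 column). It assumes the expected count for each category is the sample size times that category’s share of the model total; the function name expected_counts is an illustrative choice.

```python
def expected_counts(model, n):
    """Expected count per category: n times each category's share of the model."""
    total = sum(model)
    return [n * m / total for m in model]

ratios = [25, 25, 20, 15, 8, 7]                      # whole-number ratios
decimals = [0.25, 0.25, 0.20, 0.15, 0.08, 0.07]      # the same model as decimals

from_ratios = expected_counts(ratios, 1000)
from_decimals = expected_counts(decimals, 1000)
print([round(e, 1) for e in from_ratios])
print([round(e, 1) for e in from_decimals])

# Requirement check: every expected count must be at least 5.
print(min(from_ratios) >= 5)
```

Both forms of the model give the same expected counts, and the smallest one is 70, which matches the requirements check above.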
3 Solution: Use Case 7, 2-way table, in Inferential Statistics: Basic Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top].
(1) H0: Gun opinion is independent of party
H1: Gun opinion depends on party
(2) α = .05
(3–4) Put the two rows and three columns in matrix A. (Don’t enter the totals.) Select χ²-Test from the menu. Outputs are χ² = 26.13, df = 2,
p=2.118098E-6 → p = 0.000 002 or <.0001.
(RC) The problem states that the sample was random.
With millions of party members, the samples are under 10% of the population.
Check the B matrix and find that the lowest expected count is 106.45. Therefore, all expected counts are above the minimum of 5.
Alternative: use MATH200A part 7 for steps 3–4 and RC.

(5) p < α; reject H0 and accept H1.
(6) At the .05 level of significance, gun opinion depends on party.
Or,
Gun opinion depends on party (p<0.0001).
Remark: “Depends on” does not mean that’s the only factor. But if you don’t like “depends on”, you could say “is not independent of”. Or
you could say, “party affiliation is a factor in a person’s opinion on gun control.”
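If you want to sanity-check a χ² p-value without the calculator, there is a closed form when the degrees of freedom are even: P(χ² > x) = e^(−x/2) × [1 + (x/2) + (x/2)²/2! + …], with df/2 terms in the brackets. The Python sketch below uses that shortcut. It is not what χ²-Test or MATH200A does internally, the function name is invented, and it does not work for odd df such as the 5 degrees of freedom in the ice cream problem.

```python
from math import exp

def chi2_p_value_even_df(chi2, df):
    """Right-tail area of the χ² distribution, valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this shortcut needs a positive even df")
    half = chi2 / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)^k / k! one factor at a time
        total += term
    return exp(-half) * total

# Gun opinion vs. party: χ² = 26.13 with df = 2
p = chi2_p_value_even_df(26.13, 2)
print(p)
print(p < 0.05)    # step 5: is p < α?
```

Because χ² was rounded to 26.13 before being entered here, the result may differ from the calculator’s p-value in the last digits.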
4 This is goodness of fit to a model, Case 6 in Inferential Statistics: Basic Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top]. Your
H0 model is 1:1:1:1:1, or any model with five equal numbers.
(1) H0: Preferences among all first graders are equal.

H1: First graders prefer the five occupations unequally.
(2) α = 0.05
(3/4) MATH200A part 6 with {1,1,1,1,1} or similar in L1 and the observed data in L2. χ²=12.9412 → χ² = 12.94, df = 4, p=.011567 → p = 0.0116 .
There are many, many first graders, far more than 10×425 = 4250.
All L3’s (expected counts) are 85, so all are ≥5.
(6) At the 0.05 significance level, first graders in general have unequal preferences among the five occupations.
Or,
First graders in general have unequal preferences among the five occupations (p = 0.0116).
5 This is Case 7 in Inferential Statistics: Basic Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top].
(1) H0: Egg consumption and age at menarche are independent.
H1: Egg consumption and age at menarche are not independent.
(2) α = 0.01
(3/4) 3×3 in A. Use MATH200A part 7 or χ²-Test
results: χ² = 3.13, df = 4, p=.535967 → p = 0.5360
At a glance, it looks like the sample size is around 100. But it’s obviously less than 10% of the number of women.
Expected values (Matrix B) show one value 4.8148, which is below 5. You can say that it’s just barely below 5, and it’s the only one, so the
requirement is effectively met. That’s true, but it’s also a moot point because of the high p-value.
(6) At the 0.01 level of significance, we can’t determine whether egg consumption and age at menarche are independent or not.
Or,
We can’t determine whether egg consumption and age at menarche are independent or not (p = 0.5360).
Remark: The large p-value makes it really tempting to declare that the two variables are independent. But that would be accepting H0,
which we must never do. It’s always possible that there is a connection and we were just unlucky enough that this particular sample didn’t
show it.
Some researchers would say “There is insufficient evidence to reject the hypothesis of independence.” Strictly speaking, that’s the same
error. However, when the audience is researchers, rather than the non-technical public, it may be understood that they’re not really
accepting H0, only failing to reject it pending the outcome of a further study.
6 This is a goodness-of-fit problem, Case 6 in Inferential Statistics: Basic Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top].
(1) H0: Age distribution of grand jurors matches age distribution of county.
H1: Age distribution of grand jurors does not match age distribution of county.
(2) α = 0.05
(3/4) The county percentages are the model and go in L1. The numbers of jurors (not percentages) go in L2. Reminder: don’t include the total row.
results: χ²=61.2656 → χ² = 61.27, df = 3, p-value = 3.2×10⁻¹³ or p < 0.0001
(RC) Because you’re not generalizing, the random-sample rule and the under-10% rule don’t matter. You need only check that all expected counts are ≥ 5, and since the lowest is 10.56, the requirements are met.
(6) At the 0.05 significance level, the age distribution of grand jurors is different from the age distribution in the county.
Or,
The age distribution of grand jurors is different from the age distribution in the county (p < 0.0001).
Remark: There are a lot of reasons for this. Judges tend to be older and tend to prefer jurors closer to their own age. Also, older candidates
are more likely to be retired, which means they are less likely to be exempt by reason of their occupation.
7
This is a 2-way table, specifically a test of independence. Use Case 7 in Inferential Statistics: Basic Cases [URL:
(1) H0: Population size of chosen residence town is independent of population size of town raised in.
H1: Population size of chosen residence town depends on population size of town raised in.
(2) α = 0.05
(3/4) Enter the 3×3 array in Matrix A. (Never enter the totals in a 2-way table hypothesis test.) Use MATH200A part 7 or the calculator’s χ²-Test
menu selection.
results: df = 4, χ² = 35.74, p-value=3.271956E-7 → p-value = 0.000 000 3 or p-value < 0.0001
(RC) Simple random sample: given.
500 men is obviously far below 10% of the total number.
All expected counts (Matrix B) are 14.364 or greater, ≥5.
(6) At the 0.05 significance level, there is an association between the size of town men choose to live in and the size of town they grew up in.
Or,
There is an association between the size of town men choose to live in and the size of town they grew up in (p < 0.0001).
8
This is a 2-way table, specifically a test of homogeneity. You have seven populations, represented by the seven treatments in the experiment: seven ways to pre-treat and treat a cold. If Echinacea is effective, the proportions of infection from the various treatments should be significantly different. Use Case 7 in Inferential Statistics: Basic Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top].
(1) H0: The tested treatments with Echinacea make no difference to the proportion who catch cold.
H1: The treatments do make a difference. …
(2) α = 0.01
(3/4) There were seven treatments and two outcomes, so enter your 7×2 matrix and run a χ²-Test or MATH200A
part 7.
Results: χ² = 4.74, df = 6, p-value = 0.5769
Common mistake: Never enter the totals in a two-way test.
(RC) Random sample? Yes, randomized experimental design. ✔

Sample less than 10% of population? Yes, the population of people exposed to the common cold is indefinitely large. ✔
All expected values ≥5? Yes, matrix B shows all values at least 5.6. ✔
(6) At the 0.01 significance level, we can’t determine whether Echinacea is effective against the common cold or not.
Or, We can’t determine whether Echinacea is effective against the common cold or not (p = 0.5769).
Remark: Researchers might write something like “Echinacea made no significant difference to infection rates in our study” with the p-
value or significance level. It’s understood that this does not prove Echinacea ineffective — this particular study fails to reach a conclusion.
But as additional studies continue to find p > α, our confidence in the null hypothesis increases.
Remark: If you used MATH200A part 7, there’s some interesting information in matrix C. The top left 7 rows and 2
columns are the χ² contributions for each of the seven treatments and two outcomes. All are quite low, in light of
the rule of thumb that only numbers above 4 or so are significant, even at the less stringent 0.05 level.
The last two rows are the total numbers and percentages of people who did and didn’t catch cold: 349 (87.5%)
and 50 (12.5%). If Echinacea is ineffective, you’d expect to see about that same infection rate for each of the seven
treatments. Sure enough, compute the rates from the rows of the data table, and you’ll find that they vary between
81% and 92%.
The third column is the total subjects in each of the seven treatments, and the overall total. Of course you were given those in the data table, but
it’s always a good idea to use this information to check your data entry.
The fourth column is the percentage of subjects who were assigned to each of the seven treatments, totaling 100% of course.
Solutions to Review Problems
Problem Set 1: Short Answers
Write your answer to each question. There’s no work to be shown. Don’t bother with a complete
sentence if you can answer with a word, number, or phrase.
1
Disjoint events cannot be independent. Why? Disjoint events, by definition, can’t happen on the same trial. That means if A happens, P(B) = 0. But if A and B are independent, whether A happens has no effect on the probability of B. With disjoint events, whether A happens does affect the probability of B. Therefore disjoint events can’t be independent.
2
(a) C
(b) For numeric data with sample size under 30, you check for outliers by making a box-whisker plot and check for normality by making a normal probability plot.
3
qualitative = attribute, non-numeric, categorical. Examples: political party affiliation, gender.
quantitative = numeric. Examples: height, number of children.
Common mistake: Binomial is a subtype of qualitative data so it’s not really a synonym. Discrete and continuous are subtypes of numeric data.
4
equal to 1/6. The die has no memory: each trial is independent of all the others.
The Gambler’s Fallacy is believing that the die is somehow “due for a 6”. The Law of Large Numbers says that in the long run the proportion of 6’s will tend toward 1/6, but it doesn’t tell us anything at all about any particular roll.
5
(a) pop. 1 = control, pop. 2 = music
H0: p2 = p1 and H1: p2 < p1
Or: H0: p2 − p1 = 0 and H1: p2 − p1 < 0
(b) Case 5, Difference between Two Pop. Proportions; or 2-PropZTest
Common mistake: You must specify which is population 1 and which is population 2.
Common mistake: The data type is binomial: a student is in trouble, or not. There are no means, so µ is incorrect in the hypotheses.
6
Check this against the definition:
Is there a fixed number of trials? Yes, you are rolling five dice, n = 5.
Are there only two outcomes, success and failure? Yes, each die is either a 3 or not.
Is the probability of success the same from trial to trial; are the trials independent? Yes, p = 1/6 for each die, and the dice are independent.
This is a binomial PD.
7
A
Remark: The significance level α is the level of risk of a Type I error that you can live with. If you can live with more risk, you can reach more conclusions.
8
B, D: B if p<α, D if p>α
9
“Disjoint” means the same as “mutually exclusive”: two events that can’t happen at the same time. Example: rolling a die and getting a 3 or a 6.
Complementary events can’t happen at the same time and one or the other must happen. Example: rolling a die and getting an odd or an even. Complementary events are a subtype of disjoint events.
10
For any set of continuous data, or discrete data with many different values. If the variable is discrete with only a few different answers, you could use a bar graph or an ungrouped histogram.
For a small- to moderate-sized set of numeric data, you might prefer a stemplot.
11
For mutually exclusive (disjoint) events. Example: if you draw one card from a standard deck, the probability that it is red is ½. The probability that it is a club is ¼. The events are disjoint; therefore the probability that it is red or a club is ½+¼ = ¾.
12
A, B
Remark: C is wrong because “model good” is H0. D is also wrong: every hypothesis test, without exception, compares a p-value to α. For E, df
is number of cells minus 1. F is backward: in every hypothesis test you reject H0 when your sample is very unlikely to have occurred by random
chance.
13
Continuous data are measurements and answer “how much” questions. Examples: height, salary
Discrete data usually count things and answer “how many” questions. Example: number of credit hours carried
14
C, D
Remark: As stated, what you can prove depends partly on your H1. There are three things it could be:
If H1: p > 0.01 (machine producing too many defectives) and you calculate p-value<α, your conclusion is C.
If H1: p ≠ 0.01 (machine not operating as specified) and you calculate p-value<α, your conclusion depends on the sample data. Your conclusion is either C above or the unlisted conclusion “the machine is producing fewer defectives than allowed”, in other words performing better than specified. See p < α in Two-Tailed Test: What Does It Tell You? [URL: https://BrownMath.com/swt/pfswt.htm.htm#c10_httails_Interp] for reminders about interpreting a two-tailed test when p-value<α.
There is little reason to choose H1: p < 0.01 (the machine is performing better than specified), but if that is H1 and p-value<α then your conclusion is that the machine is performing better than specified; that conclusion is not listed above.
Regardless of H1, if p-value>α your conclusion will be D or similar to it.
Common mistake: Conclusion A is impossible because it’s the null hypothesis and you never accept the null hypothesis.
Conclusion B is also impossible. Why? because “no more than” translates to ≤. But you can’t have ≤ in H1, and H1 is the only hypothesis that can
be accepted (“proved”) in a hypothesis test.
15
You can’t. You can reduce the likelihood of a Type I error by setting the significance level α to a lower number, but the possibility of a Type I error is inherent in the sampling process.
Remark: A Type I error is a wrong result, but it is not necessarily the result of a mistake by the experimenter or statistician.
16
(a) The population, the group you want to know something about, is all churchgoers. Common mistake: Not churchgoers who think evolution should be taught, but all churchgoers. “Churchgoers who think evolution should be taught” is a subgroup of that population, and you want to know what proportion of the whole population is in that subgroup.
(b) The size is unknown, but certainly in the millions. You also could call it infinite, or uncountable. Common mistake: Don’t confuse size of population with size of sample. The population size is not the 487 from whom you got surveys, and it’s not the 321 churchgoers in your sample.
(c) The sample size n is the 321 churchgoers from whom you collected surveys. Yes, you collected 487 surveys in all, but you have to disregard the 166 that didn’t come from churchgoers, because they are not your target group. Common mistake: 227 isn’t the sample size either. It’s x, the number of successes within the sample.
(d) No. You want to know the attitudes of churchgoers, so it is correct sampling technique to include only churchgoers in your sample.
If you wanted to know about Americans in general, then it would be selection bias to include only churchgoers, since they are more likely than non-churchgoers to oppose teaching evolution in public schools.
17
In your experiment, there was some difference between the average performance of Drug A and Drug B. The p-value is the chance of getting a difference that large or larger if Drug A and Drug B are actually equally effective. Using the number, you can say that if there is no difference between Drug A and Drug B, then there’s a 6.78% probability of getting this big a difference between samples, or even a bigger difference.
Common mistake: Your answer will probably be worded differently from that, but be careful that it is a conditional probability: If H0 is true, then there’s a p-value chance of getting a sample this extreme or more so. The p-value isn’t the chance that H0 is true.
Remark: If you are at all shaky about this, review What Does the p-Value Mean? [URL: https://BrownMath.com/swt/pfswt.htm.htm#c10_pvalue_root]
18
(a) There are a fixed n = 100 trials, and the probability of success is p = 0.08 on every trial. This is a binomial distribution.
MATH200A part 3, or binomcdf(100,.08,5)
(b) This is a binomial distribution, for exactly the same reasons.
MATH200A part 3, or binompdf(100,.08,5)
(c) The probability of success is p = 0.08 on every trial, but you don’t have a fixed number of trials. This is a geometric distribution.
geometpdf(.08,5)
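If you have no calculator handy, the three calculator commands above can be imitated with a few lines of Python. This is only a sketch, not part of the original solution: the function names binom_pmf, binom_cdf, and geom_pmf are made up for illustration, and it assumes the usual definitions (binomcdf sums the probabilities for 0 through x successes, and geometpdf gives the chance that the first success comes on trial x).

```python
from math import comb

def binom_pmf(n, p, x):
    """P(exactly x successes in n trials), the counterpart of binompdf(n, p, x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(n, p, x):
    """P(x or fewer successes in n trials), the counterpart of binomcdf(n, p, x)."""
    return sum(binom_pmf(n, p, k) for k in range(x + 1))

def geom_pmf(p, x):
    """P(first success occurs on trial x), the counterpart of geometpdf(p, x)."""
    return (1 - p)**(x - 1) * p

at_most_5 = binom_cdf(100, 0.08, 5)    # part (a)
exactly_5 = binom_pmf(100, 0.08, 5)    # part (b)
first_on_5th = geom_pmf(0.08, 5)       # part (c)

print(f"(a) {at_most_5:.4f}  (b) {exactly_5:.4f}  (c) {first_on_5th:.4f}")
```

Notice that (a) and (b) share the same model and differ only in whether you add up several probabilities or take just one.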
19
C
Remark: There is no specific claim, so this is not a hypothesis test.
20
r must be between −1 and +1 inclusive. (Symbolically, −1 ≤ r ≤ +1, or | r | ≤ 1.) A value of r = 0 indicates no linear correlation. But this doesn’t necessarily mean no correlation, because another type of correlation might still be present. Example: the noontime height of the sun in the sky plotted against day of the year will show near zero linear correlation but very strong sine-wave correlation.
21
p̂, proportion of a sample (In this case, p̂ = 4/5 = 0.8 or 80%.)
22
attribute or qualitative, specifically binomial (“Are you satisfied with the food service?”)
23
Attribute (qualitative or categorical) data. This compact form of graph makes it easy to compare the relative sizes of all the categories. (A bar graph is also a common choice for qualitative data.)
Caution: The percentages must add to 100%. Therefore you must have complete data on all categories to display a pie chart. Also, if multiple responses from one subject are allowed, then a pie chart isn’t suitable, and you should use some other presentation, such as a bar graph.
24
When the data are skewed, prefer the median.
25
Because you can never accept the null hypothesis; only the alternative hypothesis can be accepted.
26
G (Possibly K, depending on your textbook; see below.)
Remark: This problem tests for several very common mistakes by students. Always make sure that
Your hypotheses include a population parameter (rules out A, E, and I because they have no symbol; rules out B, D, F, H, J, L because their
symbols aren’t a population parameter)
The “=” sign is in H0 (rules out A–D)
This leaves you with G and K as possibilities. Either can be correct, depending on your textbook. The most common practice is always to put a plain = sign in H0 regardless of H1, which makes G the correct answer. But some textbooks or profs prefer ≤ or ≥ in H0 for one-tailed tests, which makes K the correct answer.
27
C
28
In an experiment, you assign subjects to two or more treatment groups, and through techniques like randomization or matched pairs you control for variables other than the one you’re interested in. By contrast, in an observational study you gather current or past data, with no element of control; the possibility of lurking variables severely limits the type of conclusions you can draw. In particular, you can’t conclude anything about causation from an observational study.
29
1−α, or (1−α)100% is also acceptable.
30
B
Remark: The Z-Test is wrong because you don’t know the SD of the selling price of all 2006 Honda Civics in the US. The 1-PropZTest and χ²-
test are for non-numeric data. There is no such thing as a 1-PropTTest.
31
descriptive: presentation of actual sample measurements
inferential: estimate or statement about population made on the basis of sample measurements
Example: “812 of 1000 Americans surveyed said they believe in ghosts” is an example of descriptive statistics: the numbers of yeses and noes in
the sample were counted. “78.8% to 83.6% of Americans believe in ghosts (95% confidence)” is an example of inferential statistics: sample data were
used to make an estimate about the population. “More than 60% of Americans believe in ghosts” is another example of inferential statistics: sample
data were used to test a claim and make a statement about a population.
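To see where an inferential statement like “78.8% to 83.6%” can come from, here is a minimal Python sketch of a one-proportion z interval built from the descriptive numbers 812 of 1000. It assumes the standard formula p̂ ± z·√(p̂(1−p̂)/n); the function name one_prop_z_interval is invented for this illustration and is not something the book defines.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(x, n, conf_level):
    """Confidence interval for a population proportion from x successes in n trials."""
    p_hat = x / n                                            # sample proportion (descriptive)
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)  # critical z for the confidence level
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)          # margin of error
    return p_hat - margin, p_hat + margin

low, high = one_prop_z_interval(812, 1000, 0.95)
print(f"{low:.1%} to {high:.1%} (95% confidence)")
```

The sample proportion 812/1000 describes the sample; the interval is the inference about all Americans.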
32
C
Remark: Remember that the confidence interval derives from the central 95% or 90% of the normal distribution. The central 90% is obviously
less wide than the central 95%, so the interval will be less wide.
33
A sample is a subgroup of the population, specifically the subgroup from which you take measurements. The population is the entire group of interest.
Example: You want to know the average amount of money a full-time TC3 student spends on books in a semester. The population is all full-time
TC3 students. You randomly select a group of students and ask each one how much s/he spent on books this semester. That group is your sample.
34
D
Remark: This is unpaired numeric data, Case 4.
35
(a) A. This is binomial because each respondent was asked “Did you feel strong peer pressure to have sex?” There is one population, high-school seniors, so this is Case 2.
(b) For binomial data, requirements are slightly different between CI and HT. Here you are doing a hypothesis test.
Random sample? ✔
At least 10 successes and 10 failures expected? npo = 500×0.25 = 125, and 500−125 = 375, both >10. ✔
Common mistake: For hypothesis test, you need expected successes and failures. It’s incorrect to use actual successes (150) and failures
(350).
Check that the sample is not too large: 10n = 10×500 = 5000, and far more than 5000 students graduate from US high schools each year. ✔
Common mistake: Some students answer this question with “n > 30”. That’s true, but not relevant here. Sample size 30 is important for numeric data,
not binomial data.

36
Numeric data, two populations, independent samples with σ unknown: Case 4 (2-SampTTest).
Common mistake: You cannot do a 2-SampZTest because you do not know the standard deviations of the two populations.
(1) Population 1 = Judge Judy’s decisions; Population 2 = Judge Wapner’s decisions

H0: µ1 = µ2, no difference in awards
H1: µ1 > µ2, Judge Judy gives higher awards
(2) α = 0.05
(RC) Random sample
Sample sizes are both above 30, so there’s no worry about whether the population data are normal.
(3–4) 2-SampTTest: x̅1=650, s1=250, n1=32, x̅2=580, s2=260, n2=32, µ1>µ2, Pooled: No
Results: t=1.10, p-value = .1383
(6) At the 0.05 level of significance, we can’t tell whether Judge Judy was more friendly to plaintiffs (average award higher than Judge Wapner’s)
or not.
BTW: Some instructors have you do a preliminary F-test. It gives p=0.9089>0.05, so after that test you would use Pooled:Yes in the 2-SampTTest and get p=0.1553.
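For readers curious about what the calculator does with Pooled: No, the following Python sketch computes the unpooled (Welch) t statistic and the Welch-Satterthwaite degrees of freedom from the summary statistics above. It is an illustration under those standard formulas, with a made-up function name welch_t. The standard library has no t distribution, so the sketch stops at t and df and does not reproduce the p-value.

```python
from math import sqrt

def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Unpooled two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = s1**2 / n1                      # estimated variance of sample mean 1
    v2 = s2**2 / n2                      # estimated variance of sample mean 2
    t = (mean1 - mean2) / sqrt(v1 + v2)  # difference in means over its standard error
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Population 1 = Judge Judy, Population 2 = Judge Wapner (summary statistics from step 3-4)
t, df = welch_t(650, 250, 32, 580, 260, 32)
print(f"t = {t:.2f}, df = {df:.1f}")
```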
37
normalcdf(20.5, 10^99, 14.8, 2.1) = .00332. Then multiply by population size 10,000 to obtain 33.2, or about 33 turkeys.
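The same upper-tail calculation for problem 37 can be sketched in Python with the standard library’s NormalDist. This is just a cross-check under the normal model used above (mean 14.8, SD 2.1); the variable names are illustrative.

```python
from statistics import NormalDist

model = NormalDist(mu=14.8, sigma=2.1)

# Upper tail: probability of a value above 20.5, like normalcdf(20.5, 10^99, 14.8, 2.1)
p_over = 1 - model.cdf(20.5)

# Expected number out of a population of 10,000
expected_count = p_over * 10_000

print(f"P(x > 20.5) = {p_over:.5f}, about {expected_count:.0f} of 10,000")
```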
38
Solution: This is one-population numeric data, and you don’t know the standard deviation of the population: Case 1. Put the data in L1, and 1-VarStats L1 tells you that x̅ = 4.56, s = 1.34, n = 8.
(1) H0: µ = 4, 4% or less improvement in drying time
H1: µ > 4, better than 4% decrease in drying time
Remark: Why is a decrease in drying time tested with > and not <? Because the data show the amount of decrease. If there is a decrease, the
amount of decrease will be positive, and you are interested in whether the average decrease is greater than 4 (4%).
(2) α = 0.05
(RC) Effectively a random sample
Normal probability plot (MATH200A part 4) shows a straight line with r(.9803) > CRIT(.9054). Therefore data are ND.
Box-whisker (MATH200A part 2) shows no outliers.
(You don’t have to show these graphs on your exam paper; just show the numeric test for normality and mention that the modified boxplot
shows no outliers.)
(3–4) T-Test: µo=4, x̅=4.5625, s=1.34…, n=8, µ>µo
Results: t = 1.19, p = 0.1370
(6) At the 0.05 significance level, we can’t tell whether the average drying time improved by more than 4% or not.
(b) TInterval: C-Level=.95

Results: (3.4418, 5.6832)
(There’s no need to repeat the requirements check or to write down all the sample statistics again.)
With 95% confidence, the true mean decrease in drying time is between 3.4% and 5.7%.
39
(a) This is a binomial probability distribution: each rabbit has long hair or not, and the probability for any given rabbit doesn’t change if the previous rabbit had long hair. Use MATH200A part 3.
n = 5, p = 0.28, from = 0, to = 0. Answer: 0.1935
Alternative solution: If you don’t have the program, you can compute the probability that one rabbit has short hair (1−.28 = 0.72), then that all the
rabbits have short hair (0.72^5 = 0.1935), which is the same as the probability that none of the rabbits have long hair.
(b) The complement of “one or more” is none, so you can use the previous answer.
P(one or more) = 1−P(none) = 1−0.1935 = 0.8065
Alternative solution: MATH200A part 3 with n=5, p=.28, from=1, to=5; probability = 0.8065
(c) Again, use MATH200A part 3 to compute binomial probability: n = 5, p = 0.28, from = 4, to = 5. Answer: 0.0238
Alternative solution: If you don’t have the program, do binompdf(5, .28) and store into L3, then sum(L3,5,6) or L3(5)+L3(6) = 0.0238. Avoid the
dreaded off-by-one error! For x=4 and x=5 you want L3(5) and L3(6), not L3(4) and L3(5).
For n=5, P(x≥4) = 1−P(x≤3). So you can also compute the probability as 1−binomcdf(5, .28, 3) = 0.0238.
(d) For this problem you must know the formula:
µ = np = 5×0.28 = 1.4 per litter of 5, on average
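All four parts of the rabbit problem can be checked with a short Python sketch. It uses only the binomial formula with n = 5 and p = 0.28 from the solution; the helper name binom_pmf is made up for this example.

```python
from math import comb

def binom_pmf(n, p, x):
    """P(exactly x successes in n independent trials with success probability p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.28                      # litter of 5, P(long hair) = 0.28

p_none = binom_pmf(n, p, 0)         # (a) no long-haired rabbits
p_one_or_more = 1 - p_none          # (b) complement of "none"
p_four_or_five = binom_pmf(n, p, 4) + binom_pmf(n, p, 5)   # (c) x = 4 or x = 5
mean = n * p                        # (d) expected number per litter

print(round(p_none, 4), round(p_one_or_more, 4), round(p_four_or_five, 4), round(mean, 1))
```

Because the helper takes x directly, there is no list index to shift, so the off-by-one trap mentioned in part (c) doesn’t arise here.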
40
This is Case 7, a 2×5 table. (The total row and total column aren’t part of the data.)
Common mistake: It might be tempting to do this problem as a goodness-of-fit, Case 6, taking the Others row as the model and the doctors’
choices as the observed values. But that would be wrong. Both the Doctors row and the Others row are experimental data, and both have some
sampling error around the true proportions. If you take the Others row as the model, you’re saying that the true proportions for all non-doctors are
precisely the same as the proportions in this sample. That’s rather unlikely.
(1) H0: Doctors eat different breakfasts in the same proportions as others.
H1: Doctors eat different breakfasts in different proportions from others.
(2) α = 0.05
(3–4) χ²-Test gives χ² = 9.71, df = 4, p=0.0455
(RC) random sample
Matrix B shows that all the expected counts are ≥5.
(As an alternative, you could use MATH200A part 7.)
(6) Yes, doctors do choose breakfast differently from other self-employed professionals, at the 0.05 significance level.
41
(a) z = (x−µ)/σ ⇒ −1.2 = (x−70)/2.4 ⇒ x = 67.1″
or: x = zσ + µ ⇒ x = −1.2×2.4 + 70 = 67.1″
(b) 70−67.6 = 2.4″, and therefore z = −1. By the Empirical Rule, 68% of data lie between z = ±1. Therefore 100−68 = 32% lie outside z = ±1 and 32%/2 = 16% lie below z = −1. Therefore 67.6″ is the 16th percentile.
Alternative solution: Use the big chart [URL: https://BrownMath.com/swt/pfswt.htm.htm#c03_EmpiricalMasterChart] to add up the proportion of men below 67.6″ or below z = −1. That is 0.15+2.35+13.5 = 16%.
(c) z = (74.8−70)/2.4 = +2. By the Empirical Rule, 95% of men fall between z = −2 and z = +2, so 5% fall below z = −2 or above z = +2. Half of those, 2.5%, fall above z = +2, so 100−2.5 = 97.5% fall below z = +2. 97.5% of men are shorter than 74.8″.
Alternative solution: You could also use the big chart [URL: https://BrownMath.com/swt/pfswt.htm.htm#c03_EmpiricalMasterChart] to find that P(z > 2) = 2.35+0.15 = 2.5%, and then P(z < 2) = 100−2.5 = 97.5%.
42
(a) The histogram is shown at left. You must show the scale for both axes and label both axes. The scale for the horizontal axis is predetermined: you label the edges of the histogram bars and not their centers. You have some latitude for the scale of the vertical axis, as long as you include zero, show consistent divisions, and have your highest mark greater than 89. For example, 0 to 100 in increments of 20 would also work.
(b) Compute the class marks or midpoints: 575, 725, and so on. Put them in L1 and the frequencies in L2. Use 1-VarStats L1,L2 and get n = 219.
See Summary Numbers on the TI-83 [URL: https://BrownMath.com/swt/pfswt.htm.htm#c03_StatsTI83].
(c) Further data from 1-VarStats L1,L2: x̅ = 990.1 and s = 167.3
Common mistake: If you answered x̅ = 950 you probably did 1-VarStats L1 instead of
1-VarStats L1,L2. Your calculator depends on you to supply one list when you have a simple list of numbers and two lists when you have a
frequency distribution.
(d) f/n = 29/219 ≈ 0.13 or 13%
43
The 85th percentile is the speed such that 85% of drivers are going slower and 15% are going faster.
invNorm(0.85, 57.6, 5.2) = 62.98945357 → 63.0 mph
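As a cross-check on the invNorm result, here is a minimal Python sketch that finds the same percentile with the standard library’s NormalDist.inv_cdf. The variable names are illustrative only.

```python
from statistics import NormalDist

# Normal model for speeds from the solution: mean 57.6 mph, SD 5.2 mph
speeds = NormalDist(mu=57.6, sigma=5.2)

# 85th percentile: the speed with 85% of the area to its left, like invNorm(0.85, 57.6, 5.2)
p85 = speeds.inv_cdf(0.85)

print(f"85th percentile = {p85:.1f} mph")
```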
44
(a) This is binomial data (each person either would or would not take the bus), hence Case 2, One population proportion.
MATH200A/sample size/binomial: p̂ = .2, E = 0.04, C-Level = 0.90
answer: 271.
Common mistake: The margin of error is E = 4% = 0.04, not 0.4.
Alternative solution: See Sample Size by Formula [URL: https://BrownMath.com/swt/pfswt.htm.htm#c09_SS2formula] and use the formula at right. With the estimated population proportion p̂ = 0.2 in the formula, you get zα/2 = z0.05 = invNorm(1−0.05) = 1.6449, and n = 270.5543 → 271
(b) If you have no prior estimate, use p̂ = 0.5. The other inputs are the same, and the answer is 423.
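The sample-size arithmetic for both parts can be sketched in Python. The sketch assumes the usual formula for a proportion, n = p̂(1−p̂)(zα/2 ÷ E)², which reproduces the 270.5543 quoted above; the function name sample_size_for_proportion is invented for this illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(p_est, margin, conf_level):
    """Unrounded sample size needed to estimate a proportion within the given margin."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)   # z for alpha/2, about 1.6449 at 90%
    return p_est * (1 - p_est) * (z / margin) ** 2

n_with_estimate = sample_size_for_proportion(0.2, 0.04, 0.90)   # part (a)
n_no_estimate = sample_size_for_proportion(0.5, 0.04, 0.90)     # part (b)

# Always round a required sample size UP to the next whole number.
print(ceil(n_with_estimate), ceil(n_no_estimate))
```

Using p̂ = 0.5 gives the largest possible value of p̂(1−p̂), which is why it is the safe choice when you have no prior estimate.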
45
(a) For the procedure, see Step 1 [URL: https://BrownMath.com/swt/pfswt.htm.htm#c04_Step1] of Scatterplot, Correlation, and Regression on TI-83/84. Your plot should look like the one at right.
You expect positive correlation because points trend upward to the right (or, because y tends to increase as x increases). Even before plotting, you could probably predict a positive correlation because you assume higher calories come from fat; but you can’t just assume that without running the numbers.
(b) See Step 2 of Scatterplot, Correlation, and Regression on TI-83/84 [URL: https://BrownMath.com/swt/pfswt.htm.htm#c04_Step2].
r = .8863314629 → r = 0.8863
a = .0586751909 → a = 0.0587
b = −3.440073602 → b = −3.4401
ŷ = 0.0587x − 3.4401
Common mistake: The symbol is ŷ, not y.
(c) The y intercept is −3.4401. It is the number of grams of fat you expect in the average zero-calorie serving of fast food. Clearly this is not a meaningful concept.
Remark: Remember that you can’t trust the regression outside the neighborhood of the data points. Here x varies from 130 to 640. The y intercept occurs at x = 0. That is pretty far outside the neighborhood of the data points, so it’s not surprising that its value is absurd.
(d) See How to Find ŷ from a Regression on TI-83/84 [URL: https://BrownMath.com/swt/pfswt.htm.htm#c04_yhat_root]. Trace at x = 310 and read off ŷ = 14.749... ≈ 14.7 grams fat. This is different from the actual data point (x=310, y=25) because ŷ is based on a trend reflecting all the data. It predicts the average fat content for all 310-calorie fast-food items.
Alternative solution: ŷ = .0586751909(310) − 3.440073602 = 14.749 ≈ 14.7.
(e) The residual at any (x,y) is y−ŷ. At x = 310, y = 25 and ŷ = 14.7 from the previous part. The residual is y−ŷ = 10.3
Remark: If there were multiple data points at x = 310, you would calculate one residual for each point.
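Parts (d) and (e) are plain arithmetic on the regression equation, which a short Python sketch can confirm. The slope and intercept are the unrounded calculator values from part (b); the function name predict_fat is made up for this illustration.

```python
# Unrounded regression coefficients from the LinReg(ax+b) output in part (b)
a = 0.0586751909     # slope: grams of fat per calorie
b = -3.440073602     # y intercept

def predict_fat(calories):
    """Predicted fat content (y-hat) for an item with the given calories."""
    return a * calories + b

y_hat = predict_fat(310)          # part (d): predicted value at x = 310
residual = 25 - y_hat             # part (e): actual y minus predicted y-hat

print(f"y-hat = {y_hat:.1f} g, residual = {residual:.1f} g")
```

A positive residual means the actual item has more fat than the regression line predicts for its calorie count.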
(f) From the LinReg(ax+b) output, R² = 0.7855834621 → R² = 0.7856. About 79% of the variation in fat content is associated with variation in calorie content. The other 21% comes from lurking variables such as protein and carbohydrate count and from sampling error.
(g) See Decision Points for Correlation Coefficient [URL: https://BrownMath.com/swt/pfswt.htm.htm#c04_decpts_root]. Since 0.8863 is positive and 0.8863 > 0.602, you can say that there is some positive correlation in the population, and higher-calorie fast foods do tend to be higher in fat.
46
invNorm(1-.06, 2.0, 0.1) = 2.1555, about 2.16 mm
47
This is paired data, Case 3. (Each individual gives you two numbers, Before and After.)
(1) d = After − Before
H0: µd = 0, no improvement
H1: µd > 0, improvement in number of sit-ups
Remark: Why After−Before instead of the other way round? Since we expect After to be greater than Before, doing it this way you can expect
the d’s to be mostly positive (if H1 is true). Also, it feels more natural to set things up so that an improvement is a positive number. But if you
do d=Before−After and H1:µd<0, you get the same p-value.
(2) α = 0.01
(RC) Random sample
Enter the seven differences — 1, 4, 0, 6, 7, 12, 1 — into a statistics list. A normal probability plot (MATH200A part 4) shows a straight line
with r(.957) > CRIT(.8978), so the data are normal.
The modified box-whisker plot (MATH200A part 2) shows no outliers.
The plots are shown here for comparison to yours, but you don’t need to copy these plots to an exam paper.
(3–4) T-Test: µo=0, List:L4, Freq:1, µ>µo
Results: t = 2.74, p = 0.0169, x̅ = 4.4, s = 4.3, n = 7
(6) At the 0.01 significance level, we can’t say whether the physical fitness course improves people’s ability to do sit-ups or not.
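If you want to see how the paired t statistic in problem 47 comes out of the seven differences, here is a minimal Python sketch. It computes x̅, s, and t = (x̅ − 0)/(s/√n) with the standard library; the p-value is left to the calculator because the standard library has no t distribution. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# d = After - Before for the seven subjects, as listed in the requirements check
differences = [1, 4, 0, 6, 7, 12, 1]

n = len(differences)
d_bar = mean(differences)            # sample mean of the differences
s_d = stdev(differences)             # sample standard deviation (n - 1 in the denominator)
t = (d_bar - 0) / (s_d / sqrt(n))    # test statistic for H0: mu_d = 0

print(f"mean = {d_bar:.1f}, s = {s_d:.1f}, n = {n}, t = {t:.2f}")
```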
48
(a) normalcdf(-10^99, 24, 27, 4) = .2266272794 → 0.2266 or about a 23% chance
(b) normalcdf(-10^99, 24, 27, 4/√5) = .0467662315 → 0.0468 or about a 5% chance
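The difference between parts (a) and (b) is only the standard deviation: one individual uses σ, while the mean of a sample of 5 uses the standard error σ/√5. A short Python sketch, using the same numbers as the normalcdf calls above, makes the contrast explicit. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 27, 4

# (a) One individual: the spread is the population SD
p_individual = NormalDist(mu, sigma).cdf(24)

# (b) Mean of a sample of 5: the spread is the standard error, sigma / sqrt(n)
p_sample_mean = NormalDist(mu, sigma / sqrt(5)).cdf(24)

print(f"(a) {p_individual:.4f}  (b) {p_sample_mean:.4f}")
```

Sample means vary less than individuals do, so a sample mean below 24 is much less likely than a single value below 24.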
49
Here you have a model (the US population) and you’re testing an observed sample (Nebraska) for consistency with that model. One tipoff is that you are given the size of the Nebraska sample but for the US you have only percentages, not actual numbers of people. This is Case 6, goodness of fit to a model.
(1) H0: Nebraska preferences are the same as national proportions.
H1: Nebraska preferences are different from national proportions.
(2) α = 0.05
(3–4) US percentages in L1, Nebraska observed counts in L2. MATH200A part 6.
The result is χ² = 12.0093 → 12.01, df = 4, p-value = 0.0173
Common mistake: Some students convert the Nebraska numbers to percentages and perform a χ² test that way. The χ² test model can equally
well be percentages or whole numbers, but the observed numbers must be actual counts.
(RC) random sample
L3 shows the expected values, and they are all above 5.
(6) Yes, at the 0.05 significance level Nebraska preferences in vacation homes are different from those for the US as a whole.
50
This is unpaired numeric data, Case 4.
(1) Population 1 = Course, Population 2 = No course
H0: µ1 = µ2, no benefit from diabetic course
H1: µ1 < µ2, reduced blood sugar from diabetic course
(2) α = 0.01
(RC) Independent random samples, both n’s >30
(3–4) 2-SampTTest: x̅1=6.5, s1=.7, n1=50, x̅2=7.1, s2=.9, n2=50, µ1<µ2, Pooled:No
Results: t=−3.72, p=1.7E−4 or 0.0002
BTW: Though we do not, some classes use the preliminary 2-SampFTest. That test gives p=0.0816>0.05. Those classes would use Pooled:Yes in 2-SampTTest and
get p=0.00016551 and the same conclusion.

(6) At the 0.01 level of significance, the course in diabetic self-care does lower patients’ blood sugar, on average.
(b) For two-population numeric data, paired data do a good job of controlling for lurking variables. You would test each person’s blood sugar, then enroll all thirty patients in the course and test their blood sugar six months after the end of the course. Your variable d is blood sugar after the course minus blood sugar before, and your H1 is µd < 0.
One potential problem is that all 30 patients receive a heightened level of attention, so you have to worry about the placebo effect. (With the original experiment, the control group did not receive the extra attention of being in the course, so any difference from the attention is accounted for in the different results between control group and treatment group.)
It seems unlikely that the placebo effect would linger for six months after the end of a short course, but you can’t rule out the possibility. There are two answers to that. You could re-test the patients after a year, or two years. Or, you could ask whether it really matters why patients do better. If they do better because of the course itself, or because of the attention, either way they’re doing better. A short course is relatively inexpensive. If it works, why look a gift horse in the mouth? In fact, medicine is beginning to take advantage of the placebo effect in some treatments.
51
This is a test on the mean of one population, with population standard deviation unknown: Case 1.
(1) H0: µ = 2.5 years
H1: µ > 2.5 years
(2) α = 0.05
(RC) random sample, normal with no outliers (given)
(3–4) T-Test: µo=2.5, x̅=3, s=.5, n=6, µ>µo
Results: t = 2.45, p = 0.0290
(6) Yes, at the 0.05 significance level, the mean duration of pain for all persons with the condition is greater than 2.5 years.
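The t statistic reported by T-Test is just the distance between the sample mean and the hypothesized mean, measured in standard errors. Here is a small Python sketch of that arithmetic, offered as an illustration only; `one_sample_t` is a made-up name, and the p-value is left to the calculator because the standard library has no t distribution.

```python
import math

def one_sample_t(xbar, s, n, mu0):
    """t statistic for a one-sample test: (x-bar minus mu0) divided by the standard error."""
    standard_error = s / math.sqrt(n)
    return (xbar - mu0) / standard_error

# Inputs from the T-Test screen above: µo=2.5, x̅=3, s=.5, n=6
t = one_sample_t(xbar=3, s=0.5, n=6, mu0=2.5)
print(f"t = {t:.2f}")
```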
52 (a) Each man or woman was asked a yes/no question, so you have binomial data for two populations: Case 5.
(1) Population 1 = men, Population 2 = women
H0: p1 = p2 men and women equally likely to refuse promotions
H1: p1 > p2 men more likely to refuse promotions
(2) α = 0.05
(RC) independent random samples
For each sample, 10n = 10×200 = 2000 is far less than the total number of men or women.
Men: 60 yes, 200−60 = 140 no; women: 48 yes, 200−48 = 152 no; all are ≥ 10.
(The formal requirement uses the blended proportion p̂ = (60+48)/(200+200) = .27, so men have .27×200 = 54 expected yes and 200−54 =
146 expected no, and women have the same; again, all are ≥ 10.)
(3–4) 2-PropZTest: x1=60, n1=200, x2=48, n2=200, p1>p2
Results: z=1.351474757 → z = 1.35, p=.0882717604 → p-value = .0883 , p̂1=.3, p̂2=.24, p̂=.27
(6) At the 0.05 level of significance, we can’t determine whether the percentage of men who have refused promotions to spend time with their
family is more than, the same as, or less than the percentage of women.
(b) 2-PropZInt with the above inputs and C-Level=.95 gives (−.0268, .14682) . The English sentence needs to state both magnitude and direction,
something like this: Regarding men and women who refused promotion for family reasons, we’re 95% confident that men were between 2.7
percentage points less likely than women, and 14.7 percentage points more likely.
Common mistake: With two-population confidence intervals, you must state the direction of the difference, not just the size of the difference.
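The two calculator procedures in this problem use different standard errors: the test pools the two samples (because H0 says p1 = p2), while the confidence interval does not. The Python sketch below is one way to see that difference; it is an illustration under the usual normal-approximation formulas, and the function names are invented for this example.

```python
import math
from statistics import NormalDist

def two_prop_z_test_right(x1, n1, x2, n2):
    """z and right-tailed p-value for H1: p1 > p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 1 - NormalDist().cdf(z)

def two_prop_z_interval(x1, n1, x2, n2, c_level):
    """Confidence interval for p1 - p2, using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf((1 + c_level) / 2)   # critical z for the confidence level
    margin = z_star * se
    return (p1 - p2) - margin, (p1 - p2) + margin

z, p_value = two_prop_z_test_right(60, 200, 48, 200)
low, high = two_prop_z_interval(60, 200, 48, 200, 0.95)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print(f"95% CI for p1 - p2: ({low:.4f}, {high:.4f})")
```

A negative lower endpoint and a positive upper endpoint are exactly why the English sentence has to say "less likely" for one end and "more likely" for the other.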
53 This problem depends on the Empirical Rule and knowing that the normal distribution is symmetric.
If the middle 95% runs from 70 to 130, then the mean must be µ = (70+130)÷2 → µ = 100
About 95% of a normally distributed population is within 2 standard deviations of the mean. The range 70 to 100 (or 100 to 130) is therefore two SD. 2σ = 100−70 = 30 →
σ = 15
54 This is binomial data, Case 2. (The members of the sample are insurance claims, and each claim either is settled or is not.)
(1) H0: p = .75
H1: p < .75
(2) α = 0.05
(RC) random sample
10n = 10×65 = 650, obviously less than the total number of claims filed in the state.
65×0.75 = 48.75 expected successes and 65−48.75 = 16.25 expected failures, both ≥ 10.
Common mistake: Don’t use the actual successes and failures, 40 and 65−40 = 25. That would be right for a confidence interval, but
for a hypothesis test you assume H0 is true and so you must use the proportion 0.75 from your null hypothesis.
(3–4) 1-PropZTest: po=.75, x=40, n=65, prop<po
Results: z=−2.506402059 → z = −2.51, p=.006098358 → p-value = 0.0061 , p̂=.6154
(6) At the 0.05 level of significance, less than 75% of claims do settle within 2 months.
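The requirements check and the test statistic both use the hypothesized proportion 0.75, not the sample proportion. The Python sketch below makes that explicit; it is an illustration of the standard one-proportion z test, with a made-up function name, and it covers only the left-tailed alternative used in this problem.

```python
import math
from statistics import NormalDist

def one_prop_z_test_left(x, n, p0):
    """z and left-tailed p-value for H1: p < p0 (normal approximation)."""
    # Requirements use EXPECTED counts under H0, not the observed counts
    expected_successes = n * p0
    expected_failures = n - expected_successes
    if min(expected_successes, expected_failures) < 10:
        raise ValueError("expected successes and failures must both be at least 10")
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error also assumes H0 is true
    z = (p_hat - p0) / se
    return z, NormalDist().cdf(z)

z, p_value = one_prop_z_test_left(x=40, n=65, p0=0.75)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```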
55 P(mislabeled) = P(Brand A and mislabeled) + P(Brand B and mislabeled) because those are disjoint events. But whether a pair is mislabeled is
dependent on the brand, so
P(Brand A and mislabeled) = P(Brand A) × P(mislabeled | Brand A)
and similarly for brand B.
P(mislabeled) = 0.40 × 0.025 + 0.60 × 0.015 = 0.019 or just under 2%
Alternative solution: The formulas can be confusing, and often there’s a way to do without them. You could also do this as a matter of proportions:
Out of 1000 shoes, 400 are Brand A and 600 are Brand B.
Out of 400 Brand A shoes, 2.5% are mislabeled. 0.025×400 = 10 brand A shoes mislabeled.
Out of 600 Brand B shoes, 1.5% are mislabeled. 0.015×600 = 9 brand B shoes mislabeled.
Out of 1000 shoes, 10 + 9 = 19 are mislabeled. 19/1000 is 1.9% or 0.019.
This is even easier to do if you set up a two-way table, as shown below. The values in bold face are given in the problem, and those in light face are
derived from them.
                     Brand A               Brand B                 Total
Mislabeled           40% × 2.5% = 1%       60% × 1.5% = 0.9%       1% + 0.9% = 1.9%
Correctly labeled    40% − 1% = 39%        60% − 0.9% = 59.1%      39% + 59.1% = 98.1%
Total                40%                   60%                     100%
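The same weighted sum can be written in a few lines of Python. This is only a sketch of the arithmetic in the solution above; the dictionary layout and variable names are made up for the example.

```python
# For each brand: (share of all shoes, probability a pair is mislabeled given that brand)
brands = {
    "A": (0.40, 0.025),
    "B": (0.60, 0.015),
}

# P(mislabeled) = sum over brands of P(brand) × P(mislabeled | brand)
p_mislabeled = sum(share * rate for share, rate in brands.values())
p_correct = 1 - p_mislabeled

print(f"P(mislabeled) = {p_mislabeled:.3f}")
print(f"P(correctly labeled) = {p_correct:.3f}")
```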
56 Solution: This is paired numeric data, Case 3.
Common mistake: You must do this as paired data. Doing it as unpaired data will not give the correct p-value.
(1) d = A−B
H0: µd = 0, no difference in smoothness
H1: µd ≠ 0, a difference in smoothness
Remark: You must define d as part of your hypotheses.

(2) α = 0.10
(RC) random sample
Compute the ten differences (positive or negative, as shown above) and put them in a statistics list. Use MATH200A part 4 for the normal
probability plot to show data are normal.
MATH200A part 2 gives a modified boxplot showing no outliers.
(3–4) T-Test: µo=0, List:L3, Freq: 1, µ≠µo
Results: t = 1.73, p = 0.1173 , x̅ = 1, s = 1.83, n = 10
(6) At the 0.10 level of significance, it’s impossible to say whether the two brands of razors give equally smooth shaves or not.
57 The key to this is recognizing the difference between with and without replacement. While (a) and (b) are both technically without
replacement, recall that when the sample is less than 5% of a large population, as it is in (a), you treat the sample as drawn with replacement.
But in (b), the sample of two is drawn from a population of only ten bills, so you must use computations for without replacement.
Solution: (a) Use MATH200A part 3 with n=2, p=0.9, from=1, to=1. Answer: 0.18
You could also use binompdf(2, .9, 1) = 0.18.
Alternative solution: The probability that exactly one is tainted is the sum of two probabilities: (i) that the first is tainted and the second is not, and (ii)
that the first is not tainted and the second is. Symbolically,
P(exactly one) = P(first and secondᶜ) + P(firstᶜ and second)
P(exactly one) = 0.9×0.1 + 0.1×0.9
P(exactly one) = 0.09 + 0.09 = 0.18
Solution: (b) When sampling without replacement, the probabilities change. You have the same two scenarios — first but not second, and not first
but second — but the numbers are different.
P(exactly one) = P(first and secondᶜ) + P(firstᶜ and second)
P(exactly one) = (9/10)×(1/9) + (1/10)×(9/9)
P(exactly one) = 1/10 + 1/10 = 2/10 = 0.2
Common mistake: Many, many students forget that both possible orders have to be considered: first but not second, and second but not first.
Common mistake: You can’t use binomial distribution in part (b), because when sampling without replacement the probability changes from one
trial to the next.
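To see both computations side by side, here is a short Python sketch. It follows the two-sequence reasoning above for part (b) and the binomial formula for part (a); the variable names are made up, and exact fractions are used in part (b) so that no rounding enters.

```python
from fractions import Fraction
from math import comb

# (a) Large population: treat the two draws as independent, binomial with n=2, p=0.9
p = 0.9
part_a = comb(2, 1) * p * (1 - p)          # two possible orders, each 0.9 × 0.1

# (b) Ten bills, nine tainted: the second draw depends on the first
tainted, clean = 9, 1
total = tainted + clean
first_only = Fraction(tainted, total) * Fraction(clean, total - 1)     # tainted, then clean
second_only = Fraction(clean, total) * Fraction(tainted, total - 1)    # clean, then tainted
part_b = first_only + second_only

print(f"(a) P(exactly one) = {part_a:.2f}")
print(f"(b) P(exactly one) = {part_b} = {float(part_b):.1f}")
```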
58 This is numeric data for one population with σ unknown: Case 1. Requirements are met because the original population (yields per acre) is
normal. The T-Interval yields (80.952, 90.048). 81.0 < µ < 90.0 (90% confidence) or 85.5±4.5 (90% confidence)
59 No, because the probabilities on the five trials are not independent.
For example, if the first card is an ace then the probability the second card is also an ace is 3/51, but if the first card is not an ace then the
probability that the second card is an ace is 4/51. Symbolically, P(A2|A1) = 3/51 but P(A2| not A1) = 4/51.
60 This is two-population binomial data, Case 5.
(a) p̂T = 128/300 = 0.4267. p̂C = 135/400 = 0.3375. p̂T−p̂C = 0.0892 or about 8.9%
Remark: The point estimate is descriptive statistics, and requirements don’t enter into it. But the confidence interval is inferential statistics, so you
must verify that each sample is random, each sample has at least 10 successes and 10 failures, and each sample is less than 10% of the population it
came from.
The problem states that the samples were random, which takes care of the first requirement. There were 128 successes and 300−128 = 172 failures
in Tompkins, 135 successes and 400−135 = 265 failures in Cortland, so the second requirement is met.
What about the third requirement? You don’t know the populations of the counties, but remember that you can work it backwards. 10×300 =
3000 (Tompkins) and 10×400 = 4000 (Cortland), and surely the two counties must have populations greater than 3000 and 4000, so the third
requirement must be met.
(b) 2-PropZInt: The 98% confidence interval is 0.0029 to 0.1754 (about 0.3% to 17.5%), meaning that with 98% confidence Tompkins viewers are more
likely than Cortland viewers, by 0.3 to 17.5 percentage points, to prefer a movie over TV.
(c) E = 0.1754−0.0892 = 0.0862 or about 8.6%

You could also compute it as 0.0892−0.0029 = 0.0863 or (0.1754−0.0029)/2 = 0.0863. All three methods get the same answer except for a rounding
difference.
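The three ways of getting the margin of error are easy to compare in code. This Python sketch simply repeats the arithmetic with the rounded values quoted above; the variable names are invented for the example.

```python
# Rounded values from parts (a) and (b)
point_estimate = 0.0892
low, high = 0.0029, 0.1754

e_from_upper = high - point_estimate      # upper endpoint minus point estimate
e_from_lower = point_estimate - low       # point estimate minus lower endpoint
e_from_width = (high - low) / 2           # half the width of the interval

for label, e in [("upper", e_from_upper), ("lower", e_from_lower), ("width", e_from_width)]:
    print(f"E from {label}: {e:.5f}")
```

The small disagreements in the last decimal place come only from rounding the endpoints and the point estimate to four places.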
61 This is binomial data for two populations, Case 5. (The members of the samples are seeds, and a given seed either germinated or didn’t.) Note:
sample sizes are 80+20 = 100 and 135+15 = 150.
(1) Population 1 = no treatment, Population 2 = special treatment

H0: p1 = p2, no difference in germination rates
H1: p1 ≠ p2, there’s a difference in germination rates
(2) α = 0.05
(RC) independent random samples
10n1=10×100=1000; 10n2=10×150=1500; obviously there are far more than 2500 seeds of this type.
In sample 1, 80 successes and 20 failures; in sample 2, 135 successes and 15 failures; all are at least 10.
(The formal requirement uses the blended proportion p̂ = (80+135)/(100+150) = 0.86 to find expected successes and failures. For sample
1, 0.86×100 = 86 and 100−86 = 14; for sample 2, 0.86×150 = 129 and 150−129 = 21. All are at least 10.)
(3–4) 2-PropZTest: x1=80, n1=80+20, x2=135, n2=135+15, p1≠p2
Results: z = −2.23, p-value = 0.0256 , p̂1 = .8, p̂2 = .9, p̂ = .86
(6) Yes, at the 0.05 significance level, the special treatment made a difference in germination rate. Specifically, seeds with the special treatment
were more likely to germinate than seeds that were not treated.
Remark: p < α in Two-Tailed Test: What Does It Tell You? [URL: https://BrownMath.com/swt/pfswt.htm.htm#c10_httails_Interp] explains
how you can reach a one-tailed result from a two-tailed test.
Alternative solution: You could also do this as a test of homogeneity, Case 7. The χ²-Test gives χ² = 4.98, df = 1, p=0.0256
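The agreement between the two approaches is no accident: with df = 1, χ² is the square of z, so the two p-values match. The Python sketch below computes χ² from the 2×2 table of observed counts. It is an illustration, not the calculator's internal method, and it relies on the df = 1 shortcut P(χ² > x) = 2·P(Z > √x), which would not work for larger tables.

```python
import math
from statistics import NormalDist

# Rows: no treatment, special treatment. Columns: germinated, did not germinate.
observed = [[80, 20],
            [135, 15]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (obs - expected) ** 2 / expected

# Shortcut valid only for df = 1: chi-squared is the square of a standard normal
p_value = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))

print(f"chi-squared = {chi2:.2f}, p-value = {p_value:.4f}")
```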
Reference Material

Updated 5 Nov 2020
Relational Symbols
=          equals; is the same as
≠          is not equal to; is different from
>          is greater than; is more than; exceeds; is above
≥ or >=    is greater than or equal to; is at least; is not less than
<          is less than; is fewer than; is below
≤ or <=    is less than or equal to; is at most; does not exceed; is not greater than; is no more than
A<x<B x is between A and B, exclusive
A≤x≤B x is between A and B, inclusive
A≈B A is approximately equal to B
Here are symbols for various sample statistics and the corresponding population parameters. They are not repeated in the list below.
sample statistic            population parameter    description
n                           N                       number of members of sample or population
x̅ “x-bar”                   µ “mu” or µx            mean
M or Med or x̃ “x-tilde”     (none)                  median
s (TIs say Sx)              σ “sigma” or σx         standard deviation
r                           ρ “rho”                 coefficient of linear correlation
p̂ “p-hat”                   p                       proportion
z, t, χ²                    (n/a)                   calculated test statistic
For variance, apply a squared symbol (s² or σ²).
µ and σ can take subscripts to show what you are taking the mean or standard deviation of. For instance, σx̅ (“sigma sub x-bar”) is the standard
deviation of sample means, or standard error of the mean.
Roman Letters
b = y intercept of a line. Defined here in Chapter 4. (Some statistics books use b0.)
BD or BPD = binomial probability distribution. Defined here in Chapter 6.
CI = confidence interval. Defined here in Chapter 9.
CLT = Central Limit Theorem. Defined here in Chapter 8.
d = difference between paired data. Defined here in Chapter 11.
df or ν “nu” = degrees of freedom in a Student’s t or χ² distribution. Defined here in Chapter 9 (Student’s t) and here in Chapter 12 (χ²).
DPD = discrete probability distribution. Defined here in Chapter 6.
E = margin of error, a/k/a maximum error of the estimate. Defined here in Chapter 9.
f = frequency. Defined here in Chapter 2.
f/n = relative frequency. Defined here in Chapter 2.
HT = hypothesis test. Defined here in Chapter 10.
Ho = null hypothesis. Defined here in Chapter 10.
H1 or Ha = alternative hypothesis. Defined here in Chapter 10.
IQR = interquartile range, Q3−Q1. Defined here in Chapter 3.
m = slope of a line. Defined here in Chapter 4. (The TI-83 uses a and some statistics books use b1.)
M or Med = median of a sample. Defined here in Chapter 3.
n = sample size, number of data points. Defined here in Chapter 2. Also, number of trials in a probability experiment with a binomial model.
Defined here in Chapter 6.
N = population size.
ND = normal distribution, whose graph is a bell-shaped curve; also “normally distributed”. Defined here in Chapter 7.
p = probability value. The specific meaning depends on context.

In geometric and binomial probability distributions, p is the probability of “success” (defined here in Chapter 6) on any one trial and
q = (1−p) is the probability of “failure” (the only other possibility) on any one trial.
In hypothesis testing, p is the calculated p-value (defined here in Chapter 10), the probability of getting a sample result at least as extreme as
the one you actually got, if the null hypothesis is true.
In tests of population proportions, p stands for population proportion and p̂ for sample proportion (see table above).
P(A) = the probability of event A.
P(Aᶜ) or P(not A) = the probability that A does not happen. Defined here in Chapter 5.
P(B | A) = the probability that event B will happen, given that event A definitely happens. It’s usually read as the probability of B given A.
Defined here in Chapter 5.
Caution! The order of A and B may seem backward to you at first.
P80 or P₈₀ = 80th percentile (Pk or Pₖ = k-th percentile) Defined here in Chapter 3.
q = probability of failure on any one trial in binomial or geometric distribution, equal to (1−p) where p is the probability of success on any
one trial. Defined here in Chapter 6.
Q1 or Q₁ = first quartile (Q3 or Q₃ = third quartile) Defined here in Chapter 3.
r = linear correlation coefficient of a sample. Defined here in Chapter 4.
R² = coefficient of determination. Defined here in Chapter 4.
s = standard deviation of a sample. Defined here in Chapter 3.
SD (or s.d.) = standard deviation. Defined here in Chapter 3.
SEM = standard error of the mean (symbol is σx̅). Defined here in Chapter 8.
SEP = standard error of the proportion (symbol is σp̂). Defined here in Chapter 8.
X (capital X) = a variable.
x (lower-case x) = one data value (“raw score”). As a column heading, x means a series of data values.
x̅ “x-bar” = mean of a sample. Defined here in Chapter 3.
x̃ “x-tilde” = median of a sample. Defined here in Chapter 3.
ŷ “y-hat” = predicted average y value for a given x, found by using the regression equation. Defined here in Chapter 4.
z = standard score or z-score. Defined here in Chapter 3.
z(area) or z_area (area written as a subscript) = the z-score, such that that much of the area under the normal curve lies to the right of that z. This is not a multiplication!
(See The z Function [URL: https://BrownMath.com/swt/pfswt.htm.htm#c07_zFunc].)
Greek Letters
α “alpha” = significance level in hypothesis test, or acceptable probability of a Type I error (probability you can live with). Defined here in
Chapter 10. 1−α = confidence level.
β “beta” = in a hypothesis test, the acceptable probability of a Type II error; 1−β is called the power of the test.
µ mu, pronounced “mew” = mean of a population. Defined here in Chapter 3.
ν nu: see df, above.
ρ rho, pronounced “roe” = linear correlation coefficient of a population.
σ “sigma” = standard deviation of a population. Defined here in Chapter 3.
σx̅ “sigma-sub-x-bar”; see SEM above.
σp̂ “sigma-sub-p-hat”; see SEP above.
∑ “sigma” = summation. (This is upper-case sigma. Lower-case sigma, σ, means standard deviation of a population; see the table near the
start of this page.) See ∑ Means Add ’em Up [URL: https://BrownMath.com/swt/pfswt.htm.htm#c01_BigSigma] in Chapter 1.
χ² “chi-squared” = distribution for multinomial experiments and contingency tables. Defined here in Chapter 12.
Inferential Statistics: Basic Cases

Updated 29 Dec 2014
Summary: This table organizes procedures for basic inferential statistics into a chart of Cases. (The Case numbers are useful for reference,
although they are not standard statistics terminology.) All links are live in the online version of this page [URL:
See also: While you’re getting used to the cases, practice with the interactive Triage: Which Inferential Stats Case Should I Use? [URL:
https://BrownMath.com/stat/castriag.htm]
For more cases see Inferential Statistics Cases [URL: https://BrownMath.com/stat/cases.htm].
TI-83/84/89 Procedures (CI = confidence interval, HT = hypothesis test, SS = sample size)
All cases require a random sample or randomization, and 10n ≤ N; additional requirements are noted in each case.

NUMERIC DATA: INFERENCES ABOUT MEANS
Case 1. One pop. mean, unknown σ. Pop. parameter: µ
    Required: n ≥ about 30 or normal with no outliers.
    CI: TInterval; HT: T-Test
    SS: MATH200A part 5
Case 3. Mean difference for paired data. Pop. parameter: µd
    Required: n ≥ about 30, or differences are normal with no outliers.
    CI: TInterval using the differences; HT: T-Test using the differences
Case 4. Difference of 2 indep. pop. means, unpaired data. Pop. parameters: µ1, µ2 or µ1−µ2
    Required: in each sample, n ≥ about 30 or normal with no outliers.
    CI: 2-SampTInt; HT: 2-SampTTest

YES/NO COUNTS: INFERENCES ABOUT PROPORTIONS
Case 2. One pop. proportion. Pop. parameter: p
    CI: 1-PropZInt; requires ≥ 10 successes and ≥ 10 failures in sample.
    HT: 1-PropZTest; requires expected successes npo ≥ 10 and expected failures n−npo ≥ 10.
    HT w/ small samples: MATH200A part 3 or binomcdf
    SS: MATH200A part 5
Case 5. Difference of 2 pop. proportions. Pop. parameters: p1, p2, p1−p2
    CI: 2-PropZInt; each sample requires ≥ 10 successes and ≥ 10 failures.
    HT: 2-PropZTest; requires same as CI. If that’s not met, use pooled p̂ to test that n1p̂, n1−n1p̂, n2p̂, n2−n2p̂ are all ≥ 10.
    SS: MATH200A part 5

CATEGORY COUNTS: INFERENCES ABOUT MODELS
Case 6. Goodness of fit (GOF) or multinomial. Pop. parameter: none
    CI: undefined; HT: MATH200A part 6
    TI-89 HT: Chi2 GOF or How to Test Goodness of Fit on TI-89
    Required: every expected value ≥ about 5.
Case 7. Independence or homogeneity (in 2-way table). Pop. parameter: none
    CI: undefined; HT: χ²-Test or MATH200A part 7
    TI-89 HT: Chi2 2-way
    Required: every expected value ≥ about 5.

Updated 3 Nov 2020
Summary: This is your reference sheet for the steps of hypothesis tests. For all the details, see Chapters 10 through 12 of this book.
Advice: Always number your steps. That helps others find the key features of your test, and you don’t forget any steps.
See also: Inferential Statistics: Basic Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top]

Top 10 Mistakes of Hypothesis Tests
Step 1. Hypotheses
Following are patterns for your hypotheses in the cases covered in the text. With Cases 1 through 5, if you can say anything meaningful about the
consequences if each hypothesis is true, add that.
Bad example (adds little or nothing to the symbols):

H0: µ = 67.6, average 2-liter bottle contains 67.6 fl oz
H1: µ < 67.6, average 2-liter bottle contains less than 67.6 fl oz
Good example (explains the implications):

H0: µ = 67.6, average bottle filled properly
H1: µ < 67.6, average bottle is underfilled
In Cases 1 through 5, a test for < or > is called a one-tailed test, and a test for ≠ is called a two-tailed test. Please see One-Tailed or Two-Tailed? [URL:
https://BrownMath.com/swt/pfswt.htm.htm#c10_httails_root] for advice on choosing between them.
Case 1: (Testing mean of one population against a number called µo)

H0: µ = number
H1: µ < number or µ ≠ number or µ > number
Case 2: (Testing proportion in one population against a number called po)

H0: p = number
H1: p < number or p ≠ number or p > number
Case 3: (Testing mean difference (paired data))

d = _____ − _____
H0: µd = 0
H1: µd < 0 or µd ≠ 0 or µd > 0
Case 4: (Testing difference of independent means)

pop. 1 = _____, pop. 2 = _____
H0: µ1 = µ2
H1: µ1 < µ2 or µ1 ≠ µ2 or µ1 > µ2
Case 5: (Testing difference of population proportions)

pop. 1 = _____, pop. 2 = _____
H0: p1 = p2
H1: p1 < p2 or p1 ≠ p2 or p1 > p2
Case 6: (Testing goodness of fit)

H0: The _____ model is consistent with the data.
H1: The model is not consistent with the data.
Case 7: (Testing independence)

H0: _____ and _____ are independent.
H1: _____ and _____ are dependent.
Case 7: (Testing homogeneity)

H0: The proportions are all equal.
H1: Some proportions are different from others.
Step 2. Significance Level
Short and sweet:
α = _____
Step RC. Requirements Check
Please see Inferential Statistics: Basic Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top] for specific requirements. For Cases 6 and 7,
it’s easier to check requirements if you move this step after Steps 3/4.
Steps 3/4. Computations
Show screen name. Example: T-Test. You don’t need to write down keystrokes, such as “STAT TESTS 2”.
Show all inputs.
Show new outputs, meaning any that weren’t on the input screen.
Step 5. Conclusion (Statistics Language)
No room for creativity here. Write down whichever one of these applies:
p < α. Reject H0 and accept H1.
p > α. Fail to reject H0.
Step 6. Conclusion (English)
Here you have a lot of latitude as long as you state the correct conclusion in English and give the significance level or p-value, or both.
If you rejected H0, state H1 without doubting words like may or could. Examples:
At the 0.05 significance level, the average 2-liter bottle contains less than 67.6 fl oz. Drinkems is underfilling the bottles.
Or,
The average 2-liter bottle contains less than 67.6 fl oz. Drinkems is underfilling the bottles (p = 0.0246).
If you failed to reject H0, state your non-conclusion in neutral language, using phrases like can’t determine whether or it’s impossible to say whether.
Examples:
At the 0.05 significance level, we can’t tell whether Drinkems is underfilling the bottles or not.
Or,
We can’t tell whether Drinkems is underfilling the bottles or not (p = 0.1045).
Big Names in Statistics

Updated 16 Oct 2014
Dad sighed. “Kip, do you think that table was brought down from on high by an archangel?”
Robert A. Heinlein, in Have Space Suit—Will Travel (1958)
Math and science books tend to spend all their time teaching you the concepts, but the fact that these concepts were invented by people gets lost.
Here’s a rundown of the people who invented most of the concepts that you meet in a first course in statistics. (I’ve made extensive use of Upton &
Cook 2008 [see “Sources Used” at end of book] in preparing this page.)
Don’t panic! While I think it’s nice if you know that these people existed, you won’t be quizzed on who invented what.
See also: You might like to visit Figures from the History of Probability and Statistics (Aldrich 2012 [see “Sources Used” at end of book]). Aldrich
puts these people, and lots more, in a time sequence and shows pictures of many of them.
Bernoulli, Jacob or Jacques (1654–1705), Swiss

Member of a famous mathematical family. Formulated the law of large numbers in 1689. Bernoulli trials are named after him, and he
developed the binomial distribution, though this work was not published till eight years after his death.
Bradford Hill, Sir Austin (1897–1991), English

With Sir Richard Doll, published the first paper linking smoking and lung cancer, Smoking and Carcinoma of the Lung [URL
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2038856/ accessed 2014-02-09] (1950). Is best known for the 1965 paper The Environment and
Disease: Association or Causation? [URL http://www.edwardtufte.com/tufte/hill accessed 2014-02-09] In it, he laid out nine criteria for
confirming when A really is a cause of B and not merely associated. For a short summary of the criteria, see Wikipedia’s Bradford Hill
criteria [URL https://en.wikipedia.org/wiki/Bradford-Hill_criteria accessed 2014-02-09] or Steve Simon’s Causation [URL
http://www.pmean.com/00/causation.html accessed 2014-02-09].
Fisher, Sir Ronald Aylmer (1890–1962), English

Referred to as R. A. Fisher. Coined the term variance and assigned it the symbol σ² in 1918. Developed the ANOVA procedure for testing
equality of three or more means in 1918, and the F distribution in 1922. Published the Fisher z distribution (different from the normal
distribution) in 1924 in a paper called “On a Distribution Yielding the Error Functions of Several Well-Known Statistics”; among other uses,
it determines the confidence interval for the correlation coefficient of a population. Gave the mathematical derivation of Gosset’s t
distribution in 1925. “Virtually invented the subject of experimental design” (Upton & Cook) and brought out editions of Statistical Methods
for Research Workers every few years from 1925 until his death.
Gauss, Johann Carl Friedrich (1777–1855), German

Hugely prolific mathematician and astronomer. Said by Isaac Asimov to be the last mathematician to publish papers in Latin. Published in
1823 a paper with the resounding title “Theoria Combinationis Observationum Erroribus Minimis Obnoxiae”, giving the theory of least
squares regression. As part of this work, he had to find an appropriate probability distribution for the errors, which we now know as the
normal distribution.
Gosset, William Sealy (1876–1937), English

In 1908, while working for Guinness, he tested small samples of the product and realized that existing statistical theory of small samples was
wrong. He then discovered and named the t distribution, which is followed by the means of samples where the standard deviation of the
population is unknown. (R. A. Fisher later derived that distribution mathematically.) Because company policy did not allow him to publish
under his own name, he used the pseudonym “Student”, and the t distribution is still known as “Student’s t” because of this.
Laplace, Pierre-Simon, Marquis de (1749–1827), French

Mathematician and mathematical physicist. Ennobled as a count by Napoleon in 1806, then created a marquis by Louis XVIII in 1817.
Developed the Central Limit Theorem in Théorie analytique des probabilités (1812).
Neyman, Jerzy (1894–1981), Polish American

Studied under Karl Pearson. With Egon Pearson, published the 1933 paper that set out the standard method of testing hypotheses.
Pearson, Egon Sharpe (1895–1980), English

Son of Karl Pearson. With Jerzy Neyman, invented the standard approach to hypothesis testing, with null and alternative hypotheses. They
published their paper, “On the Problem of the Most Efficient Tests of Statistical Hypotheses”, in Philosophical Transactions of the Royal Society
in 1933.
Pearson, Karl (1857–1936), English

Invented the term histogram in lectures some time before 1895. Named the standard deviation in 1893, and gave the population standard
deviation the symbol σ in 1894. Defined the correlation coefficient in a paper published in 1896. Devised the χ² goodness-of-fit test in 1900.
Tukey, John Wilder (1915–2000), American

Coined the word “bit” in 1946 and the word “software” in 1958. Introduced the stem-and-leaf plot and the box-whisker diagram in his book
Exploratory Data Analysis in 1970. Invented the Honestly Significant Difference test [URL: https://BrownMath.com/stat/anova1.htm#HSD] to
be done after an ANOVA. He wrote this up but circulated it informally from 1953 till it was finally published in J. W. Tukey’s Collected Works
during the middle 1980s and early 1990s.
Recommended Statistics Books

Updated 27 Nov 2015
Statistics for Citizens
For more practical applications, written in a non-technical way, I recommend:
Lewis, H.W. (1997). Why Flip a Coin? John Wiley & Sons.
Lots of illustrations of how thinking in a statistical way can help you make decisions. Applications include voting, gambling, war, and
the stock market. A very enjoyable read.
Mlodinow, Leonard. (2008). The Drunkard’s Walk: How Randomness Rules Our Lives. Pantheon.
If you’re like most people, your intuitions about anything having to do with probability are usually wrong. Without using formulas,
Mlodinow gets you to think more clearly about probability, with zillions of real-world examples from all areas of everyday life.
Malkiel, Burton G. (2003). A Random Walk down Wall Street. W.W. Norton & Company.
Malkiel has done exhaustive statistical analysis of the stock market, to help you make wise decisions. He doesn’t just tell you what to
do, he shows you the statistical evidence and explains what the statistics mean.
You’ll probably find a later edition; Malkiel updates the book every couple of years.
Vickers, Andrew (2010). What Is a p-Value Anyway? 34 Stories to Help You Actually Understand Statistics. Addison-Wesley.
Statistics really is stories, and these stories are short and fun, but each with a point behind it. Stats can seem like a bunch of formulas,
but really it’s about learning to think logically, and Vickers does a great job of leading you to that way of thinking.
Dewdney, A.K. (1993). 200% of Nothing. John Wiley & Sons.

An excellent and highly readable tour through probability. Dewdney presents lots of situations, many from advertising, and helps you
see how to use statistical thinking (educated common sense, really) to avoid being taken in.
The book runs out of steam toward the end, but the first nine or ten chapters are excellent — I particularly recommend “The Great Pepsi
Challenge” (page 24), the lottery discussion (pages 56–59), and the decision whether to buy a store’s extended warranty (page 91).
Gigerenzer, Gerd. (2002). Calculated Risks. Simon & Schuster.

For ordinary people making legal or medical decisions. Some applications include AIDS counseling and DNA evidence. A very
enjoyable read.
Latzko, William J., and David M. Saunders. (1995). Four Days with Dr. Deming. Addison-Wesley.
Dr. W. Edwards Deming is famous for teaching first Japanese and then American businesses to apply statistical methods to
management, especially to quality. This book is in the form of a CEO’s “thought journal” at one of Dr. Deming’s seminars, as he finds that
pretty much everything he knew about managing his company is wrong. It’s kept lively with lots of pictures and conversations, plus of
course quotes from Dr. Deming’s lectures. You’ll want to read this book slowly and let the ideas expand in your mind.
“It’s so simple,” says Dr. Deming, and he’s right. You don’t actually need a statistical background to understand this book, but you’ll
recognize that the key ideas come from our week 9, sample variability.
If you can’t find these in your library for regular checkout, ask a librarian to get them from another library for you, or get them from a bookseller.
Textbooks
The best textbook I’ve seen is DeVeaux, Velleman, and Bock’s Intro Stats (Pearson Addison Wesley, 2009). It’s written in a breezy, conversational
style, perfect for self study. I really like the many sections “What can go wrong?” because you should always have possible pitfalls in mind.
Another excellent textbook is Freedman, Pisani, and Purves, Statistics (Norton, 2007). There’s not as much eye candy — no color at all, for
instance — but it says what needs to be said and spends adequate time on the philosophy behind the methods.
If you’d like something a little less formal than a textbook, I recommend Larry Gonick & Woollcott Smith, The Cartoon Guide to Statistics
(HarperPerennial, 1993, ISBN 0-06-273102-5). Despite its lighthearted appearance, this is actually a pretty good statistics book. Its advantages include
high readability, brief explanations, and low cost (under $12 new at Amazon in March 2004). On the down side, it presents things in a different order
from our course, it doesn’t cover data types or χ², and you need to look elsewhere for practice problems. Still I think it’s great value for the money.
TI-83/84 Cheat Sheet

Updated 17 Nov 2020
Summary: In this course, you used a lot of calculator procedures. This cheat sheet brings them all together. You’ll find just the key points here, but
each section links back to the original full discussion, complete with screen shots.
(You can use any lists, not just L1 and L2.)
Ask your instructor whether you can use this sheet during tests.
Contents: Sampling
Seeding the Random Numbers
Taking a Random Sample
Stats of a Plain List of Numbers
Stats of an Ungrouped Distribution of Numbers with Frequencies
Weighted Average
Stats of a Grouped Distribution of Number Ranges with Frequencies
Five-Number Summary
Box-Whisker Diagram a/k/a Boxplot
Scatterplot
Find r, R², and Line of Best Fit
Predict Average y for an x
Plot the Residuals (optional)
Mean and SD of a Discrete PD
Probabilities of Geometric Distribution
Probabilities of Binomial Distribution
Normal Distribution
Have Boundary(ies), Need Probability or Area
Have Probability or Area, Need Boundary(ies)
Does a Data Set Fit the Normal Model?
Sampling Distribution of the Mean
Sampling Distribution of the Proportion
Confidence Intervals, Hypothesis Tests, Sample Size
Sampling
Seeding the Random Numbers
Pick a number haphazardly and type it into the calculator, then [STO] and [MATH] » PROB » rand.
Details: Seeding the Random-Number Generator in Chapter 1.
Taking a Random Sample
randInt(1,SizeOfPopulation), then press [ENTER] until you have as many distinct numbers as you need for your sample, ignoring duplicates.
Details: Selecting Members of the Sample in Chapter 1.
Pick a number k, where you’ll be taking data from every kth individual. Then randInt(1,k) using your chosen number k.
Details: Taking a Systematic Sample in Chapter 1.
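If you want to see the same two sampling ideas away from the calculator, here is a minimal Python sketch. Python is not part of this course, and the function names `simple_random_sample` and `systematic_sample` are made up for illustration; `random.randint` simply plays the role of randInt.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Pick distinct member numbers from 1..population_size."""
    if sample_size > population_size:
        raise ValueError("sample cannot be larger than the population")
    rng = random.Random(seed)  # seeding makes the run repeatable
    chosen = []
    while len(chosen) < sample_size:
        pick = rng.randint(1, population_size)  # like randInt(1,SizeOfPopulation)
        if pick not in chosen:                  # ignore duplicates
            chosen.append(pick)
    return chosen


def systematic_sample(population_size, k, seed=None):
    """Random starting point in 1..k, then every kth member after it."""
    rng = random.Random(seed)
    start = rng.randint(1, k)                   # like randInt(1,k)
    return list(range(start, population_size + 1, k))


# Hypothetical population of 500 members
print(simple_random_sample(500, 10, seed=20201117))
print(systematic_sample(500, 50, seed=20201117))
```

The seed here is just a haphazardly chosen number, the same idea as seeding the calculator.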
For all of these, remember to use σ or s based on whether you have the whole population or just a sample.
Stats of a Plain List of Numbers
Enter numbers in L1, then 1-VarStats L1. Check n before you look at anything else.
Details: from a List of Numbers in Chapter 3.
Stats of an Ungrouped Distribution of Numbers with Frequencies
Enter the values in L1 and the frequencies in L2, then 1-VarStats L1,L2. Check n before you look at anything else.
Details: from an Ungrouped Distribution in Chapter 3.
Weighted Average
Enter the values in L1 and the weights in L2, then 1-VarStats L1,L2. Check n before you look at anything else — it should equal the total of the
weights.
Details: Weighted Average in Chapter 3.
Stats of a Grouped Distribution of Number Ranges with Frequencies
Find class midpoints and enter them in L1; enter frequencies in L2. 1-VarStats L1,L2. Before looking at anything else, verify that n is total sample
size, not number of classes.
Details: from a Grouped Distribution in Chapter 3.
Caution: If classes are 100–199, 200–299, …, then class midpoints are 150, 250, … (not 149.5, 249.5, …).
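The frequency-list procedures above (ungrouped, weighted, and grouped) all do the same arithmetic. As a rough cross-check, here is a Python sketch; the function name `freq_stats` and the frequencies are invented for illustration, and the midpoints follow the caution above (150, 250, …, not 149.5, 249.5, …).

```python
import math


def freq_stats(values, freqs):
    """Return (n, mean, s, sigma) for values weighted by frequencies."""
    n = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / n
    ss = sum(f * (v - mean) ** 2 for v, f in zip(values, freqs))
    sigma = math.sqrt(ss / n)                       # population SD, like σx
    s = math.sqrt(ss / (n - 1)) if n > 1 else None  # sample SD, like Sx
    return n, mean, s, sigma


# Hypothetical classes 100–199, 200–299, 300–399 with made-up frequencies
lower_limits = [100, 200, 300]
class_width = 100
midpoints = [lo + class_width / 2 for lo in lower_limits]  # 150, 250, 350
freqs = [3, 5, 2]

n, mean, s, sigma = freq_stats(midpoints, freqs)
print(n, mean, s, sigma)
```

Just as on the calculator, look at n first: it should be the total of the frequencies (here 10), not the number of classes (3).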
Five-Number Summary
Use the procedure for stats of a plain list of numbers or of an ungrouped distribution of numbers with frequencies, above. Scroll down to the second
screen, which shows minX, Q1, Med, Q3, and maxX.
Caution: The five-number summary is not meaningful with a grouped distribution.
Box-Whisker Diagram a/k/a Boxplot
Enter the numbers in one list. If you have frequencies, enter them in a second list. Use MATH200A Program part 2. If you don’t have the program, see
Box-Whisker Plots on TI-83/84 [URL: https://BrownMath.com/ti83/boxplot.htm].
Details: Box-Whisker Diagrams in Chapter 3.
Caution: The boxplot is not meaningful with a grouped distribution.
Scatterplot
x’s in L1, y’s in L2. Turn off all plots on the [Y=] screen, then [2nd Y= makes STAT PLOT] [1] [ENTER] [▼] [ENTER]. Specify lists and the mark for plotting,
then [ZOOM] [9].
Details: Step 1. Make the Scatterplot in Chapter 4.
Find r, R², and Line of Best Fit
Have x’s in L1 and y’s in L2. LinReg(ax+b) L1,L2,Y1.

Details: Step 2. Perform the Regression in Chapter 4.
Note: The first time only, you must set up the calculator before doing LinReg(ax+b). Here’s how: [2nd 0 makes CATALOG] [x⁻¹], scroll down to
DiagnosticOn, and press [ENTER] twice.
Details: Step 0. Setup in Chapter 4.

Press [GRAPH].
Predict Average y for an x
After doing the regression, press [TRACE] [▼], enter the x number, and press [ENTER].
Details: Method 1: Trace on the Regression Line Graph in Chapter 4.
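For a computer cross-check of the regression and the prediction, here is a Python sketch of ordinary least squares. The names `linreg` and `predict` and the data are invented for illustration; a is the slope and b the intercept, matching LinReg(ax+b).

```python
import math


def linreg(xs, ys):
    """Return slope a, intercept b, r, and r squared for y = ax + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                      # slope
    b = mean_y - a * mean_x            # intercept
    r = sxy / math.sqrt(sxx * syy)     # correlation coefficient
    return a, b, r, r * r


def predict(a, b, x):
    """Predicted average y for a given x (the y-hat from the line)."""
    return a * x + b


# Made-up data, only to show the calls
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, r, r2 = linreg(xs, ys)
print(a, b, r, r2, predict(a, b, 3.5))
```

The length check at the top is the same condition that triggers a dimension error on the calculator when the two lists don’t match.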
Plot the Residuals (optional)
Details: Optional: Display the Residuals in Chapter 4.
Mean and SD of a Discrete PD
Values in L1, probabilities in L2. 1-VarStats L1,L2. Verify that n is exactly 1 and Sx is blank.
Details: Mean and Standard Deviation of a DPD in Chapter 6.
Caution: For geometric and binomial models you can’t use 1-VarStats but must use the formulas.
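The mean and SD of a discrete probability distribution can also be checked by hand or in a few lines of Python. This is only a sketch: the name `dpd_mean_sd` is made up, and the fair die is just a convenient example. The sum-of-probabilities check corresponds to verifying that n is exactly 1.

```python
import math


def dpd_mean_sd(values, probs):
    """Mean and standard deviation of a discrete probability distribution."""
    total = sum(probs)
    if abs(total - 1) > 1e-9:          # same idea as checking n = 1
        raise ValueError("probabilities must add up to 1")
    mu = sum(x * p for x, p in zip(values, probs))
    variance = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))
    return mu, math.sqrt(variance)


# Example: one roll of a fair die
faces = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
print(dpd_mean_sd(faces, probs))
```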
Probabilities of Geometric Distribution
Probability that the first success comes on trial number x is geometpdf(p,x).
Probability that the first success comes within the first x trials is geometcdf(p,x).
Details: Computing Probabilities in Chapter 6.
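The two geometric functions are short formulas, so they are easy to verify. In this Python sketch the function names are borrowed from the calculator for readability, but the code is an independent illustration, not TI’s implementation.

```python
def geometpdf(p, x):
    """P(first success on trial x): x-1 failures, then one success."""
    return (1 - p) ** (x - 1) * p


def geometcdf(p, x):
    """P(first success within the first x trials) = 1 - P(x failures in a row)."""
    return 1 - (1 - p) ** x


# Hypothetical success probability of 0.5 per trial
print(geometpdf(0.5, 3), geometcdf(0.5, 3))
```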
Probabilities of Binomial Distribution
Use MATH200A Program part 3. If you don’t have the program, then:
Probability of exactly x successes in n trials is binompdf(n,p,x).
Probability of 0 to b successes in n trials is binomcdf(n,p,b).
Probability of a to b successes in n trials is binomcdf(n,p,b)−binomcdf(n,p,a−1).
Details: Computing Probabilities in Chapter 6.
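Here is a similar Python sketch for the binomial model, including the subtraction trick for a to b successes. Again the names mirror the calculator only for readability; `binom_between` is a made-up helper, and `math.comb` needs Python 3.8 or later.

```python
from math import comb


def binompdf(n, p, x):
    """P(exactly x successes in n trials)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomcdf(n, p, b):
    """P(0 to b successes in n trials); an empty sum (b < 0) gives 0."""
    return sum(binompdf(n, p, x) for x in range(0, b + 1))


def binom_between(n, p, a, b):
    """P(a to b successes) = binomcdf(n,p,b) - binomcdf(n,p,a-1)."""
    return binomcdf(n, p, b) - binomcdf(n, p, a - 1)


# Hypothetical example: 4 trials, success probability 0.5
print(binompdf(4, 0.5, 2), binomcdf(4, 0.5, 1), binom_between(4, 0.5, 1, 3))
```

Note the a−1 in the last function: subtracting binomcdf(n,p,a) instead would wrongly throw away the probability of exactly a successes.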
Normal Distribution
Have Boundary(ies), Need Probability or Area

Have left boundary only? normalcdf(LeftBoundary,10^99,Mean,SD)

Have right boundary only? normalcdf(-10^99,RightBoundary,Mean,SD)
Have both boundaries? normalcdf(LeftBoundary,RightBoundary,Mean,SD)
For standard ND, either specify Mean=0 and SD=1 or just omit them.
Details: From Boundaries, Find Probability in Chapter 7.
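To cross-check a normalcdf result on a computer, Python’s standard library has `statistics.NormalDist`. The wrapper below is a sketch with the calculator’s argument order; the name and the default values are my choices, and 1e99 stands in for 10^99 as “no boundary on that side”.

```python
from statistics import NormalDist


def normalcdf(left, right, mean=0.0, sd=1.0):
    """Area under the normal curve between left and right."""
    dist = NormalDist(mean, sd)
    return dist.cdf(right) - dist.cdf(left)


print(normalcdf(-1.96, 1.96))          # both boundaries, standard ND
print(normalcdf(130, 1e99, 100, 15))   # left boundary only (made-up numbers)
print(normalcdf(-1e99, 85, 100, 15))   # right boundary only (made-up numbers)
```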
Have Probability or Area, Need Boundary(ies)
Have area of left tail? invNorm(AreaToLeft,Mean,SD)

Have area of right tail? invNorm(1-AreaToRight,Mean,SD)
Have area of middle? Subtract from 1 to get area of two tails, divide by 2 to get area of one tail. Left boundary is
invNorm(AreaOfOneTail,Mean,SD) and right boundary is invNorm(1-AreaOfOneTail,Mean,SD)
For standard ND, either specify Mean=0 and SD=1 or just omit them.
Details: From Probability, Find Boundaries in Chapter 7.
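The reverse direction can be sketched the same way with `NormalDist.inv_cdf`. The helper `middle_boundaries` is a made-up name that follows the middle-area recipe above: subtract from 1, divide by 2, then look up both tails.

```python
from statistics import NormalDist


def invNorm(area_to_left, mean=0.0, sd=1.0):
    """Boundary that has the given area to its left."""
    return NormalDist(mean, sd).inv_cdf(area_to_left)


def middle_boundaries(area_of_middle, mean=0.0, sd=1.0):
    """Left and right boundaries that enclose the given middle area."""
    one_tail = (1 - area_of_middle) / 2
    return invNorm(one_tail, mean, sd), invNorm(1 - one_tail, mean, sd)


print(invNorm(0.975))                      # area of left tail
print(invNorm(1 - 0.10, 100, 15))          # area of right tail is 0.10 (made-up numbers)
print(middle_boundaries(0.95))             # middle 95% of the standard ND
```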
Does a Data Set Fit the Normal Model?
Enter the points in a list and use MATH200A Program part 4. If you don’t have the program, see Normality Check on TI-83/84.
Details: Checking Data Sets in Chapter 7.
Sampling Distribution of the Mean
Probability of getting a sample mean between LeftBoundary and RightBoundary is normalcdf(LeftBoundary,RightBoundary,Mean,SD/√SampleSize).

Details: Example 1: Bank Deposits in Chapter 8.
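The only change from an ordinary normal probability is that the SD is divided by √SampleSize. A Python sketch, with a made-up function name and made-up numbers, assuming the sampling distribution is close enough to normal:

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_between(left, right, mean, sd, sample_size):
    """P(left < sample mean < right), using standard error sd/sqrt(n)."""
    standard_error = sd / sqrt(sample_size)
    dist = NormalDist(mean, standard_error)
    return dist.cdf(right) - dist.cdf(left)


# Hypothetical population: mean 100, SD 15, samples of size 9
print(prob_sample_mean_between(95, 105, 100, 15, 9))
```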
Sampling Distribution of the Proportion
Compute standard error as √(p*(1-p)/n) [STO→] [x,T,θ,n]. Then probability of getting a sample proportion between LeftBoundary and
RightBoundary is normalcdf(LeftBoundary,RightBoundary,p,[x,T,θ,n]).
Details: Example 5: Swain v. Alabama in Chapter 8.
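The same two steps, standard error first and then the normal probability, look like this in a Python sketch. The function name and the numbers are invented, and the code assumes the normal approximation is appropriate for your p and n.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_proportion_between(left, right, p, n):
    """P(left < sample proportion < right) under the normal approximation."""
    standard_error = sqrt(p * (1 - p) / n)   # the value stored in x on the calculator
    dist = NormalDist(p, standard_error)
    return dist.cdf(right) - dist.cdf(left)


# Hypothetical: true proportion 0.5, samples of 100
print(prob_sample_proportion_between(0.45, 0.55, 0.5, 100))
```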
Confidence Intervals, Hypothesis Tests, Sample Size
See Inferential Statistics: Basic Cases [URL: https://BrownMath.com/swt/pfswt.htm.htm#cas_top].
TI-83/84 Troubleshooting
Updated 13 June 2014
Summary: Some common problems on the TI-83 or TI-84 calculator look intimidating because the messages are strange, but they’re easy to fix.
This page helps you with the TI-83, TI-83 Plus, TI-83 Plus Silver Edition, TI-84 Plus, and TI-84 Plus Silver Edition.
Contents: Error Messages

“Syntax”
“Window Range”
“Dim Mismatch”
“Invalid Dim”
“ERR:Version”
“ERR:DOMAIN”
List Troubles
I’ve lost a list from the STAT EDIT screen.
I can’t clear the numbers from a list.
I left out a number from my list.
Graphing Troubles
My screen is blank, except maybe for the axes and the grid.
My screen is covered with horizontal or vertical lines.
My screen is completely dark, every pixel filled in.
There are no tick marks on my graph.
An extra line or curve appears on my graph.
I can’t graph more than one equation at a time.
My normal distribution from ShadeNorm looks wrong.
Other Troubles
My screen is too light or too dark.
Where’s the correlation coefficient?
See also: Texas Instruments pages on the TI-83 and TI-84 family (accessed 2014-06-13).
Error Messages
General advice: Most messages give you a choice of Quit and Goto. If it’s available, always pick Goto: the TI-83/84 will show you the exact spot
where it found something wrong. That’s usually enough of a clue that you can figure out what’s wrong.
“Syntax”
Press [2] for Goto and recheck your equation on the Y= screen.
Make sure you pressed the [X,T,θ,n] key for x and not the [×] times key. Also, make sure you distinguished between the minus key [−] and the
change sign key [(-)]: the change sign key makes a shorter minus sign than the minus key.
“Window Range”
Press [1] for Quit and then press [WINDOW].
Make sure your Xmax is greater than Xmin and Ymax is greater than Ymin.
“Dim Mismatch”
Are you doing a scatterplot?

Press [Y=] and check that only one of Plot1, Plot2, Plot3 is highlighted.
Then press [2nd] [STAT PLOT] followed by the number of the active plot. ([STAT PLOT] is the shifted [Y=] key.) Check which list numbers are
mentioned.
Press [STAT] [ENTER] and make sure that you have equal numbers of entries in the two lists.
Are you doing regression analysis?
Note the two lists mentioned in your regression command, LinReg(ax+b) or similar. Press [STAT] [ENTER] and make sure that you have the same
number of entries in the two lists, at least two rows.
Are you doing anything else?

Press [1] for Quit and then press [Y=]. Make sure that there are no highlights on Plot1 Plot2 Plot3 at the top of the screen. If one is
highlighted, cursor to it and press [ENTER] to deactivate it.
“Invalid Dim”
This message has several possible causes.

If you get this when making some kind of plot, you probably forgot that you have one of Plot1 Plot2 Plot3 turned on, and it refers to
two lists that don’t have the same length. The cure is to press [Y=] and turn off the unwanted plot.
If you’re actually doing something that needs lists, such as a regression, but your lists don’t have the same length, you get this message.
Press [STAT] [1] and check your lists.
You also get this message when you give the TI-83 a list or matrix where it expected a variable or number, or vice versa. For instance, on the
STAT PLOT screen, you may mean to press [2nd] [L1] but if you miss the [2nd] you actually type a Y. In the same way you might type a Z
where you intend to type L2.
Karl Wein offers another possibility. You can create a list called L1, but that is not the same as the predefined list called L1 (note the small-
capital L), and of course similarly for lists 2 through 6. If you accidentally delete a predefined list from the editor, make sure to bring it back
with the SetUpEditor command.
In general, when you see this message you need to check carefully through what you’ve done to make sure that you used lists where you were
supposed to, and nowhere else.
“ERR:Version”
A TI-83 (not Plus or Silver) is trying to receive something that it can’t handle. If what you’re trying to transfer is a program, these features work in the
TI-83 Plus and all later TI-83s and TI-84s, but not in the original TI-83:
archiving operations
two-character labels that start with a letter
any lower-case text in strings
the Greek letters ρ (rho) and σ (sigma) in strings, but the statistics variable σx can be used
“ERR:DOMAIN”
You’ve passed an incorrect argument to a function, such as cos⁻¹(2) or Pxl-On(160,160).
List Troubles
I’ve lost a list from the STAT EDIT screen.
It’s easy to hide one without intending to, just by pressing the [DEL] key while positioned on a column head.
To bring back L1 through L6 in that order, press [STAT] [5] [ENTER]. This runs the SetUpEditor command; you will not lose any numbers from
any lists.
I can’t clear the numbers from a list.
Use the arrow keys to move to the column heading, not the first number of the list. Then press [CLEAR] [ENTER]. All the numbers from the list will be
erased, and the cursor will move to the first row so that you can begin entering numbers.
I left out a number from my list.
No, you don’t have to re-enter the whole list. Use the arrow keys to move to the number just after where the missing number should go. Press [2nd]
[INS] (the shifted [DEL] key) and a space will open up in the list.
Graphing Troubles
My screen is blank, except maybe for the axes and the grid.
Are you plotting specific points?

Press [ZOOM] [9], which is ZoomStat. That tells the TI-83/84 to adjust the window to show your
points or histogram with maximum detail.
Are you plotting one or more functions?
If you don’t see your function graph anywhere, your window is probably set to a region of the xy
plane the graph just doesn’t happen to go through. Depending on the function, one of these techniques
will work:
ZoomFit is a good first try. Press the [ZOOM] button, then [0] (zero). (Thanks to Marilyn Webb for this suggestion.)
You can try to zoom out (like going higher to see more of the plane) by pressing [ZOOM] [3] [ENTER]. If you need to, press [ENTER] again to
zoom out further.
Finally, you can directly adjust the window to select a specific region. Set Xmin and Xmax so that they include the x domain you’re interested
in, and Ymin and Ymax to include the y range you want to see.
Are you plotting a histogram?

Press [WINDOW]. Set the X’s in terms of your class limits, as follows:
Xmin = one class width less than the smallest class mark
Xmax = one class width more than the largest class mark
Xscl = the class width
Set the Y’s in terms of your frequencies, namely:
Ymin = 0
Ymax = the highest frequency (if it’s a relative frequency histogram, you can use 1)
Yscl = some convenient fraction of Ymax
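The histogram window rules are simple arithmetic, and this Python sketch just spells them out. The function name `histogram_window` is made up, and using one fifth of Ymax for Yscl is only one possible “convenient fraction”, an assumption of this sketch.

```python
def histogram_window(class_marks, class_width, max_frequency, y_ticks=5):
    """Window settings that follow the histogram rules listed above."""
    return {
        "Xmin": min(class_marks) - class_width,  # one class width below smallest mark
        "Xmax": max(class_marks) + class_width,  # one class width above largest mark
        "Xscl": class_width,
        "Ymin": 0,
        "Ymax": max_frequency,
        "Yscl": max_frequency / y_ticks,         # assumption: Ymax divided into 5 ticks
    }


# Hypothetical class marks 150, 250, 350 with highest frequency 5
print(histogram_window([150, 250, 350], 100, 5))
```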
Are you plotting a distribution using ShadeNorm?

Please see that section of this page.
My screen is covered with horizontal or vertical lines.
This sometimes happens after zooming, or if you manually alter some window parameters. What happens is that
the tick marks are so closely spaced that they merge together visually.
Press [WINDOW] and adjust the Xscl or Yscl or both. Tick marks will appear every Xscl units left and right,
and you want that to be a reasonable fraction of the range between Xmin and Xmax. Tick marks will appear every
Yscl units up and down, and you want that to be a reasonable fraction of the range between Ymin and Ymax.
My screen is completely dark, every pixel filled in.
This means both your Xscl and Yscl values are too small. Fix this on the WINDOW screen, as explained above, under My screen is covered with
horizontal or vertical lines.
There are no tick marks on my graph.
Is your grid turned on?

Press [2nd] [FORMAT] (the shifted [ZOOM] key) and verify that GridOn is highlighted. If not, cursor to it and
press [ENTER].
Check your X and Y scales.

Press [WINDOW] and look at Xscl. It should be greater than 0 and less than the range from Xmin to Xmax. For
instance, if Xmin is −20 and Xmax is 20, the range is 40 and you might want Xscl to be 2, 5, or 10.
Also on the WINDOW screen, look at Yscl. It should be greater than 0 and less than the range from Ymin to
Ymax. For instance, if Ymin is −10 and Ymax is 10, the range is 20 and you might want Yscl to be 1 or 5.
An extra line or curve appears on my graph.
Press [Y=]. Look at Y1=, Y2=, and so on. Cursor to the equal sign for each unwanted function, and press [ENTER]. You’ll need to cursor down to
examine Y8=, Y9=, and Y0=, because the Y= screen shows only seven functions at a time.
Press [GRAPH] to redraw the graph.
I can’t graph more than one equation at a time.
Are there funny arrows at the left of your Y= screen?

The illustration and solution are courtesy of Jesse Phillips, who cites a page from the Texas Instruments
support site that has since been removed by TI for some reason:
In order to select additional graphs, the Transformation Graphing App will need to be uninstalled from
the TI-83 Plus Family and TI-84 Plus Family. Uninstalling the App does not erase it from the calculator, it
disables it from interfering with the normal graphing modes. Below are the steps to successfully uninstall
the App.
Press [APPS] key
Select Transfrm from the menu
Press 1:Uninstall
My normal distribution from ShadeNorm looks wrong.
Is too much shaded in for the limits you set?

You need to clear each drawing before making the next. Locate DRAW as the shifted [PRGM] key near the middle of the keyboard. Press [2nd]
[DRAW], then [1] to paste ClrDraw to the home screen, then [ENTER].
Repeat your ShadeNorm command.
Does the distribution not appear at all, or only in part?

You need to set your window parameters. Press [WINDOW] and then set them as follows. (The numbers in parentheses are for a standard normal
distribution, with mean=0 and standard deviation=1, where you specify only two parameters to ShadeNorm.)
Xmin = mean minus 4 standard deviations (For standard ND, use -4.)
Xmax = mean plus 4 standard deviations (For standard ND, use 4.)
Xscl = standard deviation (For standard ND, use 1.)
Ymin = 0
Ymax = 0.4 divided by standard deviation—remember you can enter the expression and let the TI-83/84 do the arithmetic for you (For
standard ND, use 0.4)
Yscl = 0.1 divided by standard deviation (For standard ND, use 0.1)
After setting the window parameters, press [2nd] [QUIT] to return to the home screen, then [2nd] [ENTER] [ENTER] to re-execute the ShadeNorm
command.
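Why 0.4 divided by the standard deviation for Ymax? The highest point of a normal curve is 1/(SD·√(2π)), about 0.3989/SD, so 0.4/SD leaves just enough headroom. The Python sketch below computes the ShadeNorm window from the rules above and checks that the peak fits; `shadenorm_window` and `peak_height` are made-up names.

```python
import math


def shadenorm_window(mean=0.0, sd=1.0):
    """Window settings for ShadeNorm, following the rules listed above."""
    return {
        "Xmin": mean - 4 * sd,
        "Xmax": mean + 4 * sd,
        "Xscl": sd,
        "Ymin": 0,
        "Ymax": 0.4 / sd,
        "Yscl": 0.1 / sd,
    }


def peak_height(sd):
    """Height of the normal curve at its mean: 1 / (sd * sqrt(2*pi))."""
    return 1 / (sd * math.sqrt(2 * math.pi))


# Hypothetical distribution with mean 100 and SD 15
window = shadenorm_window(100, 15)
print(window)
print(peak_height(15) <= window["Ymax"])
```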
Other Troubles
My screen is too light or too dark.
Many TI-83/84 owners don’t realize that the contrast is adjustable. Here’s how:
1. Press and release the gold [2nd] button. Verify that the blinking up arrow appears in the display.
2. Press and hold the blue up or down arrow key to increase or decrease contrast, until the display is to your liking. Be alert: the display will
change quickly.
The calculator will remember your contrast setting; you don’t have to adjust it every time you turn the calculator on.
Where’s the correlation coefficient?
You’ve executed a regression from the STAT CALC menu. You get the slope and intercept all right, but where are r and r²?
For some reason, your TI-83/84 comes from the factory configured not to display correlation coefficients. You need to make a one-time mode setting
so that these are displayed in future regressions:
1. Press [2nd] [CATALOG] (the shifted [0] key).
2. To move to the beginning of the D’s, press the [x⁻¹] key. (A green D is printed above that key. Do not press the green [ALPHA] key first,
because the CATALOG command automatically puts the TI-83/84 in alpha mode.)
3. Use the arrow keys to move to DiagnosticOn.
4. Press [ENTER] to select the command, and [ENTER] again to execute it.
You don’t need to re-enter your regression command. Just press [2nd] [ENTER] and [2nd] [ENTER] again to recall it, then [ENTER] to execute it.
Sources Used
Updated 15 May 2016
Summary: Here are full citations for all the books, articles, and Web pages that I refer to in the textbook. (I’m following the format suggested by
chapters 15 and 16 of the Chicago Manual of Style, 13th Edition.)
Aldrich, John. 2012.
Figures from the History of Probability and Statistics. Retrieved 16 Oct 2014 from http://www.economics.soton.ac.uk/staff/aldrich/Figures.htm
American Social History Project of CUNY and Center for History and New Media of George Mason University. n.d.
Landon in a Landslide: The Poll That Changed Polling. Retrieved 15 Sept 2014 from http://historymatters.gmu.edu/d/5168/
AVMA, American Veterinary Medical Association. 2014.
US Pet Ownership Statistics. Retrieved 26 Sept 2014 from https://www.avma.org/KB/Resources/Statistics/Pages/Market-research-statistics-US-pet-ownership.aspx
Australian Bureau of Statistics. 2013.
Statistical Language—Types of Error. Retrieved 15 Sept 2014 from
http://www.abs.gov.au/websitedbs/a3121120.nsf/home/statistical+language+-+types+of+error
Benson, Herbert, et al. 2005.
Study of the Therapeutic Effects of Intercessory Prayer (STEP) in Cardiac Bypass Patients — A Multi-Center Randomized Trial of Uncertainty and
Certainty of Receiving Intercessory Prayer. Retrieved 15 Sept 2014 from http://www.templeton.org/pdfs/press_releases/060407STEP_paper.pdf
(PDF)
Benson, Herbert, et al. 2006.
Study of the Therapeutic Effects of Intercessory Prayer (STEP) in cardiac bypass patients: A multicenter randomized trial of uncertainty and certainty of
receiving intercessory prayer [abstract]. Retrieved 15 Sept 2014 from http://www.ahjonline.com/article/S0002-8703%2805%2900649-6/abstract
Blood Types. 2009.
Retrieved 25 Sept 2014 from http://bloodcenter.stanford.edu/education/blood_types.html
Bock, Dave. n.d.
Is That an Assumption or a Condition? Retrieved 27 Sept 2014 from
http://apcentral.collegeboard.com/apc/members/courses/teachers_corner/31609.html
Bouchard, Thomas J., Jr., et al. 1990.
Sources of Human Psychological Differences: The Minnesota Study of Twins Reared Apart. Retrieved 28 Sept 2014 from
http://web.missouri.edu/~segerti/1000H/Bouchard.pdf (PDF)
Bradford Hill, Sir Austin. 1965.
“The Environment and Disease: Association or Causation?” Proceedings of the Royal Society of Medicine 58 (1965): 295–300. Retrieved
15 Sept 2014 from http://www.edwardtufte.com/tufte/hill
Brown, Stan. 2009.
How to Convert Units of Measurement. Retrieved 15 Sept 2014 from http://oakroadsystems.com/math/convert.htm
Bulmer, M. G. 1979.
Principles of Statistics. Dover.
CDC, Centers for Disease Control and Prevention. 2014.
Colorectal Cancer Screening Guidelines. Retrieved 28 Sept 2014 from
http://www.cdc.gov/cancer/colorectal/basic_info/screening/guidelines.htm:!mak
Census Bureau, United States. 2014a.
Educational Attainment: CPS Historical Time Series Tables. Retrieved 15 Sept 2014 from
http://www.census.gov/hhes/socdemo/education/data/cps/historical/index.html
Census Bureau, United States. 2014b.
New York Quick Facts. Retrieved 28 Sept 2014 from http://quickfacts.census.gov/qfd/states/36000.html
Classic Polling Surprises. 2004.
Retrieved 15 Sept 2014 from
http://web.archive.org/web/20040819030531/http://www.studyworksonline.com/cda/content/new_worksheet/0,,EXP545_NAV2-76_SWK543,00.shtml
Clustering Illusion. 2014.
Retrieved 20 Sept 2014 from https://en.wikipedia.org/wiki/Clustering_illusion
Correlation and Dependence. 2014.
Retrieved 15 Sept 2014 from http://en.wikipedia.org/wiki/Correlation
Dabes, Joe, and Carol Janik. 1999.
Statistics Manual. Tompkins Cortland Community College.
Dallal, Gerald E. 2002.
Is Statistics Hard? Retrieved 28 Sept 2014 from http://www.jerrydallal.com/LHSP/hard.htm
DeVeaux, Richard D., Paul F. Velleman, and David E. Bock. 2009.
Intro Stats. 3d ed. Pearson Addison Wesley.
DiscovertheOdds.com. 2014.
What Are the Odds Of Being Struck By Lightning? Retrieved 20 Sept 2014 from http://discovertheodds.com/what-are-the-odds-of-being-struck-by-lightning/
Dutton, Sarah, et al. 2013.
9 in 10 Back Universal Gun Background Checks. Retrieved 25 Sept 2014 from http://www.cbsnews.com/news/9-in-10-back-universal-gun-background-checks/
Empirical Relation between Mean, Median, and Mode. n.d.
Retrieved 14 Sept 2014 from http://www.pinkmonkey.com/studyguides/subjects/stats/chap4/s0404601.asp
Experian plc. 2012.
Number of Older Vehicles on the Road in the United States Increased by More than 17 Million Since 2009. Retrieved 20 Sept 2014 from
http://press.experian.com/United-States/Press-Release/number-of-older-vehicles-on-the-road-in-the-united-states-increased-by-more-than-17.aspx
(Many thanks to TC3 librarian Barbara Kobritz for finding this for me.)
Experimental Design in Statistics. 2014.
Retrieved 15 Sept 2014 from http://stattrek.com/experiments/experimental-design.aspx
Fairleigh Dickinson University. 2011.
Some News Leaves People Knowing Less. Retrieved 21 Sept 2014 from http://publicmind.fdu.edu/2011/knowless/ (PDF)
Finite and Infinite Population. 2014.
Retrieved 15 Sept 2014 from http://www.emathzone.com/tutorials/basic-statistics/finite-and-infinite-population.html
Freedman, David, Robert Pisani, and Roger Purves. 2007.
Statistics. 4th ed. Norton.
Freshman Twenty. 2004.
Retrieved 28 Sept 2014 from http://www.urbandictionary.com/define.php?term=freshman%20twenty&defid=716671
Gigerenzer, Gerd. 2002.
Calculated Risks. Simon & Schuster.
Gilovich, Thomas, Robert Vallone, and Amos Tversky. 1985.
“The Hot Hand in Basketball: On the Misperception of Random Sequences”. Cognitive Psychology 17(1985): 295–314. Retrieved 20 Sept 2014
from http://psych.cornell.edu/sites/default/files/Gilo.Vallone.Tversky.pdf (PDF)
Goleman, Daniel. 1986.
Major Personality Study Finds That Traits Are Mostly Inherited. Retrieved 28 Sept 2014 from
http://www.nytimes.com/1986/12/02/science/major-personality-study-finds-that-traits-are-mostly-inherited.html
Gosset, William Sealy (“Student”). 1908.
The Probable Error of a Mean. Retrieved 16 Oct 2014 from http://www.york.ac.uk/depts/maths/histstat/student.pdf (PDF)
Illusory Superiority: “Driving Ability”. 2014.
Retrieved 15 Sept 2014 from http://en.wikipedia.org/wiki/Lake_Wobegon_effect#Driving_ability
Introduction to Polling. n.d.
Retrieved 15 Sept 2014 from http://web.archive.org/web/20051223051413/http://www.pages.drexel.edu/~pa34/ptn1.htm#_ftn1
Ioannidis, John P. A. 2005.
Why Most Published Research Findings Are False. Retrieved 28 Sept 2014 from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182327/
Johnson, Robert, and Patricia Kuby. 2003.
Just the Essentials of Elementary Statistics. 3d ed. Brooks/Cole.
Johnson, Robert, and Patricia Kuby. 2004.
Elementary Statistics. 9th ed. Thomson.
Keogh, Daniel [TheProfessorFunk]. 2011.
The Strange Powers of the Placebo Effect. Retrieved 15 Sept 2014 from https://www.youtube.com/watch?v=yfRVCaA5o18 (video)
Kuulasmaa, Kari, Hans-Werner Hense, and Hanna Tolonen. 1998.
Quality Assessment of Data on Blood Pressure in the WHO MONICA Project — Table 8: Mean and standard deviation of blood pressure.
Retrieved 27 Sept 2014 from http://www.thl.fi/publications/monica/bp/bpqa.htm
Kuzma, Jan W., and Stephen E. Bohnenblust. 2005.
Basic Statistics for the Health Sciences. 5th ed. McGraw Hill.
Lane, David M. 2010.
Percentiles. Retrieved 14 Sept 2014 from http://legacy.cnx.org/content/m10805/latest/
Lane, David M. 2013.
HyperStat Online — Chapter 9: Logic of Hypothesis Testing. Retrieved 28 Sept 2014 from
http://davidmlane.com/hyperstat/logic_hypothesis.html
Mackowiak, Philip A., Steven S. Wasserman, and Myron M. Levine. 1992.
A Critical Appraisal of 98.6°F, the Upper Limit of the Normal Body Temperature, and Other Legacies of Carl Reinhold August Wunderlich [abstract].
Retrieved 15 Sept 2014 from http://jama.jamanetwork.com/article.aspx?articleid=400116
The Math Forum @ Drexel. 2002.
Defining Quartiles. Retrieved 14 Sept 2014 from http://mathforum.org/library/drmath/view/60969.html
The Math Forum @ Drexel. 2003.
Difference of Two Proportions in Statistics. Retrieved 27 Sept 2014 from http://mathforum.org/library/drmath/view/68356.html
Mathematical Models. n.d.
Retrieved 20 Sept 2014 from http://www.mathsisfun.com/algebra/mathematical-models.html
Mendelian Inheritance. 2014.
Retrieved 29 Sept 2014 from http://en.wikipedia.org/wiki/Mendelian_inheritance
Michailides, George. n.d.a.
Jury Selection. Retrieved 28 Sept 2014 from http://www.stat.ucla.edu/cases/jury/
Michailides, George. n.d.b.
Swain vs Alabama. Retrieved 27 Sept 2014 from http://www.stat.ucla.edu/cases/swain/
Middleton, Michael R. n.d.
Better Histograms Using Excel. Retrieved 15 Sept 2014 from http://www.treeplan.com/download/Better-Histograms-Using-Excel-2003.pdf
(PDF)
Misleading Graph: “Improper Scaling”. 2014.
Retrieved 15 Sept 2014 from http://en.wikipedia.org/wiki/Misleading_graph#Improper_scaling
Monty Hall. 2014.
Retrieved 27 Sept 2014 from http://mathforum.org/library/drmath/view/68356.html
Moore, David. 2003.
Re: P Value for Kids? Retrieved 28 Sept 2014 from http://www.mail-archive.com/edstat@lists.ncsu.edu/msg04957.html
Moulton, Samuel T. n.d.
Probable Error of a Mean, The (“Student”) [Biometrika, 6, 1-25]. Retrieved 16 Oct 2014 from
http://www.wjh.harvard.edu/~moulton/probable_error_mean.pdf (PDF)
NHTSA, National Highway Traffic Safety Administration, US Department of Transportation. 2008.
Average Fuel Economy Standards: Passenger Cars and Light Trucks: Model Years 2011–2015. Retrieved 14 Sept 2014 from
http://www.nhtsa.gov/DOT/NHTSA/Rulemaking/Rules/Associated%20Files/CAFE_11-15_NPRM_April_21.pdf (PDF)
NICB, National Insurance Crime Bureau. 2013.
NICB’s Hot Wheels: Popular 10 Most Stolen Vehicles List Gets a Makeover. Retrieved 20 Sept 2014 from
https://www.nicb.org/newsroom/news-releases/hot-wheels-2012
O’Halloran, Sharyn. n.d.
Statistics and Quantitative Analysis U4320: Lecture 11 : Path Diagrams. Retrieved 20 Sept 2014 from
http://www.columbia.edu/itc/sipa/U4320y-003/client_edit/Lecture11.ppt (PowerPoint)
Parker-Pope, Tara. 2007.
How Scared Should We Be? Retrieved 20 Sept 2014 from http://well.blogs.nytimes.com/2007/10/31/how-scared-should-we-be/
(The table is somewhat confusing, giving lifetime risk of 1 in 60,453, but footnotes explain that is the 2003 figure for shark attacks, not fatal
attacks, divided by 77.6. Multiplying back 77.6×60,453 gives a 2003 probability of shark attack of about 1 in 4,691,000.)
Paulos, John Allen. 2004.
A Mathematician Plays the Stock Market. Basic Books.
Pearson’s Product-Moment Correlation. 2001.
Retrieved 20 Sept 2014 from http://www.mhhe.com/socscience/intro/cafe/common/stat/cstats03.mhtml
See “Learning Check #18” at the bottom.
Pew Research Center. 2013a.
Half of Parents with Young Children Read to Them Every Day. Retrieved 15 Sept 2014 from
http://www.pewresearch.org/daily-number/half-of-parents-with-young-children-read-to-them-every-day/
Pew Research Center. 2013b.
Obama Job Approval Holds Steady, Economic Views Improve. Retrieved 25 Sept 2014 from
http://www.people-press.org/2013/06/19/obama-job-approval-holds-steady-economic-views-improve/
Pew Research Center. 2013c.
US Image Rebounds in Mexico. Retrieved 21 Sept 2014 from http://www.pewglobal.org/2013/04/29/u-s-image-rebounds-in-mexico/
Pew Research Center. 2013d.
A Third of Americans Say They Like Doing Their Income Taxes. Retrieved 25 Sept 2014 from
http://www.people-press.org/2013/04/11/a-third-of-americans-say-they-like-doing-their-income-taxes/
Pew Research Center. 2013e.
Tea Party’s Image Turns More Negative. Retrieved 27 Sept 2014 from
http://www.people-press.org/2013/10/16/tea-partys-image-turns-more-negative/
Physicians’ Health Study. 2009.
Retrieved 28 Sept 2014 from http://phs.bwh.harvard.edu/phs1.htm
Ramsay, Clay, Steven Kull, Evan Lewis, and Stefan Subias. 2010.
Misinformation and the 2010 Election: A Study of the US Electorate. Retrieved 21 Sept 2014 from
http://www.worldpublicopinion.org/pipa/pdf/dec10/Misinformation_Dec10_rpt.pdf (PDF)
Roets, Arne, Barry Schwartz, and Yanjun Guan. 2012.
The Tyranny of Choice: A Cross-Cultural Investigation of Maximizing-Satisficing Effects on Well-Being. Retrieved 26 Sept 2014 from
http://journal.sjdm.org/12/12815/jdm12815.html
Rosenbaum, Mike. n.d.a.
100 Meter Men’s Olympic Medalists. Retrieved 15 Sept 2014 from http://trackandfield.about.com/od/sprintsandrelays/qt/olym100medals.htm
Rosenbaum, Mike. n.d.b.
100 Meter Women’s Olympic Medalists. Retrieved 15 Sept 2014 from
http://trackandfield.about.com/od/sprintsandrelays/qt/olym100women.htm
Ryan, Thomas A., Jr., and Brian L. Joiner. 1976.
Normal Probability Plots and Test for Normality. Retrieved 1 Oct 2014 from
http://web.archive.org/web/20120916160230/http://www.minitab.com/uploadedFiles/Shared_Resources/Documents/Articles/normal_probabili
(PDF)
Scarne, John. 1965.

Scarne on Cards. Rev., augmented ed. Signet (New American Library).
Schilling, Mark F., Ann E. Watkins, and William Watkins. 2002.

“Is Human Height Bimodal?” The American Statistician, August 2002 (56:3) Retrieved 27 Sept 2014 from
h p://www.biostat.jhsph.edu/bstcourse/bio751/papers/bimodalHeight.pdf
Schmi , Ma . 2013.
Honda Accord, Civic Remain Top Targets for Thieves. Retrieved 20 Sept 2014 from h p://blogs.cars.com/kickingtires/2013/08/honda-accord-civic-
remain-top-targets-for-thieves.html
Selection Bias. 2014.

Retrieved 15 Sept 2014 from h p://en.wikipedia.org/wiki/Selection_bias
Significant Figures/Digits. 2014.

Retrieved 15 Sept 2014 from h p://mathforum.org/library/drmath/sets/select/dm_sig_digits.html
Simon, Steve. 1999a.

Degrees of Freedom. Retrieved 14 Sept 2014 from h p://www.pmean.com/99/df.html
Simon, Steve. 1999b.

R-squared. Retrieved 14 Sept 2014 from h p://www.pmean.com/99/rsquared.html
Simon, Steve. 2000a.

Alternating Treatments. Retrieved 23 May 2015 from http://www.pmean.com/00/alternate.html
Simon, Steve. 2000b.

Causation. Retrieved 15 Sept 2014 from http://www.pmean.com/00/causation.html
Simon, Steve. 2000c.

Number Needed to Treat. Retrieved 28 Sept 2014 from http://www.pmean.com/00/nnt.html
Simon, Steve. 2000d.

Outliers. Retrieved 14 Sept 2014 from http://www.pmean.com/00/outliers.html
Simon, Steve. 2001.

Web Polls. Retrieved 23 May 2015 from http://www.pmean.com/01/webpoll.html
Simon, Steve. 2004.

Degrees of Freedom, Part 2. Retrieved 14 Sept 2014 from http://www.pmean.com/04/DegreesFreedom.html
Simon, Steve. 2010.

Confidence Interval with Zero Events. Retrieved 23 May 2015 from http://www.pmean.com/01/zeroevents.html
Siu, Jason. 2013.

Honda Accord, Civic Most Stolen Vehicles in 2012. Retrieved 20 Sept 2014 from
http://www.autoguide.com/auto-news/2013/08/honda-accord-civic-most-stolen-vehicles-in-2012-study.html
Social Security Administration, US. 2010.

Actuarial Life Table: Periodic Life Table, 2010. Retrieved 25 Sept 2014 from http://www.ssa.gov/OACT/STATS/table4c6.html
Specter, Michael. 2009.

Denialism: How Irrational Thinking Hinders Scientific Progress, Harms the Planet, and Threatens Our Lives. Penguin Press.
Steering Committee of the Physicians’ Health Study Research Group. 1989.

Final Report on the Aspirin Component of the Ongoing Physicians’ Health Study. Retrieved 28 Sept 2014 from
http://www.nejm.org/doi/full/10.1056/NEJM198907203210301#t=article
Sterne, Jonathan A. C., and George Davey Smith. 2001.

“Sifting the Evidence — What’s Wrong with Significance Tests?” British Medical Journal 322(7280): 226–231. Retrieved 28 Sept 2014 from
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1119478/
Stutzbach Enterprises. 2011.

The Prosecutor’s Fallacy. Retrieved 20 Sept 2014 from http://pokersleuth.com/prosecutor-fallacy.shtml but missing when I checked links on
16 Apr 2016.
Sullivan, Michael. 2011.

Fundamentals of Statistics. 3d ed. Pearson Prentice Hall.
Tierney, John. 1991.

Behind Monty Hall’s Doors: Puzzle, Debate and Answer? Retrieved 27 Sept 2014 from
http://www.nytimes.com/1991/07/21/us/behind-monty-hall-s-doors-puzzle-debate-and-answer.html
Triola, Mario. 2011.

Elementary Statistics Using the TI-83/84 Plus Calculator. 3d ed. Addison-Wesley.
Turner, Ronald, et al. 2005.

“An Evaluation of Echinacea angustifolia in Experimental Rhinovirus Infections”. New England Journal of Medicine 353(4): 341–348. Retrieved
30 Nov 2014 from http://www.nejm.org/doi/pdf/10.1056/NEJMoa044441
(Suggested by Michael Specter’s Denialism.)
Upton, Graham, and Ian Cook. 2008.

Oxford Dictionary of Statistics. Oxford University Press.
Velleman, Paul. 2005.

Confounding Variables. Retrieved 15 Sept 2014 from http://mathforum.org/kb/message.jspa?messageID=4088352&start=120
Vickers, Andrew. 2010.

What Is a p-Value Anyway? 34 Stories to Help You Actually Understand Statistics. Addison-Wesley.
(My thanks to Erik Westwig for drawing this book to my attention.)
Virmani, Rajeev. 2012.

Confounding and Lurking Variables. Retrieved 15 Sept 2014 from
http://www.virmanimath.com/start-page-2012-2013/ap-stats-2012-2013/chapter-2/apstatonlineclass/confounding-and-lurking-variables
von Hippel, Paul T. 2005.

Mean, Median, and Skew: Correcting a Textbook Rule. Retrieved 14 Sept 2014 from
http://www.amstat.org/publications/jse/v13n2/vonhippel.html
Waner, Stefan, and Steven R. Costenoble. 1996.

Calculus Applied to Probability and Statistics. Retrieved 15 Sept 2014 from
http://web.archive.org/web/20130618221901/http://people.hofstra.edu/Stefan_Waner/RealWorld/cprob/cprob4.html
Welch, B. L. 1938.
The Significance of the Difference between Two Means When the Population Variances Are Unequal. Retrieved 28 Sept 2014 from
http://www.stat.cmu.edu/~fienberg/Statistics36-756/Welch-Biometrika-1937.pdf (PDF)
What are my chances of being a victim of violent crime? 2010.

Retrieved 20 Sept 2014 from http://www.crimeinamerica.net/2010/12/13/what-are-my-chances-of-being-a-victim-of-violent-crime/
(The figure for males is 18.4 per thousand, versus 15.8 per thousand for females. Ignoring that there are more females than males in the
population, the average figure is (18.4+15.8)/2 = 17.1 per thousand.)
Wheelan, Charles. 2013.

Naked Statistics: Stripping the Dread from the Data. Norton.
Young, Aaron, Humayun J. Chaudhry, Janelle Rhyne, and Michael Dugan. 2011.
A Census of Actively Licensed Physicians in the United States, 2010. Retrieved 28 Sept 2014 from
http://www.nationalahec.org/pdfs/FSMBPhysicianCensus.pdf (PDF)

Updates and new info: https://BrownMath.com/swt/


Feedback Contact information is at BrownMath.com/about/#Contact.
