C Q, D Q, CN C: Name: - Period: - Date
2. The following partially complete two-way table shows the favorite food of a random sample of 30 Dallas
Cowboys football fans. Example calculation: Male eggplant: 4/15 ≈ 27%
         Eggplant   Quinoa    Kale      Total
Male     4 (27%)    8 (53%)   3 (20%)   15
Female   10 (67%)   1 (7%)    4 (27%)   15
Total    14         9         7         30
The following partially complete two-way table shows the favorite food
of a random sample of 30 Dallas Stars hockey fans.
         Eggplant   Quinoa    Kale      Total
Male     5 (36%)    4 (29%)   5 (36%)   14
Female   6 (38%)    5 (31%)   5 (31%)   16
Total    11         9         10        30
There is only a weak association between gender and favorite food among the hockey fans.
For hockey fans, the conditional distributions (and segment heights) for favorite food are
about the same for each gender and about the same within each gender.
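The percentages in both tables are row (conditional) percentages: each count divided by its row total. A minimal Python sketch of that calculation, using the counts from the two tables above; the names `row_percents`, `football`, and `hockey` are illustrative only.

```python
# Counts copied from the two tables above (order: Eggplant, Quinoa, Kale).
foods = ["Eggplant", "Quinoa", "Kale"]
football = {"Male": [4, 8, 3], "Female": [10, 1, 4]}
hockey = {"Male": [5, 4, 5], "Female": [6, 5, 5]}


def row_percents(table):
    """Return each row's conditional distribution as whole-number percents."""
    return {
        group: [round(100 * count / sum(counts)) for count in counts]
        for group, counts in table.items()
    }


football_pct = row_percents(football)
hockey_pct = row_percents(hockey)

for name, pct in [("Football", football_pct), ("Hockey", hockey_pct)]:
    for group, values in pct.items():
        print(name, group, dict(zip(foods, values)))
```

For the hockey fans the two rows come out nearly identical, which is what a weak association looks like; for the football fans the rows differ sharply.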
3. The following table is a set of starting salaries (in $1000s) of a sample of 80 recent college graduates.
a) Complete the cumulative frequency column. Then draw a histogram.
Class interval    Frequency   Cumulative frequency
30 < x ≤ 40       12          12
40 < x ≤ 50       32          32 + 12 = 44
50 < x ≤ 60       16          60
60 < x ≤ 70       8           68
70 < x ≤ 80       4           72
80 < x ≤ 90       4           76
90 < x ≤ 100      4           80
b) Is the mean of this data greater or less than the median? Explain.
Mean > median. Because the distribution is skewed right, the average will be “pulled”
towards the larger values; the mean is not resistant. The median is resistant.
c) Is the median starting salary more than $50,000? Justify your answer.
No. The median starting salary is the average of the 40th and 41st values in the ordered list of
values. From the cumulative frequency table, there are 44 values at or below 50. Therefore,
the median must be $50K or less.
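The cumulative frequency column and the median argument in part c) can be checked with a short Python sketch. The variable names are illustrative; the class bounds and frequencies are taken from the table above.

```python
from itertools import accumulate

# Upper class bounds (in $1000s) and frequencies from the table.
upper_bounds = [40, 50, 60, 70, 80, 90, 100]
frequencies = [12, 32, 16, 8, 4, 4, 4]

# Running total of the frequencies gives the cumulative frequency column.
cumulative = list(accumulate(frequencies))
n = cumulative[-1]

# With n = 80, the median is the average of the 40th and 41st ordered values.
median_positions = (n // 2, n // 2 + 1)

# First class whose cumulative frequency reaches the 41st value.
median_class_upper = next(
    bound
    for bound, total in zip(upper_bounds, cumulative)
    if total >= median_positions[1]
)

print(cumulative)
print(median_positions, median_class_upper)
```

Both the 40th and 41st values fall in a class whose upper bound is 50, so the median cannot exceed $50K.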
d) One possible statistic to measure skewness is the ratio of the mean to the median, i.e., mean/median.
What value(s) of this statistic might indicate that the distribution of values is symmetric? Explain.
If the mean ≈ median, then the statistic is close to 1 and the distribution is approx.
symmetric.
e) What values of that statistic might indicate that the distribution is skewed to the right? Explain.
When the data is skewed to the right, the median < mean. So, assuming all the data values
are greater than zero, the statistic will be greater than 1 when the data is skewed right.
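As a rough check of the mean/median statistic on the salary table from problem 3, the sketch below estimates the mean by treating every salary as sitting at its class midpoint. That midpoint treatment is an assumption (the raw salaries are not given), so the result is only an approximation.

```python
# Class midpoints (in $1000s) stand in for the unknown raw salaries: an assumption.
midpoints = [35, 45, 55, 65, 75, 85, 95]
frequencies = [12, 32, 16, 8, 4, 4, 4]

n = sum(frequencies)
estimated_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n

# Part c) showed that the median is at most 50, so mean/median is at least:
median_upper_bound = 50
ratio_lower_bound = estimated_mean / median_upper_bound

print(estimated_mean, ratio_lower_bound)
```

Under that assumption the ratio comes out above 1, consistent with the right skew described in part b).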
4. A consumer group surveyed the prices for a certain item in five different stores and reported the average
price as $15. We visited four of the five stores and found the prices to be $10, $15, $15, and $25.
Assuming that the consumer group is correct, what is the price of the item at the store that we did not
visit?
(a) $10 (b) $15 (c) $20 (d) $25
Mean = (10 + 15 + 15 + 25 + X)/5 = 15. Solving, 65 + X = 75, so X = $10; (a) is correct.
5. The mean of five numbers is 36. If one of those numbers is 28, then what is the mean of the other four
numbers?
Mean = (a + b + c + d + 28)/5 = 36. Algebraic manipulation yields
a + b + c + d = 180 – 28 = 152 → (a + b + c + d)/4 = 38
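Problems 4 and 5 both rest on the identity total = mean × count. A small Python sketch of that idea; the helper names `missing_value` and `mean_of_rest` are hypothetical, introduced only for illustration.

```python
def missing_value(mean, count, known_values):
    """Value that makes `count` numbers average to `mean`, given all the others."""
    return mean * count - sum(known_values)


def mean_of_rest(mean, count, removed_value):
    """Mean of the remaining numbers after one known value is set aside."""
    return (mean * count - removed_value) / (count - 1)


# Problem 4: five stores, mean price $15, four known prices.
missing_price = missing_value(15, 5, [10, 15, 15, 25])

# Problem 5: five numbers, mean 36, one of them is 28.
mean_of_other_four = mean_of_rest(36, 5, 28)

print(missing_price, mean_of_other_four)
```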
6. A distribution of 6 scores has a median of 21. If the highest score increases by 3 points, what will be the
value of the median?
(A) 21 (B) 21.5 (C) 24 (D) 27 (E) cannot be determined with given information
(A) is correct
7. A distribution of 6 scores has a mean of 20. If the highest score increases by 3 points, what will be the
value of the mean?
(A) 20 (B) 20.5 (C) 23 (D) 27 (E) cannot be determined with given information
(B) is correct (Original data: mean = 120/6 = 20; new data: mean = 123/6 = 20.5)
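Problems 6 and 7 contrast a resistant measure with a non-resistant one. The sketch below uses a hypothetical set of six scores (not from the worksheet), chosen so that the mean is 20 and the median is 21, and then raises the highest score by 3.

```python
from statistics import mean, median

# Hypothetical scores chosen so that mean = 20 and median = 21.
scores = [12, 18, 20, 22, 23, 25]

raised = sorted(scores)
raised[-1] += 3  # the highest score goes up by 3 points

before = (mean(scores), median(scores))
after = (mean(raised), median(raised))

print("before:", before)
print("after: ", after)
```

The median depends only on the two middle scores, so it stays put; the mean uses the total, so it moves by 3/6 = 0.5.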
8. The circumferences of 14 randomly selected trees in the California Redwood Forest were measured. The
results, to the nearest inch, are given below.
80 62 38 81 77 55 77 55 94 10 70 85 40 53
Rearranging to put the numbers in order,
10 38 40 53 55 55 62 70 77 77 80 81 85 94
a) Calculate the following measures of center for this data (and show your work):
Mean: x̄ = Σxᵢ/n = 877/14 ≈ 62.6 inches
Median (calculate by hand): (62 + 70)/2 = 66 inches
b) Calculate the following measures of spread for this data (and show your work):
Standard deviation:
Sample variance = s² = Σ(xᵢ − x̄)²/(n − 1) ≈ 514.6 inches²
Sample standard deviation = s = √514.6 ≈ 22.7 inches
Interquartile range:
IQR = Q3 – Q1 = 80 – 53 = 27 inches
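The hand calculations for the mean, median, sample variance, and sample standard deviation can be checked with Python's standard `statistics` module. This is a sketch for verification only; the results should agree with the values above up to rounding.

```python
from statistics import mean, median, stdev, variance

# Tree circumferences (inches) from the problem statement.
circumferences = [80, 62, 38, 81, 77, 55, 77, 55, 94, 10, 70, 85, 40, 53]

tree_mean = mean(circumferences)
tree_median = median(circumferences)
tree_variance = variance(circumferences)  # sample variance, divides by n - 1
tree_sd = stdev(circumferences)           # sample standard deviation

print(round(tree_mean, 1), tree_median)
print(round(tree_variance, 1), round(tree_sd, 1))
```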
c) Interpret the standard deviation in context.
The circumferences of the trees typically vary by 22.7 inches from the mean of 62.6 inches.
d) What is the five-number summary for this data? Identify each number.
Min = 10 in., Q1 = 53 in., Median = 66 in., Q3 = 80 in., Max = 94 in.
e) Determine whether there are any outliers in this data. Show your work.
IQR = 80 – 53 = 27
Lower fence: Q1 – 1.5·IQR = 53 – 1.5·(27) = 12.5
Upper fence: Q3 + 1.5·IQR = 80 + 1.5·(27) = 120.5
10 < 12.5 ∴ 10 is a low outlier. 94 < 120.5 ∴ there is no high outlier.
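The five-number summary and the 1.5·IQR outlier rule can be sketched in Python as follows. The quartiles here are the medians of the lower and upper halves of the ordered data, matching the hand method used above; library quartile functions may use other conventions and give slightly different values. The function name `five_number_summary` is illustrative.

```python
from statistics import median


def five_number_summary(values):
    """Min, Q1, median, Q3, max, with quartiles as medians of the two halves."""
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[-half:]  # the middle value is excluded when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]


circumferences = [80, 62, 38, 81, 77, 55, 77, 55, 94, 10, 70, 85, 40, 53]
low, q1, med, q3, high = five_number_summary(circumferences)

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in circumferences if x < lower_fence or x > upper_fence]

print(low, q1, med, q3, high)
print(lower_fence, upper_fence, outliers)
```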
f) What do the statistics that you have calculated suggest about the skewness of the distribution? Explain.
Mean (62.6) < Median (66) suggests that the distribution is skewed left.
g) Which summary statistics (the median, mean, standard deviation, interquartile range) should you use
to report the overall circumference of these trees? Explain.
Because the data is skewed to the left and/or there is an outlier, we should use the median
and the interquartile range (“IQR”) as the measures of center and spread, respectively. The
median and IQR are resistant to the influence of outliers; the mean and the standard
deviation are not.
c) Based on the stemplot, give one reason you might choose Tarzan to be on your hockey team.
Tarzan is likely to score more goals than Jane. Tarzan has a higher first quartile, median,
mean, third quartile and maximum.
d) In any given year, which hockey player can you count on to consistently score goals close to his or her
season average? Explain.
Jane. Jane’s results vary less about her average; she has a lower standard deviation and IQR.
10. A distribution of 6 scores has a mean of 20 and a standard deviation of 3. Suppose we added a score of 15
to the distribution. What effect would including this observation have on the mean and standard
deviation? Explain.
The mean would decrease because 15 is less than the current mean.
The standard deviation would increase because 15 lies 5 points from the current mean, which is
more than the current “typical” distance (3) from the mean to the observations in the distribution.
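The size of both changes can be worked out from the summary statistics alone. The sketch below assumes the stated standard deviation of 3 is a sample standard deviation (divisor n − 1); the problem does not say which kind it is. It uses the standard update for the sum of squared deviations when one value is added.

```python
from math import sqrt

n, old_mean, old_sd = 6, 20, 3
new_score = 15

# New mean: add the new score to the old total and divide by the new count.
new_mean = (n * old_mean + new_score) / (n + 1)

# Sum of squared deviations before and after adding the new score.
old_ss = (n - 1) * old_sd ** 2
new_ss = old_ss + n / (n + 1) * (new_score - old_mean) ** 2

# Sample standard deviation of the 7 scores (divisor is now (n + 1) - 1 = n).
new_sd = sqrt(new_ss / n)

print(round(new_mean, 2), round(new_sd, 2))
```

Under that assumption the mean falls to about 19.3 and the standard deviation rises to about 3.3.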
Warmup Questions
1. Given the set of data { 8 6 5 5 4 2 }, demonstrate that adding a constant (e.g., 5) to every score
increases the mean and median by that amount.
Original data rearranged: {2 4 5 5 6 8}
Mean = ∑x/n = 30/6 = 5; Median = 5
New data after adding 5 to each score: {7 9 10 10 11 13}
Mean = ∑x/n = 60/6 = 10; Median = 10
2. Same facts as stated in the previous problem. What will happen to the first quartile and third quartile?
(a) Be unchanged
(b) Be multiplied by 5
(c) Increase by 5 (correct)
(d) Increase by √5
3. Same facts as stated in the previous problem. What will happen to the interquartile range?
(a) Be unchanged (correct)
(b) Be multiplied by 5
(c) Increase by 5
(d) Increase by √5
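Warmup questions 1 through 3 can be checked together: adding a constant shifts every measure of position by that constant and leaves the spread alone. A short Python sketch, with quartiles taken as medians of the lower and upper halves (an assumed convention; the helper name `quartiles` is illustrative):

```python
from statistics import mean, median


def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the data."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[-half:])


scores = [8, 6, 5, 5, 4, 2]
shifted = [x + 5 for x in scores]

q1, q3 = quartiles(scores)
shifted_q1, shifted_q3 = quartiles(shifted)

print(mean(scores), median(scores), q1, q3, q3 - q1)
print(mean(shifted), median(shifted), shifted_q1, shifted_q3, shifted_q3 - shifted_q1)
```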
4. Given the same set of data { 8 6 5 5 4 2 }, show that multiplying each score by a constant multiplies the
mean and median by that constant.
Original data rearranged: {2 4 5 5 6 8}
Mean = ∑x/n = 30/6 = 5; Median = 5
New data after multiplying each score by 5: {10 20 25 25 30 40}
Mean = ∑x/n = 150/6 = 25; Median = 25
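The same demonstration for multiplication, as a brief Python sketch using the data from this question:

```python
from statistics import mean, median

scores = [8, 6, 5, 5, 4, 2]
scaled = [5 * x for x in scores]  # multiply every score by the constant 5

original = (mean(scores), median(scores))
rescaled = (mean(scaled), median(scaled))

print(sorted(scaled))
print(original, rescaled)
```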
5. The figure on the right is the density curve of a distribution. Which of the
following statements is true?
(A) The mean will be higher than the median because the distribution is skewed left.
(B) The median will be higher than the mean because the distribution is skewed left.
(C) The mean will be higher than the median because the distribution is skewed right.
(D) The median will be higher than the mean because the distribution is skewed right.
(E) The mean and median will be approximately the same because the distribution is symmetrical.
6. The scores on a statistics exam are strongly skewed to the left and have a very wide range. Which of the
following would be the best method for describing the distribution? Explain your answer choice.
(a) Median and interquartile range
(b) Mean and standard deviation
Notes on old AP exam FRQs
2000 3 I Graphing and comparing frequency distributions Flexibility
Display the data graphically so that blank and blank can be compared.
Based on an examination of your graphical display, write a few sentences comparing the blank with the blank.
Use the same scale to draw boxplots for the blank and blank.
You have been asked to report on this for a school newspaper. Write a few sentences describing the student and faculty
performance in this competition for the newspaper.
On the grid below, draw parallel boxplots (showing outliers if any) of the differences of the two additives.
Which additive, A or B, would you recommend if the goal is to increase gas mileage in the highest proportion of cars?
Explain your choice.
…. If your goal is to have the highest mean increase in gas mileage? Explain your choice.
Which summary statistics, the mean or the median, should the instructor use to report that overall exam performance
was high? Explain.
The midrange is defined as …. Compute this value using the data on the previous page.
If a parent wants to maximize the probability of having a ping-pong ball land within a band, which of the two catapults
would be better to use? Justify your choice.
Use your stemplot to describe the main features of this score distribution.
Why would it be misleading to report only a measure of center for this score distribution?
2010 Form B 1 I Comparing boxplots, constructing a stemplot, stemplot vs. boxplot Polluted ri
(Parallel box plots show) Compare the distributions of the concentration of aldrin among the three rivers.
(Give data) Construct a stemplot that displays the concentrations of aldrin for River X.
Describe a characteristic of the distribution of aldrin concentrations in River X that can be seen in the stemplot but cannot
be seen in the boxplot.
2011 Form B 1 I Estimating median from hist., comparing hist., mean vs. median Pupil-teac
(a) Describe how you would use the histograms to estimate the median P-T ratio for each group (west and east) of states.
Then use this procedure to estimate the median of the west group and the median of the east group.
(b) Write a few sentences comparing the distributions of P-T ratios for states in the two groups (west and east) during the
2001–2002 school year.
(c) Using your answers in parts (a) and (b), explain how you think the mean P-T ratio during the 2001–2002 school year
will compare for the two groups (west and east).
(b) Suppose both corporations offered you a job for $36,000 a year as an entry-level accountant.
(i) Based on the boxplots, give one reason why you might choose to accept the job at corporation A.
(ii) Based on the boxplots, give one reason why you might choose to accept the job at corporation B.
2016 1 I Describing a distribution; effect of changing a value on mean, median. Robin's tip
Histogram shown.
Write a few sentences to describe the distribution of tip amounts for the day shown.
One of the tip amounts was $8. If the $8 tip had been $18, what effect would the increase have had on the following
statistics? Justify your answer.
The mean:
The median:
Consider a piece of pottery known to have originated at one of the three sites, but the actual site is not known.
Suppose an analysis of the clay reveals that the sum of the percents of the three chemicals X, Y, and Z is 20.5%. Based on
the boxplots, which site (I, II, or III) is the most likely site where the piece of pottery originated? Justify your choice.
Suppose only one chemical could be analyzed in the piece of pottery. Which chemical—X, Y or Z—would be the most
useful in identifying the site where the piece of pottery originated? Justify your choice.
2018 5 I Medians from histograms; mean from combined sample; mean +/- SD on histogram Teachers
Teaching year is recorded as an integer, with first-year teachers recorded as 1, second-year teachers recorded as 2, and so
on. Both sets of data have a mean teaching year of 8.2, with data recorded from 200 teachers at High School A and 221
teachers at High School B. On the histograms, each interval represents possible integer values from the left endpoint up
to but not including the right endpoint.
(a) The median teaching year for one high school is 6, and the median teaching year for the other high school is 7.
Identify which high school has each median and justify your answer.
(b) An additional 18 teachers were not included with the data recorded from the 200 teachers at High School A. The
mean teaching year of the 18 teachers is 2.5. What is the mean teaching year for all 218 teachers at High School
A?
(c) The standard deviation of the teaching year for the 221 teachers at High School B is 7.2. If one teacher is selected at
random from High School B, what is the probability that the teaching year for the selected teacher will be within 1
standard deviation of the mean of 8.2? Justify your answer.
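Part (b) of this question is a weighted-mean calculation: each group's mean is weighted by its size. A hedged Python sketch using only the numbers stated in the prompt; it is an illustration of the technique, not the official scoring solution.

```python
# Group sizes and group means for High School A, as stated in part (b).
group_sizes = [200, 18]
group_means = [8.2, 2.5]

# Combined mean = (sum of size × mean for each group) / (total size).
total_years = sum(size * m for size, m in zip(group_sizes, group_means))
combined_mean = total_years / sum(group_sizes)

print(round(combined_mean, 2))
```

The combined mean lands much closer to 8.2 than to 2.5 because the first group is far larger.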