3 Graphical Methods For Describing Data
Comparative Bar Charts
How to construct
– Constructed like bar charts, but with two (or
more) groups being compared
– MUST use relative frequencies on the
vertical axis
– MUST include a key to denote the different
bars
Each year the Princeton Review conducts a survey of students applying to college and of parents of college applicants. In 2009, 12,715 high school students responded to the question “Ideally how far from home would you like the college you attend to be?” Also, 3007 parents of students applying to college responded to the question “How far from home would you like the college your child attends to be?” The data are displayed in the frequency table below.
Frequency
Ideal Distance          Students   Parents
Less than 250 miles     4450       1594
250 to 500 miles        3942       902
500 to 1000 miles       2416       331
More than 1000 miles    1907       180
Create a comparative bar chart with these data.
Relative Frequency
Ideal Distance          Students   Parents
Less than 250 miles     .35        .53
250 to 500 miles        .31        .30
500 to 1000 miles       .19        .11
More than 1000 miles    .15        .06
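The relative frequencies in this table can be checked with a short calculation. The following is a minimal Python sketch that uses the counts from the frequency table; the names `students`, `parents`, and `relative_frequencies` are illustrative and are not part of the original slides.

```python
# Frequencies copied from the survey table above.
students = {
    "Less than 250 miles": 4450,
    "250 to 500 miles": 3942,
    "500 to 1000 miles": 2416,
    "More than 1000 miles": 1907,
}
parents = {
    "Less than 250 miles": 1594,
    "250 to 500 miles": 902,
    "500 to 1000 miles": 331,
    "More than 1000 miles": 180,
}


def relative_frequencies(counts):
    """Divide each category count by the group total."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


student_rf = relative_frequencies(students)
parent_rf = relative_frequencies(parents)

# Print the two groups side by side, rounded to two decimals.
for category in students:
    print(f"{category:<22}{student_rf[category]:>6.2f}{parent_rf[category]:>6.2f}")
```

Because the two groups differ so much in size (12,715 students versus 3007 parents), raw counts are not directly comparable; dividing by each group's total puts both on the same 0-to-1 scale.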
Segmented Bar Charts
How to construct
– MUST first calculate relative frequencies
– Draw a bar representing 100% of the group
– Divide the bar into segments corresponding
to the relative frequencies of the categories
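The steps above amount to stacking the relative frequencies on top of one another. Here is a rough Python sketch, using the rounded relative frequencies from the Princeton survey table, that computes where each segment starts and ends within a bar of height 1.0. The stacking order (first category at the bottom) and the name `segment_bounds` are assumptions made for illustration.

```python
# Rounded relative frequencies copied from the table above.
# Assumption: categories are stacked from the bottom of the bar upward.
categories = [
    "Less than 250 miles",
    "250 to 500 miles",
    "500 to 1000 miles",
    "More than 1000 miles",
]
students = [0.35, 0.31, 0.19, 0.15]
parents = [0.53, 0.30, 0.11, 0.06]


def segment_bounds(rel_freqs):
    """Return (bottom, top) heights for each segment of one 100% bar."""
    bounds = []
    bottom = 0.0
    for rf in rel_freqs:
        top = bottom + rf
        bounds.append((bottom, top))
        bottom = top  # the next segment starts where this one ends
    return bounds


for group, rel_freqs in (("Students", students), ("Parents", parents)):
    print(group)
    for category, (bottom, top) in zip(categories, segment_bounds(rel_freqs)):
        print(f"  {category:<22}{bottom:.2f} to {top:.2f}")
```

The top of the last segment should land at 1.0 (up to rounding), which is a quick check that the relative frequencies were calculated correctly.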
Remember the Princeton survey . . .
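[Figure: segmented bar chart with one bar for Students and one for Parents; vertical axis from 0 to 1.0; surviving legend entries: "500 to 1000 miles", "More than 1000 miles"]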
Pie (Circle) Chart
When to Use Categorical data
How to construct
– Draw a circle to represent the entire data set
– Calculate the size of each “slice”:
Relative frequency × 360°
– Using a protractor, mark off each slice
To describe
– comment on which category had the largest
proportion or smallest proportion
Typos on a résumé do not make a very good
impression when applying for a job. Senior
executives were asked how many typos in a
résumé would make them not consider a job
candidate. The resulting data are summarized
in the table below.
Number of Typos   Frequency   Relative Frequency
1                 60          .40
2                 54          .36
3                 21          .14
4 or more         10          .07
Don’t know        5           .03
Create a pie chart for these data.
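The slice sizes follow from the rule "relative frequency × 360°". The sketch below applies that rule to the typo data. It works from the raw counts rather than the rounded relative frequencies so that the angles add up to 360°; the names `typo_counts` and `slice_angles` are illustrative.

```python
# Frequencies copied from the résumé typo table above.
typo_counts = {
    "1": 60,
    "2": 54,
    "3": 21,
    "4 or more": 10,
    "Don't know": 5,
}


def slice_angles(counts):
    """Angle of each slice in degrees: relative frequency times 360."""
    total = sum(counts.values())
    return {category: count / total * 360 for category, count in counts.items()}


angles = slice_angles(typo_counts)
for category, angle in angles.items():
    print(f"{category:<12}{angle:6.1f} degrees")
```

Using the rounded relative frequencies from the table instead (for example .07 × 360°) gives slightly different angles, which is usually harmless when marking slices with a protractor.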
How to describe a numerical, univariate graph
What strikes you as the most distinctive difference among the distributions of exam scores in classes A, B, and C?
1. Center
• discuss where the middle of the
data falls
Unimodal
What strikes you as the most distinctive feature of the distribution of exam scores in class J?
4. Unusual occurrences
• Outlier - value that lies away from
the rest of the data
• Gaps
• Clusters
5. In context
• You must write your answer in
reference to the context in the
problem, using correct statistical
vocabulary and using complete
sentences!
Graphs for numerical data
Stem-and-Leaf Displays
When to Use Univariate numerical data
How to construct
– Select one or more of the leading digits for the
stem
– List the possible stem values in a vertical column
– Record the leaf for each observation beside each
corresponding stem value
– Indicate the units for stems and leaves in a key or
legend
To describe
– comment on the center, spread, and shape of the
distribution and if there are any unusual features
The following data are the price per ounce for various brands of dandruff shampoo at a local grocery store.
0.32 0.21 0.29 0.54 0.17 0.28 0.36 0.23
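As an illustration of the construction steps, here is a minimal Python sketch that builds a stem-and-leaf display from the shampoo prices listed above. Taking the tenths digit as the stem and the hundredths digit as the leaf is one reasonable choice, not the only one, and the function name `stem_and_leaf` is hypothetical.

```python
# Price-per-ounce values copied from the list above.
prices = [0.32, 0.21, 0.29, 0.54, 0.17, 0.28, 0.36, 0.23]


def stem_and_leaf(values):
    """Map each stem (tenths digit) to its sorted leaves (hundredths digits)."""
    # Work in whole cents so the digit split is not affected by floating-point error.
    cents = sorted(round(v * 100) for v in values)
    # List every possible stem between the smallest and largest, even if empty.
    display = {stem: [] for stem in range(cents[0] // 10, cents[-1] // 10 + 1)}
    for c in cents:
        display[c // 10].append(c % 10)
    return display


display = stem_and_leaf(prices)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: stem = tenths, leaf = hundredths (2 | 1 means 0.21)")
```

Keeping the empty stem in the column matters: it shows the gap between the cluster of lower prices and the single price of 0.54.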
Northern Africa
54.6 34.3 48.9 77.8 59.6 88.5 97.4 92.5 83.9
98.8 91.6 97.8 96.1 92.2 94.9 98.6 86.6 96.9
88.9
Central Africa
58.3 34.6 35.5 45.4 38.6 63.8 53.9 61.9 69.9
43.0 85.0 63.4 58.4 61.9 40.9 73.9 34.8 74.4
97.4 61.0 66.7 79.6
Histograms
When to Use Univariate numerical data
12 2 4 6 6 7 8 7 8 11
8 3 5 6 7 10 1 9 7 6
9 7 5 4 7 4 6 7 8 10
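How to construct (discrete data)
– Draw a horizontal axis, scaled with the possible values of the variable of interest
– Draw a vertical axis, scaled with frequency or relative frequency
– Above each value, draw a rectangle with the height corresponding to the frequency or the relative frequency
[Figure: histogram of the data above; horizontal axis scaled 0 to 12, vertical axis showing frequency]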
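For the 30 observations listed above, the rectangle heights are simply the number of times each value occurs. The following Python sketch tallies them with `collections.Counter` and prints a rough text version of the histogram; the variable names and the 0-to-12 scale are illustrative choices.

```python
from collections import Counter

# The 30 observations copied from the data listed above.
data = [
    12, 2, 4, 6, 6, 7, 8, 7, 8, 11,
    8, 3, 5, 6, 7, 10, 1, 9, 7, 6,
    9, 7, 5, 4, 7, 4, 6, 7, 8, 10,
]

counts = Counter(data)
values = range(0, 13)  # possible values marked on the horizontal axis

# Height of each rectangle, as a frequency and as a relative frequency.
frequency = {v: counts[v] for v in values}
relative_frequency = {v: frequency[v] / len(data) for v in values}

# One row of asterisks per value: a sideways text histogram.
for v in values:
    print(f"{v:>2} | {'*' * frequency[v]} ({frequency[v]})")
```

Switching from frequency to relative frequency changes only the scale on the vertical axis; the shape of the histogram stays the same.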
Histograms
When to Use Univariate numerical data
How to construct
- Construct in the same way as histograms for continuous data, but with density on the vertical axis
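The slides do not spell out how density is computed. Under the usual definition, the density of a class is its relative frequency divided by its class width, so that the area of each rectangle equals the relative frequency. The sketch below assumes that definition; the class intervals and counts are made up purely for illustration and are not data from the slides.

```python
# Hypothetical class intervals (lower, upper) with unequal widths and made-up counts.
classes = [
    ((0, 5), 10),
    ((5, 10), 20),
    ((10, 20), 10),
    ((20, 40), 10),
]


def densities(classes):
    """Density of each class: relative frequency divided by class width."""
    total = sum(count for _, count in classes)
    result = []
    for (lower, upper), count in classes:
        rel_freq = count / total
        result.append(((lower, upper), rel_freq / (upper - lower)))
    return result


for (lower, upper), density in densities(classes):
    print(f"{lower:>2} to {upper:<2}  density = {density:.3f}")
```

With density on the vertical axis, a wide class is not made to look more important just because it covers more of the horizontal axis.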
[Figure: plot with Rainfall (2 to 14) on the horizontal axis and a vertical scale running up to 1.0; annotated answer: approximately 0.55]
[Figure: the same Rainfall plot; annotated answer: the interval 10 to 11 inches, because its slope is steeper, indicating that a larger proportion occurred there]
Displaying Bivariate
Numerical Data
Scatterplots
When to Use Bivariate numerical data
How to construct
- Draw a horizontal scale and mark it with
appropriate values of the independent variable
- Draw a vertical scale and mark it with appropriate
values of the dependent variable
- Plot each point corresponding to the observations
To describe
- comment on the relationship between the variables
Time Series Plots
When to Use
- measurements collected over time at
regular intervals
How to construct
- Draw a horizontal scale and mark it with
appropriate values of time
- Draw a vertical scale and mark it with appropriate
values of the observed variable
- Plot each point corresponding to the
observations and connect the points
To describe
- comment on any trends or patterns over time
The accompanying time-series plot of movie box
office totals (in millions of dollars) over 18
weeks in the summer for 2001 and 2002
appeared in USA Today (September 3, 2002).
Describe any trends or patterns that you see.