
3 Graphical Methods For Describing Data


Chapter 3
Graphical Methods for Describing Data
Graphs for Categorical Data
Double Bar Charts
When to use: categorical data

How to construct
– Construct like a bar chart, but with two (or more) groups being compared
– MUST use relative frequencies on the vertical axis, because the groups may differ in size
– MUST include a key to identify the bars for each group
Each year the Princeton Review conducts a survey of students applying to college and of parents of college applicants. In 2009, 12,715 high school students responded to the question “Ideally how far from home would you like the college you attend to be?” Also, 3007 parents of students applying to college responded to the question “How far from home would you like the college your child attends to be?” The data are displayed in the frequency table below.
Frequency
Ideal Distance          Students   Parents
Less than 250 miles         4450      1594
250 to 500 miles            3942       902
500 to 1000 miles           2416       331
More than 1000 miles        1907       180

Create a comparative bar chart with these data.

Relative Frequency
Ideal Distance          Students   Parents
Less than 250 miles          .35       .53
250 to 500 miles             .31       .30
500 to 1000 miles            .19       .11
More than 1000 miles         .15       .06
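Each relative frequency is a count divided by its group's total (12,715 students, 3007 parents). A minimal Python sketch of that step, assuming the counts are stored in a nested dictionary; the names `counts`, `relative_frequencies`, and `rel` are illustrative only:

```python
# Frequencies from the Princeton Review survey table (group -> category -> count).
counts = {
    "Students": {"Less than 250 miles": 4450, "250 to 500 miles": 3942,
                 "500 to 1000 miles": 2416, "More than 1000 miles": 1907},
    "Parents": {"Less than 250 miles": 1594, "250 to 500 miles": 902,
                "500 to 1000 miles": 331, "More than 1000 miles": 180},
}


def relative_frequencies(freqs):
    """Divide each category count by the total for that group."""
    total = sum(freqs.values())
    return {category: count / total for category, count in freqs.items()}


rel = {group: relative_frequencies(freqs) for group, freqs in counts.items()}
for group, table in rel.items():
    # Round to two decimals, as in the relative frequency table.
    print(group, {category: round(p, 2) for category, p in table.items()})
```

Dividing by each group's own total is what makes the students' and parents' bars comparable even though there are about four times as many students.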

What does this graph show about how far from home students and parents would ideally like the college to be?
Segmented (or Stacked) Bar Charts
When to use: categorical data

How to construct
– MUST first calculate relative frequencies
– Draw a bar representing 100% of the group
– Divide the bar into segments corresponding to the relative frequencies of the categories
Remember the Princeton survey . . .

Create a segmented bar graph with these data.

Relative Frequency
Ideal Distance          Students   Parents
Less than 250 miles          .35       .53
250 to 500 miles             .31       .30
500 to 1000 miles            .19       .11
More than 1000 miles         .15       .06

[Figure: segmented bar chart with one bar each for Students and Parents; vertical axis: relative frequency (0 to 1.0); segments: Less than 250 miles, 250 to 500 miles, 500 to 1000 miles, More than 1000 miles]
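A segmented bar is built by stacking a group's relative frequencies on a bar of total height 1. A minimal sketch, assuming segments are stacked from the bottom up in table order (the stacking order is an arbitrary choice here, and `segments` is a hypothetical helper name):

```python
# Relative frequencies from the Princeton Review table.
categories = ["Less than 250 miles", "250 to 500 miles",
              "500 to 1000 miles", "More than 1000 miles"]
rel_freq = {
    "Students": [0.35, 0.31, 0.19, 0.15],
    "Parents": [0.53, 0.30, 0.11, 0.06],
}


def segments(proportions):
    """Return (bottom, top) heights of each segment on a bar of height 1."""
    bounds = []
    bottom = 0.0
    for p in proportions:
        bounds.append((bottom, bottom + p))
        bottom += p  # the next segment starts where this one ends
    return bounds


for group, props in rel_freq.items():
    print(group)
    for name, (low, high) in zip(categories, segments(props)):
        print(f"  {name:<22} {low:.2f} to {high:.2f}")
```

Because each group's relative frequencies sum to 1, both bars reach the same height, so the comparison is between segment sizes rather than bar heights.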
Pie (Circle) Chart
When to use: categorical data

How to construct
– Draw a circle to represent the entire data set
– Calculate the size of each “slice”: relative frequency × 360°
– Using a protractor, mark off each slice

To describe
– Comment on which category has the largest proportion and which has the smallest
Typos on a résumé do not make a very good
impression when applying for a job. Senior
executives were asked how many typos in a
résumé would make them not consider a job
candidate. The resulting data are summarized
in the table below.
Number of Typos   Frequency   Relative Frequency
1                    60             .40
2                    54             .36
3                    21             .14
4 or more            10             .07
Don’t know            5             .03

Create a pie chart for these data.
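Applying relative frequency × 360° to the résumé data gives each slice's angle. This sketch computes the angles from the raw counts (150 executives in total) rather than from the rounded relative frequencies, which gives slightly more accurate slices for the two smallest categories (24° and 12° rather than 25.2° and 10.8°):

```python
# Résumé typo data from the table: category -> frequency.
typos = {"1": 60, "2": 54, "3": 21, "4 or more": 10, "Don't know": 5}

total = sum(typos.values())  # 150 executives
angles = {cat: count / total * 360 for cat, count in typos.items()}

start = 0.0
for cat, angle in angles.items():
    # Each slice runs from `start` to `start + angle` degrees around the circle.
    print(f"{cat:<11} {angle:6.1f} deg  ({start:5.1f} to {start + angle:5.1f})")
    start += angle
```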
How to Describe a Numerical, Univariate Graph
What strikes you as the most distinctive difference among the distributions of exam scores in classes A, B, & C?
1. Center
• discuss where the middle of the data falls
• three measures of central tendency
  – mean, median, & mode
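Python's standard `statistics` module computes all three measures of center. As an illustration, this sketch uses the queen honey bee data (number of partners for 30 queens) that appears later in this chapter:

```python
import statistics

# Number of partners for 30 queen honey bees (data given later in this chapter).
partners = [12, 2, 4, 6, 6, 7, 8, 7, 8, 11,
            8, 3, 5, 6, 7, 10, 1, 9, 7, 6,
            9, 7, 5, 4, 7, 4, 6, 7, 8, 10]

mean = statistics.mean(partners)      # arithmetic average
median = statistics.median(partners)  # middle of the sorted data (average of 2 middle values here)
mode = statistics.mode(partners)      # most frequent value

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

For these data the mean is about 6.67, while the median and mode are both 7.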
What strikes you as the most distinctive difference among the distributions of scores in classes D, E, & F?
2. Spread
• discuss how spread out the data are
• refers to the variability in the data
• measures of spread
  – range, standard deviation, IQR
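The queen honey bee data from later in this chapter can also illustrate the three measures of spread. Textbooks and software use slightly different rules for quartiles; this sketch assumes the common "median of each half" convention (leaving out the overall median when n is odd), so other tools may report a different IQR for some data sets:

```python
import statistics

# Number of partners for 30 queen honey bees (data given later in this chapter).
partners = [12, 2, 4, 6, 6, 7, 8, 7, 8, 11,
            8, 3, 5, 6, 7, 10, 1, 9, 7, 6,
            9, 7, 5, 4, 7, 4, 6, 7, 8, 10]

data_range = max(partners) - min(partners)
sd = statistics.stdev(partners)  # sample standard deviation (divides by n - 1)

# Quartiles as medians of the lower and upper halves of the sorted data.
ordered = sorted(partners)
half = len(ordered) // 2
lower = ordered[:half]
upper = ordered[half + len(ordered) % 2:]  # skip the middle value if n is odd
q1 = statistics.median(lower)
q3 = statistics.median(upper)
iqr = q3 - q1

print(f"range = {data_range}, sd = {sd:.2f}, IQR = {iqr}")
```

For these data the range is 11, the IQR is 8 − 5 = 3, and the sample standard deviation is about 2.52.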
What strikes you as the most distinctive difference among the distributions of exam scores in classes G, H, & I?
3. Shape
• refers to the overall shape of the distribution
• symmetrical, uniform, skewed, or bimodal
Symmetrical
• refers to data in which both sides
are (more or less) the same when
the graph is folded vertically down
the middle
• bell-shaped is a special type
– has a center mound with two
sloping tails
Uniform
• refers to data in which every class
has equal or approximately equal
frequency
Skewed
• refers to data in which one side (tail) is longer than the other side
• the direction of skewness is on the side of the longer tail
Bimodal (multi-modal)
• refers to the number of peaks in
the shape of the distribution
• Bimodal would have two peaks
• Multi-modal would have more than
two peaks

Unimodal
What strikes you as the most distinctive difference among the distributions of exam scores in class J?
4. Unusual occurrences
• Outliers: values that lie far from the rest of the data
• Gaps
• Clusters
5. In context
• You must write your answer in
reference to the context in the
problem, using correct statistical
vocabulary and using complete
sentences!
Graphs for Numerical Data
Stem-and-Leaf Displays
When to use: univariate numerical data

How to construct
– Select one or more of the leading digits for the stem
– List the possible stem values in a vertical column
– Record the leaf for each observation beside the corresponding stem value
– Indicate the units for stems and leaves in a key or legend

To describe
– Comment on the center, spread, and shape of the distribution and on any unusual features
The following data are prices per ounce for various brands of dandruff shampoo at a local grocery store.
0.32 0.21 0.29 0.54 0.17 0.28 0.36 0.23

Create a stem-and-leaf display with these data.


Stem | Leaf
   1 | 7
   2 | 1 9 8 3
   3 | 2 6
   4 |
   5 | 4
Key: 1 | 7 = 0.17
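The construction steps can be sketched in Python for the shampoo prices. This assumes every price has exactly two decimal places, so the tenths digit is the stem and the hundredths digit the leaf; leaves are kept in order of appearance, as in the display above, and every stem between the smallest and largest is listed even when it has no leaves:

```python
from collections import defaultdict

# Price per ounce for eight dandruff shampoos.
prices = [0.32, 0.21, 0.29, 0.54, 0.17, 0.28, 0.36, 0.23]

# Work in hundredths (0.17 -> 17); round() guards against float error like 28.999...
hundredths = [round(p * 100) for p in prices]

leaves = defaultdict(list)
for h in hundredths:
    leaves[h // 10].append(h % 10)  # stem = tenths digit, leaf = hundredths digit

for stem in range(min(leaves), max(leaves) + 1):  # include empty stems such as 4
    print(stem, "|", " ".join(str(leaf) for leaf in leaves[stem]))
print("Key: 1 | 7 = 0.17")
```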
The Census Bureau projects the median age in 2030 for
the 50 states and Washington D.C. A stem-and-leaf
display is shown below.
The following data are the percentages of primary-school-aged children who are enrolled in school for 19 countries in Northern Africa and for 23 countries in Central Africa.

Northern Africa
54.6 34.3 48.9 77.8 59.6 88.5 97.4 92.5 83.9
98.8 91.6 97.8 96.1 92.2 94.9 98.6 86.6 96.9
88.9

Central Africa
58.3 34.6 35.5 45.4 38.6 63.8 53.9 61.9 69.9
43.0 85.0 63.4 58.4 61.9 40.9 73.9 34.8 74.4
97.4 61.0 66.7 79.6
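These percentages can also illustrate skewness. One rough numerical check, not among the methods listed in this chapter and not always reliable, is to compare the mean and median: when a distribution has a long lower tail, the mean is often pulled below the median. A sketch for the Northern Africa data:

```python
import statistics

# Enrollment percentages for the 19 Northern Africa countries listed above.
northern = [54.6, 34.3, 48.9, 77.8, 59.6, 88.5, 97.4, 92.5, 83.9,
            98.8, 91.6, 97.8, 96.1, 92.2, 94.9, 98.6, 86.6, 96.9,
            88.9]

mean = statistics.mean(northern)
median = statistics.median(northern)

# Rough heuristic only: a mean well below the median hints at a longer left tail.
print(f"mean = {mean:.1f}, median = {median:.1f}")
```

Here the mean (about 83.2) is well below the median (91.6), consistent with a few countries having much lower enrollment than the rest; a graph is still the better way to judge shape.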
Histograms
When to use: univariate numerical data

How to construct (discrete data)
– Draw a horizontal scale and mark it with the possible values for the variable
– Draw a vertical scale and mark it with frequency or relative frequency
– Above each possible value, draw a rectangle centered at that value with a height corresponding to its frequency or relative frequency

To describe
– Comment on the center, spread, and shape of the distribution and on any unusual features
Queen honey bees mate shortly after they
become adults. During a mating flight, the queen
usually takes several partners, collecting sperm
that she will store and use throughout the rest of
her life. A study on honey bees provided the
following data on the number of partners for 30
queen bees.

12 2 4 6 6 7 8 7 8 11
8 3 5 6 7 10 1 9 7 6
9 7 5 4 7 4 6 7 8 10

Create a histogram for the number of partners of the queen bees.
[Figure: histogram of the number of partners (horizontal axis, 0 to 12) for the 30 queen bees; vertical axis: frequency (0 to 7)]
First draw a horizontal axis, scaled with the possible values of the variable of interest. Next draw a vertical axis, scaled with frequency or relative frequency. Then draw a rectangle above each value with a height corresponding to its frequency.
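The histogram for the bee data can also be sketched as text, assuming no plotting library is available (in practice a library such as matplotlib would draw the rectangles). The sketch tallies each possible number of partners and prints one `#` per bee, along with the frequency and relative frequency:

```python
from collections import Counter

# Number of partners for 30 queen honey bees.
partners = [12, 2, 4, 6, 6, 7, 8, 7, 8, 11,
            8, 3, 5, 6, 7, 10, 1, 9, 7, 6,
            9, 7, 5, 4, 7, 4, 6, 7, 8, 10]

freq = Counter(partners)  # value -> frequency; missing values count as 0
n = len(partners)
for value in range(0, 13):  # horizontal axis runs from 0 to 12 partners
    count = freq[value]
    # The row of '#' characters stands in for the rectangle above each value.
    print(f"{value:>2} | {'#' * count:<7} {count} ({count / n:.3f})")
```

Value 7 has the tallest bar (7 bees), which is why the vertical axis runs up to 7.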
Histograms
When to use: univariate numerical data

How to construct (continuous data)
– Mark the boundaries of the class intervals on the horizontal axis
– Draw a vertical scale and mark it with frequency or relative frequency
– Draw a rectangle directly above each class interval with a height corresponding to its frequency or relative frequency

To describe
– Comment on the center, spread, and shape of the distribution and on any unusual features
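For continuous data, the first step is sorting observations into class intervals. This sketch bins the Northern Africa enrollment percentages listed earlier into intervals of width 10, assuming the "30 to <40, 40 to <50, ..." convention (lower boundary included, upper boundary excluded) and that every value lies between 30 and 100:

```python
# Enrollment percentages for the 19 Northern Africa countries listed earlier.
northern = [54.6, 34.3, 48.9, 77.8, 59.6, 88.5, 97.4, 92.5, 83.9,
            98.8, 91.6, 97.8, 96.1, 92.2, 94.9, 98.6, 86.6, 96.9,
            88.9]

width = 10
edges = list(range(30, 101, width))  # class boundaries 30, 40, ..., 100
counts = [0] * (len(edges) - 1)
for x in northern:
    # Interval i is [edges[i], edges[i] + width): lower boundary in, upper out.
    i = int((x - edges[0]) // width)
    counts[i] += 1

for low, c in zip(edges, counts):
    print(f"{low} to <{low + width}: {c}")
```

The counts (1, 1, 2, 0, 1, 4, 10) pile up in the 80s and 90s with a tail toward lower values; these are the rectangle heights for a frequency histogram.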
A study examined the number of hours spent watching TV per day for a sample of children age 1 and for a sample of children age 3. Below are comparative histograms.

[Figure: comparative histograms; panels: Children Age 1, Children Age 3]

Histograms with unequal intervals
When to use
- When the data are concentrated in the middle with some extreme values

How to construct
- Construct as for histograms with continuous data, but with density on the vertical axis, so that each rectangle's area equals the relative frequency of its interval:

$$\text{density} = \frac{\text{relative frequency for interval}}{\text{width of interval}}$$
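Because density × width = relative frequency, the rectangle areas always add up to 1, whatever the interval widths. A sketch using the Central Africa percentages as listed above; the class boundaries (wider at the extremes, narrower in the middle) are a hypothetical choice for illustration, not taken from the source:

```python
# Central Africa enrollment percentages, as listed above.
central = [58.3, 34.6, 35.5, 45.4, 38.6, 63.8, 53.9, 61.9, 69.9,
           43.0, 85.0, 63.4, 58.4, 61.9, 40.9, 73.9, 34.8, 74.4,
           97.4, 61.0, 66.7, 79.6]

# Hypothetical unequal class boundaries chosen for illustration.
edges = [30, 50, 60, 70, 80, 100]

n = len(central)
counts, densities = [], []
for low, high in zip(edges, edges[1:]):
    count = sum(low <= x < high for x in central)  # interval "low to <high"
    counts.append(count)
    densities.append((count / n) / (high - low))  # relative frequency / width
    print(f"{low} to <{high}: count {count}, density {densities[-1]:.4f}")

# Each rectangle's area is density * width; together they cover an area of 1.
area = sum(d * (high - low) for d, low, high in zip(densities, edges, edges[1:]))
print(f"total area = {area:.3f}")
```

Plotting frequency instead of density here would make the wide 30 to <50 interval look as tall as the narrow 60 to <70 interval (7 countries each), overstating how common low values are.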
Cumulative Relative Frequency Plot
When to use
- To answer questions about percentiles
How to construct
- Mark the boundaries of the intervals on the horizontal axis
- Draw a vertical scale and mark it with relative frequency
- For each interval, plot a point at its upper end with height equal to its cumulative relative frequency; also plot a beginning point at the lower end of the first interval with height 0
- Connect the points
The National Climatic Center has been collecting
weather data for many years. The annual rainfall
amounts for Albuquerque, New Mexico from 1950 to
2008 were used to create the frequency distribution
below.
Annual Rainfall   Relative    Cumulative Relative
(in inches)       Frequency   Frequency
4 to <5           0.052       0.052
5 to <6           0.103       0.155
6 to <7           0.086       0.241
7 to <8           0.103       0.344
8 to <9           0.172       0.516
9 to <10          0.069       0.585
10 to <11         0.207       0.792
11 to <12         0.103       0.895
12 to <13         0.052       0.947
13 to <14         0.052       0.999
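The cumulative column is a running total of the relative frequencies, and reading a value off the connected plot amounts to linear interpolation between plotted points. A minimal sketch, assuming intervals of width 1 inch starting at 4 inches; `proportion_below` and `value_at` are hypothetical helper names:

```python
from itertools import accumulate

# Lower boundaries (inches) and relative frequencies from the rainfall table.
lower = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
rel_freq = [0.052, 0.103, 0.086, 0.103, 0.172, 0.069, 0.207, 0.103, 0.052, 0.052]
width = 1

cum = list(accumulate(rel_freq))  # running totals: the cumulative column
# Plotted points: beginning point (4, 0), then (upper end of interval, cumulative).
xs = [lower[0]] + [low + width for low in lower]
ys = [0.0] + cum
pairs = list(zip(zip(xs, ys), zip(xs[1:], ys[1:])))  # consecutive point pairs


def proportion_below(x):
    """Interpolate the cumulative relative frequency at rainfall x."""
    for (x0, y0), (x1, y1) in pairs:
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the plotted range")


def value_at(p):
    """Inverse reading: the rainfall at which the curve reaches proportion p."""
    for (x0, y0), (x1, y1) in pairs:
        if y0 <= p <= y1 and y1 > y0:
            return x0 + (x1 - x0) * (p - y0) / (y1 - y0)
    raise ValueError("p is outside the plotted range")


print([round(c, 3) for c in cum])
print(proportion_below(9.5), value_at(0.30))
```

For example, `proportion_below(9.5)` is about 0.55 and `value_at(0.30)` is about 7.6 inches. The last cumulative value is 0.999 rather than 1 only because the relative frequencies were rounded.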
[Figure: cumulative relative frequency plot of annual rainfall (horizontal axis: rainfall, 2 to 14 inches; vertical axis: cumulative relative frequency, 0 to 1.0); annotation: approximately 0.55]

[Figure: the same cumulative relative frequency plot; annotation: approximately 7.5 inches]

[Figure: the same cumulative relative frequency plot; annotation: The interval 10 to 11 inches, because its slope is steeper, indicating that a larger proportion of years had rainfall in that interval.]
Displaying Bivariate Numerical Data
Scatterplots
When to use: bivariate numerical data

How to construct
- Draw a horizontal scale and mark it with appropriate values of the independent variable
- Draw a vertical scale and mark it with appropriate values of the dependent variable
- Plot a point for each observation (x, y) pair
To describe
- Comment on the relationship between the variables
Time Series Plots
When to use
- Measurements collected over time at regular intervals
How to construct
- Draw a horizontal scale and mark it with appropriate values of time
- Draw a vertical scale and mark it with appropriate values of the observed variable
- Plot a point for each observation and connect consecutive points with line segments
To describe
- Comment on any trends or patterns over time
The accompanying time-series plot of movie box
office totals (in millions of dollars) over 18
weeks in the summer for 2001 and 2002
appeared in USA Today (September 3, 2002).

Describe any trends or patterns that you see.
