Chapter 3-Numerical Measures

Chapter 3

Describing Data: Numerical Measures
Measures of Location
• The purpose of a measure of location is to
pinpoint the center of a distribution of data.
• In addition to measures of location, we should
consider the dispersion – often called the
variation or spread – in the data.
• Five measures of location:
1. The arithmetic mean
2. The weighted mean
3. The median
4. The mode
5. The geometric mean
• Population Mean: $\mu = \frac{\sum X}{N}$

• $\mu$ represents the population mean. It is the Greek
lowercase letter “mu.”
• $N$ is the number of values in the population.
• $X$ represents any particular value.
• $\Sigma$ is the Greek capital letter “sigma” and indicates
the operation of adding.
• $\sum X$ is the sum of the $X$ values in the population.

A parameter is a characteristic of a population.


• Sample Mean: $\bar{X} = \frac{\sum X}{n}$

• $\bar{X}$ represents the sample mean. It is read “X bar.”
• $n$ is the number of values in the sample.
• $X$ represents any particular value.
• $\Sigma$ is the Greek capital letter “sigma” and indicates
the operation of adding.
• $\sum X$ is the sum of the $X$ values in the sample.

A statistic is a characteristic of a sample.


• Important properties of the arithmetic mean:

1. Every set of interval- or ratio-level data has a mean.
2. All the values are included in computing the mean.
3. The mean is unique.
4. The sum of the deviations of each value from the mean is 0.

One disadvantage: if one or two values are either extremely
high or extremely low compared to the majority of the data,
then the mean might not be an appropriate average to
represent the data.
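
As a minimal sketch of property 4 and of the disadvantage just described, the Python snippet below uses made-up values (not taken from the text) to show that the deviations from the mean sum to zero and that one extreme value pulls the mean away from the bulk of the data.

```python
from statistics import mean

# Hypothetical values chosen only for illustration.
values = [4, 6, 8, 10, 12]
x_bar = mean(values)

# Property 4: the deviations from the mean sum to zero.
deviation_sum = sum(x - x_bar for x in values)

# Adding one extreme value shifts the mean well above most of the data.
with_outlier = values + [200]
x_bar_outlier = mean(with_outlier)

print(x_bar, deviation_sum, x_bar_outlier)
```

With these values the mean moves from 8 to 40 once the single large value is added, even though five of the six observations are 12 or less.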
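• Weighted Mean: $\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n}$

Pronounced “X bar sub w”

Or: $\bar{X}_w = \frac{\sum (wX)}{\sum w}$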
Median
The midpoint of the values after they have been
ordered from the smallest to the largest, or the
largest to the smallest.

The median is not affected by extremely large or
small values.

The median can be computed for ordinal-level
data or higher.
Mode
The value of the observation that appears most frequently.

The mode is especially useful in summarizing nominal and
ordinal level data.

The mode can be determined for all levels of data –
nominal, ordinal, interval, and ratio. The mode is not
affected by extremely high or low values. For many data
sets, though, there is no mode, causing it to be used less
often than the mean and median.

There can be one mode, multiple modes, or no mode for a
set of data.
Exercise 8:

The accounting department at a mail-order
company counted the following numbers of
incoming calls per day to the company’s toll-free
number during the first 7 days in May:

14, 24, 19, 31, 36, 26, 17.

a. Compute the arithmetic mean.
b. Indicate whether it is a statistic or a parameter.
Exercise 10:

The Human Relations Director at Ford began a study
of the overtime hours in the Inspection Department.
A sample of 15 workers showed they worked the following
number of overtime hours last month:

13, 13, 12, 15, 7, 15, 5, 12, 6, 7, 12, 10, 9, 13, 12

a. Compute the arithmetic mean.
b. Indicate whether it is a statistic or a parameter.
Exercise 16:

Andrews and Associates specializes in corporate
law. They charge $100 an hour for researching a
case, $75 an hour for consultations, and $200 an
hour for writing a brief. Last week one of the
associates spent 10 hours consulting with her
client, 10 hours researching the case, and 20
hours writing the brief. What was the weighted
mean hourly charge for her legal services?
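
One way to set up this calculation is sketched below in Python. The function name `weighted_mean` is a hypothetical helper, not something from the text. It treats the hourly rates as the values and the hours worked as the weights, dividing the sum of weight times value by the sum of the weights.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hourly rates and hours taken from Exercise 16.
rates = [75, 100, 200]  # consulting, researching, writing the brief
hours = [10, 10, 20]

hourly_charge = weighted_mean(rates, hours)
print(hourly_charge)  # 143.75
```

The result is higher than the simple average of the three rates (125) because half of the 40 hours were billed at the highest rate.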
Exercise 20:

The following are the ages of the 10 people in
the video arcade at the Southwyck Shopping
Mall at 10 A.M.

12, 8, 17, 6, 11, 14, 8, 17, 10, 8

Determine the median age. Determine the
mode.
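
A short Python sketch of this exercise follows, using the standard library `statistics` module (`multimode` requires Python 3.8 or later). Because there are ten ages, the median is the average of the two middle values after sorting.

```python
from statistics import median, multimode

# Ages from Exercise 20.
ages = [12, 8, 17, 6, 11, 14, 8, 17, 10, 8]

# Sorted: 6, 8, 8, 8, 10, 11, 12, 14, 17, 17 -> middle values are 10 and 11.
median_age = median(ages)

# multimode returns every value tied for the highest frequency.
modes = multimode(ages)

print(median_age, modes)  # 10.5 [8]
```

`multimode` is used rather than `mode` so that a data set with several modes would report all of them.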
The Relative Positions of the Mean, Median, and Mode
A histogram is a graphical display of a frequency
distribution for quantitative data. That
distribution can take various shapes. Here we
will discuss characteristics for a symmetric
distribution, a positively skewed distribution,
and a negatively skewed distribution.
A Symmetric Distribution
[Figure: histogram with classes from 0 up to 5 through 30 up to 35, in steps of 5]
A Positively Skewed Distribution
[Figure: histogram over the same classes]
A Negatively Skewed Distribution
[Figure: histogram over the same classes]
• Geometric Mean: $GM = \sqrt[n]{(X_1)(X_2)\cdots(X_n)}$

The geometric mean is useful in finding the
average change of percentages, ratios, indexes,
or growth rates over time. For positive values, the
geometric mean will always be less than or equal
to the arithmetic mean.
Exercise 28:

Compute the geometric mean of the following
percent increases: 2, 8, 6, 4, 10, 6, 8
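
The sketch below shows one common way to handle percent increases in Python. It assumes each increase is first converted to a growth factor (for example, 2 percent becomes 1.02), the geometric mean of those factors is taken, and 1 is subtracted at the end. That convention is an assumption here, since the slide does not spell it out.

```python
from statistics import geometric_mean, mean

# Percent increases from Exercise 28.
percent_increases = [2, 8, 6, 4, 10, 6, 8]

# Assumed convention: work with growth factors such as 1.02, 1.08, ...
growth_factors = [1 + p / 100 for p in percent_increases]

# Geometric mean of the factors, converted back to a percent increase.
gm_percent = (geometric_mean(growth_factors) - 1) * 100

# Arithmetic mean of the same increases, for comparison.
am_percent = mean(percent_increases)

print(round(gm_percent, 2), round(am_percent, 2))
```

The geometric mean comes out at roughly 6.26 percent, slightly below the arithmetic mean of about 6.29 percent, which is consistent with the statement that the geometric mean never exceeds the arithmetic mean.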
• Rate of Increase Over Time: $GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at start of period}}} - 1$
Exercise 32:

JetBlue Airways is an American low-cost airline
headquartered in New York City. Its main base is
John F. Kennedy International Airport. JetBlue’s
revenue in 2002 was $635.2 million. By 2009,
revenue had increased to $3,290.0 million.
What was the geometric mean annual increase
for the period?
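
A minimal Python sketch of this calculation follows. It assumes the usual rate-of-increase form: the n-th root of (value at end of period / value at start of period), minus 1, with n taken as the number of annual periods between 2002 and 2009. The function name is hypothetical.

```python
def annual_rate_of_increase(start_value, end_value, periods):
    """Geometric mean rate of increase per period."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


# Revenue figures (in $ millions) from Exercise 32.
periods = 2009 - 2002  # 7 annual periods
rate = annual_rate_of_increase(635.2, 3290.0, periods)

print(f"{rate:.2%}")
```

Under these assumptions the revenue grew by roughly 26.5 percent per year on average.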
Measure of location

A measure of location only describes the center of the data.
You would also want to know about the variation (or
dispersion) of the data in order to have a more
complete picture.

A small value for a measure of dispersion indicates that the
data are clustered closely around the mean. The mean is therefore
considered representative of the data. Conversely, a large measure of
dispersion indicates that the mean is not a reliable summary of the data.

Comparing the measures of dispersion of multiple
distributions is also helpful.
Range: The difference between the largest and
the smallest values in a data set.

Range = Largest Value – Smallest Value


• Mean Deviation: The arithmetic mean of the
absolute values of the deviations from the
arithmetic mean. It measures the mean amount
by which the values in a population, or sample,
vary from their mean.

Mean Deviation: $MD = \frac{\sum |X - \bar{X}|}{n}$
• Variance: The arithmetic mean of the squared
deviations from the mean.

Population Variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

Read as “sigma squared”

Standard Deviation: The square root of the
variance.

Population Standard Deviation: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

• Sample Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

Sample Standard Deviation: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$

Although the use of $n$ is logical since $\bar{X}$ is used to
estimate $\mu$, it tends to underestimate the population
variance, $\sigma^2$. The use of $(n - 1)$ in the denominator provides
the appropriate correction for this tendency. Because
the primary use of the sample statistic is to estimate
population parameters like $\sigma^2$, $(n - 1)$ is preferred to $n$ in
defining the sample variance. This convention is also
used when computing the sample standard deviation.
The variance and standard deviation are also
based on the deviations from the mean.
However, instead of using the absolute value of
the deviations, the variance and the standard
deviation square the deviations.
Example:
The chart below shows the
number of cappuccinos sold at
Starbucks in the Orange
County airport and the
Ontario, California, airport
between 4 and 5 P.M. for a
sample of five days last
month. Determine the mean,
median, range, and mean
deviation for each location.
Comment on the similarities
and differences in these
measures.

Orange County   Ontario
20              20
40              49
50              50
60              51
80              80
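
The following Python sketch works through the cappuccino example. The helper `mean_deviation` is a hypothetical name. It averages the absolute deviations from the mean, as in the definition of the mean deviation.

```python
from statistics import mean, median


def mean_deviation(values):
    """Average absolute deviation of the values from their mean."""
    x_bar = mean(values)
    return sum(abs(x - x_bar) for x in values) / len(values)


# Cappuccinos sold on five sampled days at each airport.
sales = {
    "Orange County": [20, 40, 50, 60, 80],
    "Ontario": [20, 49, 50, 51, 80],
}

summary = {}
for location, values in sales.items():
    summary[location] = {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "mean_deviation": mean_deviation(values),
    }
    print(location, summary[location])
```

Both locations have a mean of 50, a median of 50, and a range of 60, but the mean deviation is 16 for Orange County and 12.4 for Ontario. The range uses only the two extreme days, so it misses the fact that Ontario's middle three days are clustered tightly around 50.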
Exercise 38:

A sample of eight companies in the aerospace
industry was surveyed as to their return on
investment last year. The results are (in
percent):

10.6, 12.6, 14.8, 18.2, 12.0, 14.8, 12.2, and 15.6

Calculate the range, arithmetic mean, and mean
deviation, and interpret the values.
Exercise 46:

The annual incomes of the five vice presidents of
TMV industries are $125,000; $128,000; $122,000;
$133,000; and $140,000. Consider this a population.
a. What is the range?
b. What is the arithmetic mean?
c. What is the population variance? The standard
deviation?
d. The annual incomes of officers of another firm
similar to TMV industries were also studied. The
mean was $129,000 and the standard deviation
$8,612. Compare the means and dispersions in
the two firms.
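
Parts a through c can be sketched in Python with the standard library, as below. Because the five incomes are treated as a population, the sketch uses `pvariance` and `pstdev`, which divide by N rather than by n - 1.

```python
from statistics import mean, pstdev, pvariance

# Annual incomes of the five vice presidents (Exercise 46).
incomes = [125_000, 128_000, 122_000, 133_000, 140_000]

income_range = max(incomes) - min(incomes)  # largest minus smallest
mu = mean(incomes)                          # population mean
sigma_sq = pvariance(incomes)               # population variance (divides by N)
sigma = pstdev(incomes)                     # population standard deviation

print(income_range, mu, sigma_sq, round(sigma, 1))
```

The range is $18,000, the mean is $129,600, and the standard deviation is roughly $6,343. For part d, the two means are close ($129,600 versus $129,000), while the other firm's standard deviation of $8,612 is larger, so its incomes are more spread out.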
Exercise 50:

Compute the sample variance and the sample
standard deviation.

The sample of eight companies in the aerospace
industry was surveyed as to their return on
investment last year. The results are: 10.6, 12.6,
14.8, 18.2, 12.0, 14.8, 12.2, and 15.6
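
A brief Python sketch for this exercise follows. `variance` and `stdev` from the standard library divide by n - 1, which matches the sample convention. `pvariance` is included only to show how much smaller the result is when dividing by n.

```python
from statistics import mean, pvariance, stdev, variance

# Returns on investment (percent) for the eight sampled companies.
roi = [10.6, 12.6, 14.8, 18.2, 12.0, 14.8, 12.2, 15.6]

x_bar = mean(roi)
s_sq = variance(roi)          # sample variance, denominator n - 1
s = stdev(roi)                # sample standard deviation
divide_by_n = pvariance(roi)  # same sum of squares, denominator n

print(round(x_bar, 2), round(s_sq, 2), round(s, 2), round(divide_by_n, 2))
```

The sample variance is about 6.01 and the sample standard deviation about 2.45, while dividing by n instead gives about 5.26.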
Exercise 54:

The mean income of a group of sample
observations is $500; the standard deviation is
$40. According to Chebyshev’s theorem, at least
what percent of incomes will lie between $400
and $600?
• Chebyshev’s Theorem: For any set of
observations (sample or population), the
proportion of the values that lie within k
standard deviations of the mean is at least $1 - \frac{1}{k^2}$,
where k is any constant greater than 1.
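
As a small sketch, the bound 1 - 1/k² can be applied to the income figures of Exercise 54 in Python. The function name `chebyshev_lower_bound` is hypothetical, and k is taken as the distance from the mean to either limit, measured in standard deviations.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


mean_income = 500
std_dev = 40

# $400 and $600 are both $100 from the mean, i.e. 2.5 standard deviations.
k = (600 - mean_income) / std_dev
bound = chebyshev_lower_bound(k)

print(k, round(bound, 2))
```

Here k is 2.5, so at least about 84 percent of the incomes lie between $400 and $600, whatever the shape of the distribution.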
Exercise 56:

The distribution of a sample of the number of
drinks sold per day at a nearby Wendy’s is
symmetric and bell-shaped. The mean number
of drinks sold per day is 91.9 with a standard
deviation of 4.67. Using the empirical rule, sales
will be between what two values on 68 percent
of the days? Sales will be between what two
values on 95 percent of the days?
Empirical Rule: For a symmetrical, bell-shaped
frequency distribution, approximately 68
percent of the observations will lie within plus
and minus one standard deviation of the mean;
about 95 percent of the observations will lie
within plus and minus two standard deviations
of the mean; and 99.7 percent will lie within
plus or minus three standard deviations of the
mean.
(Chart 3-7, page 86)
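
The intervals asked for in Exercise 56 can be sketched in Python as the mean plus and minus one or two standard deviations. The helper name `empirical_interval` is hypothetical, and the sketch relies on the stated assumption that the distribution is symmetric and bell-shaped.

```python
def empirical_interval(mean_value, std_dev, k):
    """Return (low, high) = mean minus and plus k standard deviations."""
    return (mean_value - k * std_dev, mean_value + k * std_dev)


# Mean and standard deviation of daily drink sales from Exercise 56.
mean_drinks = 91.9
std_dev_drinks = 4.67

low68, high68 = empirical_interval(mean_drinks, std_dev_drinks, 1)
low95, high95 = empirical_interval(mean_drinks, std_dev_drinks, 2)

print(round(low68, 2), round(high68, 2))
print(round(low95, 2), round(high95, 2))
```

This gives roughly 87.23 to 96.57 drinks on about 68 percent of the days and roughly 82.56 to 101.24 on about 95 percent of the days.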
Exercise 58:

Determine the mean and standard deviation of
the following frequency distribution.

Class         Frequency
0 up to 5     2
5 up to 10    7
10 up to 15   12
15 up to 20   6
20 up to 25   3
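• Arithmetic Mean of Grouped Data: $\bar{X} = \frac{\sum fM}{n}$, where $M$ is the midpoint of each class, $f$ is the class frequency, and $n$ is the total number of observations.
• Standard Deviation, Grouped Data: $s = \sqrt{\frac{\sum f(M - \bar{X})^2}{n - 1}}$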
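
A minimal Python sketch for the frequency distribution in Exercise 58 follows. It assumes the usual grouped-data approach: each class is represented by its midpoint M, the mean is the sum of f times M divided by n, and the standard deviation uses the sample denominator n - 1. The variable names are illustrative only.

```python
from math import sqrt

# (lower limit, upper limit, frequency) for each class in Exercise 58.
classes = [(0, 5, 2), (5, 10, 7), (10, 15, 12), (15, 20, 6), (20, 25, 3)]

# Assumption: the class midpoint stands in for every value in the class.
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]
n = sum(freqs)

# Grouped mean: sum of f * M divided by n.
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Grouped standard deviation: sqrt of sum of f * (M - mean)^2 over (n - 1).
sum_sq = sum(f * (m - grouped_mean) ** 2 for f, m in zip(freqs, midpoints))
grouped_sd = sqrt(sum_sq / (n - 1))

print(n, round(grouped_mean, 2), round(grouped_sd, 2))
```

With 30 observations the estimated mean is about 12.67 and the estimated standard deviation about 5.33. Both are approximations, because the individual values inside each class are not known.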