Chapter 3 Data Description
STATISTICS
Contents
• Measures of Central Tendency
• Measures of Variation
• Measures of Position
• Exploratory Data Analysis
Objectives:
• Summarize data, using measures of central
tendency, such as the mean, median, mode, and
midrange.
• Describe data, using measures of variation, such as
the range, variance, and standard deviation.
• Identify the position of a data value in a data set,
using various measures of position, such as
percentiles, deciles, and quartiles.
• Use the techniques of exploratory data analysis,
including boxplots and five-number summaries, to
discover various aspects of data.
Introduction
• Statistical methods can be used to summarize
data.
• Measures of average are also called measures
of central tendency and include the mean,
median, mode, and midrange.
• Measures that determine the spread of data
values are called measures of variation or
measures of dispersion and include the range,
variance, and standard deviation.
• Measures of position tell where a specific data
value falls within the data set or its relative
position in comparison with other data values.
• The most common measures of position are
percentiles, deciles, and quartiles.
• The measures of central tendency, variation,
and position are part of what is called
traditional statistics. This type of statistics is
typically used to confirm conjectures about the
data.
• Another type of statistics is called exploratory
data analysis. Its techniques include the
boxplot and the five-number summary. They
can be used to explore data to see what they
show.
Topic 1: Measures of Central Tendency
Some basic definitions
• A statistic is a characteristic or measure obtained
by using the data values from a sample.
• A parameter is a characteristic or measure
obtained by using all the data values from a
specific population.
• When the data in a data set are ordered, the set is
called a data array.
Measures of Central Tendency
General Rounding Rule
• In statistics, the basic rounding rule is that when
computations are done, rounding should not be
done until the final answer is calculated.
Topic 1: Measures of Central Tendency
The Mean
• The mean, also known as the arithmetic average,
is found by adding the values of the data and
dividing by the total number of values.
• Rounding Rule for the Mean: The mean should
be rounded to one more decimal place than
occurs in the raw data.
For example,
The mean of 3, 2, 6, 5, and 4 is found by adding
3 + 2 + 6 + 5 + 4 = 20 and dividing by 5; hence,
the mean of the data is 20/5 = 4.
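The same calculation can be sketched in a few lines of Python. The function name `mean` is chosen here for illustration only; it follows the definition above (sum of the values divided by the number of values).

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


data = [3, 2, 6, 5, 4]
print(sum(data))   # 20
print(mean(data))  # 4.0

# Rounding rule for the mean: the raw data are whole numbers,
# so the final answer is reported to one decimal place.
print(round(mean(data), 1))
```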
Measures of Central Tendency
• The mean is the sum of the values, divided by the total
number of values. The symbol \(\bar{X}\) represents the sample
mean.
….. etc.
Measures of Central Tendency
Step 3: For each class, multiply the frequency by
the midpoint, as shown, and place the product in
column D.
Measures of Central Tendency
Step 4: Find the sum of column D.
Solution
Measures of Central Tendency
Example
A small company consists of the owner, the
manager, the salesperson, and two technicians, all
of whose annual salaries are listed here. (Assume
that this is the entire population.)
• In a symmetric distribution,
the data values are evenly
distributed on both sides of
the mean. In addition,
when the distribution is
unimodal, the mean,
median, and mode are the
same and are at the center
of the distribution.
Measures of Central Tendency
The range
• The range is the highest value minus the lowest
value. The symbol R is used for the range.
R = highest value - lowest value
Measures of Variation
Example
• The salaries for the staff of the XYZ
Manufacturing Co. are shown here. Find the
range.
Solution
The range is R = $100,000 - $15,000 = $85,000.
Measures of Variation
Population Variance
• The variance is the average of the squares of the
distance each value is from the mean. The
symbol for the population variance is \(\sigma^2\). The
formula for the population variance is
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]
where
X = individual value
\(\mu\) = population mean
N = population size
Measures of Variation
Example
• Find the variance and standard deviation for the
data set for brand B paint in Example 3.18
The months were:
35, 45, 30, 35, 40, 25
Solution:
Step 1: Find the mean (population mean).
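As a check on this example, the sketch below computes the population mean, variance, and standard deviation for the brand B data, treating the six values as the entire population. The function name `population_variance` is an illustrative choice, not something taken from the slides.

```python
import math


def population_variance(values):
    """Average of the squared distances of each value from the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


months = [35, 45, 30, 35, 40, 25]

mu = sum(months) / len(months)     # Step 1: population mean
var = population_variance(months)  # squared distances sum to 250; 250 / 6
sd = math.sqrt(var)                # standard deviation is the square root

print(mu)             # 35.0
print(round(var, 1))  # 41.7
print(round(sd, 1))   # 6.5
```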
Measures of Variation
Sample Variance
• The formula for the sample variance, denoted by
\(s^2\), is
\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]
where
X = individual value
\(\bar{X}\) = sample mean
n = sample size
Measures of Variation
Example
• Find the sample variance and standard deviation
for the amount of European auto sales for a
sample of 6 years shown. The data are in
millions of dollars.
11.2, 11.9, 12.0, 12.8, 13.4, 14.3
Solution:
Step 1: Find the sum of the values.
Measures of Variation
Step 2: Square each value and find the sum.
Measures of Variation
Step 3: Substitute in the formulas and solve.
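The three steps can be sketched in Python. The formulas being substituted into did not survive in these slides; because Steps 1 and 2 compute the sum of the values and the sum of their squares, this sketch assumes the usual shortcut form, s² = (n·ΣX² − (ΣX)²) / (n(n − 1)). The result is compared with the standard library's `statistics.variance`, which uses the n − 1 denominator.

```python
import math
import statistics

sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]  # millions of dollars
n = len(sales)

sum_x = sum(sales)                   # Step 1: sum of the values
sum_x2 = sum(x * x for x in sales)   # Step 2: sum of the squared values

# Step 3 (assumed shortcut formula for the sample variance)
s2 = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))
s = math.sqrt(s2)

print(round(sum_x, 1), round(sum_x2, 2))
print(round(s2, 3), round(s, 2))

# The shortcut form agrees with the definitional form (n - 1 denominator).
print(math.isclose(s2, statistics.variance(sales), rel_tol=1e-9))
```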
Solution
Step 1: Make a table as shown, and find the
midpoint of each class.
Measures of Variation
Step 2: Multiply the frequency by the midpoint for
each class, and place the products in column D.
Step 4: Find the sums of columns B, D, and E. The
sum of column B is n, the sum of column D is \(\sum f \cdot X_m\), and
the sum of column E is \(\sum f \cdot X_m^2\). The completed table is
shown.
Measures of Variation
Step 5: Substitute in the formulas and solve.
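The grouped-data procedure can be sketched as follows. The table for this example is not reproduced in the slides, so the class boundaries and frequencies below are hypothetical values made up for illustration. The sketch also assumes that column E holds the frequency times the squared midpoint and that the shortcut formula for the sample variance is the one being substituted into.

```python
import math

# Hypothetical grouped data (NOT from the slides):
# (lower class boundary, upper class boundary, frequency)
classes = [(5.5, 10.5, 2), (10.5, 15.5, 5), (15.5, 20.5, 3)]

n = 0          # sum of column B (frequencies)
sum_fx = 0.0   # sum of column D (frequency * midpoint)
sum_fx2 = 0.0  # sum of column E (assumed: frequency * midpoint squared)

for lower, upper, f in classes:
    midpoint = (lower + upper) / 2   # Step 1: class midpoint
    n += f
    sum_fx += f * midpoint           # Step 2: column D
    sum_fx2 += f * midpoint ** 2     # column E (assumption, see above)

# Step 5: substitute into the (assumed) shortcut formulas
mean = sum_fx / n
variance = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
std_dev = math.sqrt(variance)

print(n, sum_fx, sum_fx2)
print(round(mean, 1), round(variance, 1), round(std_dev, 1))
```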
• For test B,
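• The percentile corresponding to a given value X is
computed by using the following formula:
\[ \text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100 \]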
Measures of Position
Example
• A teacher gives a 20-point test to 10 students.
The scores are shown here. Find the percentile
rank of a score of 12.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10
Solution
Arrange the data in order from lowest to highest.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Measures of Position
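• Then substitute into the formula. Six values fall below 12, so
\[ \text{Percentile} = \frac{6 + 0.5}{10} \cdot 100 = 65 \]
Thus, a score of 12 corresponds to the 65th percentile.
• To find the value corresponding to a given percentile, compute
\[ c = \frac{n \cdot p}{100} \]
where
n = total number of values
p = percentile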
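The percentile rank of the score of 12 can be checked with a short Python sketch. It assumes the common textbook formula, (number of values below X + 0.5) divided by the total number of values, times 100; the function name `percentile_rank` is illustrative.

```python
def percentile_rank(values, x):
    """Percentile rank of x: (count of values below x + 0.5) / n * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) * 100 / len(values)


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]

# Six scores (2, 3, 5, 6, 8, 10) fall below 12.
print(percentile_rank(scores, 12))  # 65.0
```

Sorting the data first, as in the worked solution, makes the count easy to do by hand; the function gives the same result on unsorted input because it only counts values below X.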
Measures of Position
Step 3 If c is not a whole number, round it up to
the next whole number; in this case, c = 3. Start at
the lowest value and count over to the third value,
which is 5. Hence, the value 5 corresponds
to the 25th percentile.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Measures of Position
Example
• Using the data set in Example 3–32, find the
value that corresponds to the 60th percentile.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10
Solution
Step 1 Arrange the data in order from lowest to
highest.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Step 2 Compute \( c = \frac{n \cdot p}{100} = \frac{10 \cdot 60}{100} = 6 \).
Measures of Position
Step 3 If c is a whole number, use the value halfway
between the cth and (c + 1)th values when counting up from the
lowest value; in this case, the 6th and 7th values.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
↑ ↑
6th value 7th value
The value halfway between 10 and 12 is 11. Find it by
adding the two values and dividing by 2.
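Both percentile examples (the 25th and the 60th) follow the same procedure, which can be sketched in Python. The sketch assumes c is computed as n·p/100, which matches c = 3 after rounding up for the 25th percentile and c = 6 for the 60th. The function name `value_at_percentile` is illustrative, and p is assumed to be a whole number between 1 and 99.

```python
def value_at_percentile(values, p):
    """Data value corresponding to the p-th percentile (p an integer, 1..99)."""
    if not 0 < p < 100:
        raise ValueError("p must be between 1 and 99")
    data = sorted(values)               # Step 1: arrange lowest to highest
    n = len(data)
    whole, remainder = divmod(n * p, 100)  # Step 2: c = n * p / 100
    if remainder:
        # c is not a whole number: round up and take that value.
        # Rounded-up c is whole + 1, which is index `whole` (0-based).
        return data[whole]
    # c is a whole number: halfway between the c-th and (c + 1)-th values.
    return (data[whole - 1] + data[whole]) / 2


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(value_at_percentile(scores, 25))  # 5
print(value_at_percentile(scores, 60))  # 11.0
```

Using `divmod` on whole numbers avoids testing a floating-point quotient for "wholeness", at the cost of restricting p to integers.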
Step 4 Find the median of the data values greater than 14.
15, 18, 22, 50
↑
Q3
Solution
Step 1 Find Q1, MD, and Q3 for the real cheese
data.
40 45 90 180 220 240 310 420
↑ ↑ ↑
Q1 MD Q3
Exploratory Data Analysis
Step 2 Find Q1, MD, and Q3 for the cheese
substitute data.
130 180 250 260 270 290 310 340
↑ ↑ ↑
Q1 MD Q3
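The quartiles for both data sets can be checked with the sketch below. It assumes Q1 and Q3 are the medians of the values below and above the overall median, which is how the steps in these slides proceed; other software may use different quartile conventions and give slightly different values. The function names are illustrative.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, MD, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # With an odd number of values, the median itself belongs to neither half.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)


def five_number_summary(values):
    """Lowest value, Q1, MD, Q3, highest value."""
    q1, md, q3 = quartiles(values)
    return min(values), q1, md, q3, max(values)


real_cheese = [40, 45, 90, 180, 220, 240, 310, 420]
cheese_substitute = [130, 180, 250, 260, 270, 290, 310, 340]

print(five_number_summary(real_cheese))        # (40, 67.5, 200.0, 275.0, 420)
print(five_number_summary(cheese_substitute))  # (130, 215.0, 265.0, 300.0, 340)
```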
Exploratory Data Analysis