Lecture4_slides
Descriptive Statistics: Ungrouped Data and Grouped Data
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
The standard deviation is the square root of the variance.
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
Example 2:
Find the variance and standard deviation for brand A paint.
10, 60, 50, 30, 40, 20
Solution:
$$\mu = \frac{\sum X}{N} = \frac{10 + 60 + 50 + 30 + 40 + 20}{6} = \frac{210}{6} = 35$$
A           B        C
Values X    X − µ    (X − µ)²
10          −25      625
60          +25      625
50          +15      225
30          −5       25
40          +5       25
20          −15      225
                     ∑ = 1750
Variance = 1750 ÷ 6 = 291.7
The standard deviation equals $\sqrt{291.7}$, or 17.1.
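The table calculation above can be checked with a short Python sketch. The function name `population_variance` is an illustrative choice, not something from the lecture; the code simply applies the definition of the population variance to the brand A data.

```python
import math

def population_variance(values):
    """Population variance: mean of the squared deviations from the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

brand_a = [10, 60, 50, 30, 40, 20]   # data from Example 2
variance = population_variance(brand_a)
std_dev = math.sqrt(variance)

print(round(variance, 1), round(std_dev, 1))  # 291.7 17.1
```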
B- Sample Variance and Standard Deviation
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
The symbol for the sample standard deviation is s .
$$s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
The shortcut formulas for computing the variance and standard deviation for data obtained from samples are as
follows:
Variance:
$$s^2 = \frac{n\left(\sum X^2\right) - \left(\sum X\right)^2}{n(n - 1)}$$
Standard deviation:
$$s = \sqrt{\frac{n\left(\sum X^2\right) - \left(\sum X\right)^2}{n(n - 1)}}$$
Example 3: Find the sample variance and standard deviation for the amount of European auto sales for a sample of
6 years shown. The data are in millions of dollars.
11.2, 11.9, 12.0, 12.8, 13.4, 14.3
Solution
∑ X = 11.2 + 11.9 + 12.0 + 12.8 + 13.4 + 14.3 = 75.6
$$\sum X^2 = 11.2^2 + 11.9^2 + 12.0^2 + 12.8^2 + 13.4^2 + 14.3^2 = 958.94$$
$$s^2 = \frac{n\left(\sum X^2\right) - \left(\sum X\right)^2}{n(n - 1)} = \frac{6(958.94) - 75.6^2}{6(6 - 1)} = \frac{38.28}{30} = 1.276$$
$$s = \sqrt{1.28} = 1.13$$
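As a cross-check of Example 3, the shortcut formula can be sketched in Python. The helper name `sample_variance_shortcut` is an assumption made for illustration; the comparison against `statistics.variance` from the standard library is only there to confirm that the shortcut and the definitional formula agree.

```python
import math
import statistics

def sample_variance_shortcut(values):
    """Sample variance via [n * sum(X^2) - (sum X)^2] / [n * (n - 1)]."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]   # millions of dollars, Example 3
s2 = sample_variance_shortcut(sales)
s = math.sqrt(s2)

print(round(s2, 3), round(s, 2))               # 1.276 1.13
print(abs(s2 - statistics.variance(sales)) < 1e-9)
```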
Example 4 Find the sample variance and the standard deviation for the frequency distribution of the following
data:
Class Frequency
5.5–10.5 1
10.5–15.5 2
15.5–20.5 3
20.5–25.5 5
25.5–30.5 4
30.5–35.5 3
35.5–40.5 2
Solution
A             B            C           D         E
Class         Frequency    Midpoint    f·X_m     f·X_m²
5.5–10.5      1            8           8         64
10.5–15.5     2            13          26        338
15.5–20.5     3            18          54        972
20.5–25.5     5            23          115       2,645
25.5–30.5     4            28          112       3,136
30.5–35.5     3            33          99        3,267
35.5–40.5     2            38          76        2,888
              n = 20                   ∑f·X_m = 490    ∑f·X_m² = 13,310
$$s^2 = \frac{n\left(\sum f \cdot X_m^2\right) - \left(\sum f \cdot X_m\right)^2}{n(n - 1)} = \frac{20(13{,}310) - 490^2}{20(20 - 1)} = \frac{266{,}200 - 240{,}100}{20(19)} = \frac{26{,}100}{380} = 68.7$$
$$s = \sqrt{68.7} = 8.3$$
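The grouped-data calculation in Example 4 can be sketched as follows. The representation of each class as a `(lower boundary, upper boundary, frequency)` tuple and the function name are assumptions chosen for this illustration.

```python
import math

def grouped_sample_variance(classes):
    """Sample variance for a frequency distribution, using class midpoints.

    classes: list of (lower_boundary, upper_boundary, frequency) tuples.
    """
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

example_4 = [
    (5.5, 10.5, 1), (10.5, 15.5, 2), (15.5, 20.5, 3), (20.5, 25.5, 5),
    (25.5, 30.5, 4), (30.5, 35.5, 3), (35.5, 40.5, 2),
]
s2_grouped = grouped_sample_variance(example_4)
s_grouped = math.sqrt(s2_grouped)

print(round(s2_grouped, 1), round(s_grouped, 1))  # 68.7 8.3
```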
3- Coefficient of Variation
Example 5 The mean of the number of sales of cars over a 3-month period is 87, and the standard deviation is 5.
The mean of the commissions is $5225, and the standard deviation is $773. Compare the variations of
the two.
Solution
The coefficients of variation are
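$$CVar = \frac{s}{\bar{X}} \cdot 100\% = \frac{5}{87} \cdot 100\% = 5.7\% \quad \text{(sales)}$$
$$CVar = \frac{s}{\bar{X}} \cdot 100\% = \frac{773}{5225} \cdot 100\% = 14.8\% \quad \text{(commissions)}$$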
Since the coefficient of variation is larger for commissions, the commissions are more variable than the sales.
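The comparison in Example 5 amounts to two divisions, sketched here in Python. The function name `coefficient_of_variation` is illustrative only.

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation as a percentage: (s / mean) * 100."""
    return std_dev / mean * 100

cv_sales = coefficient_of_variation(5, 87)
cv_commissions = coefficient_of_variation(773, 5225)

print(round(cv_sales, 1), round(cv_commissions, 1))  # 5.7 14.8
print(cv_commissions > cv_sales)                     # True: commissions vary more
```

Because the result is unit-free, it allows a comparison between a count (sales) and a dollar amount (commissions), which the raw standard deviations do not.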
Example
          Mean (X̄)    Standard deviation (s)
Height    65″          3″
Weight    175 lbs      4 lbs
Solution
Standard Scores
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$
For samples, the formula is
$$z = \frac{X - \bar{X}}{s}$$
The z score represents the number of standard deviations that a data value falls above or below the mean.
Example 4–6: A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of 10; she
scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare her relative
positions on the two tests.
Solution
First, find the z scores. For calculus the z score is
$$z = \frac{X - \bar{X}}{s} = \frac{65 - 50}{10} = 1.5$$
For history the z score is
$$z = \frac{30 - 25}{5} = 1.0$$
Since the z score for calculus is larger, her relative position in the calculus class is higher than her relative position
in the history class.
When all data for a variable are transformed into z scores, the resulting distribution will have a mean of 0 and a
standard deviation of 1. A z score, then, is actually the number of standard deviations each value is from the mean
for a specific distribution.
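A brief Python sketch of both points: the z scores from Example 4–6, and the claim that a fully standardized data set has mean 0 and standard deviation 1. Reusing the brand A paint values as the data set to standardize, and treating them as a population, are choices made here for illustration.

```python
import statistics

def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

z_calculus = z_score(65, 50, 10)
z_history = z_score(30, 25, 5)
print(z_calculus, z_history)  # 1.5 1.0

# Standardize a whole data set (brand A paint values, treated as a population).
data = [10, 60, 50, 30, 40, 20]
mu = statistics.mean(data)
sigma = statistics.pstdev(data)
z_values = [z_score(x, mu, sigma) for x in data]

# Up to floating-point rounding, the z scores have mean 0 and standard deviation 1.
print(abs(statistics.mean(z_values)) < 1e-9)
print(abs(statistics.pstdev(z_values) - 1) < 1e-9)
```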
Empirical Rule:
“Usual”
Since there are six values below a score of 12, the solution is
$$\text{Percentile rank} = \frac{6 + 0.5}{10} \cdot 100\% = 65\text{th percentile}$$
Thus, a student whose score was 12 did better than 65% of the class.
For the 25th percentile, n = 10 and p = 25, so
$$c = \frac{n \cdot p}{100} = \frac{10 \cdot 25}{100} = 2.5 \Rightarrow c = 3 \text{ (third value)}$$
(Note: If c is not a whole number, round it up to the next whole number as in this example.)
Hence, the value 5 corresponds to the 25th percentile.
Example 4–8: Using the data set in the previous Example, find the value that corresponds to the 60th percentile.
Solution
Arrange the data in order from smallest to largest.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Substitute in the formula.
$$c = \frac{n \cdot p}{100} = \frac{10 \cdot 60}{100} = 6$$
If c is a whole number, use the value halfway between the c and c +1 values when counting up from the lowest
value. In this case, the 6th and 7th values.
The value halfway between 10 and 12 is 11. Find it by adding the two values and dividing by 2.
$$\frac{10 + 12}{2} = 11$$
Hence, 11 corresponds to the 60th percentile. Anyone scoring 11 would have done better than 60% of the class.
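The two percentile procedures used above (rank of a given value, and value at a given percentile) can be sketched in Python. The function names are illustrative, and the sketch follows the lecture's rounding rule rather than the interpolation methods used by most statistics libraries, so library results may differ. It does not handle p = 100.

```python
import math

def percentile_rank(data, x):
    """Percentile rank of x: (number of values below x + 0.5) / n * 100."""
    below = sum(1 for v in data if v < x)
    return (below + 0.5) / len(data) * 100

def value_at_percentile(data, p):
    """Value at the p-th percentile using c = n * p / 100.

    If c is a whole number, average the c-th and (c+1)-th ordered values;
    otherwise round c up and take that ordered value.
    """
    ordered = sorted(data)
    c = len(ordered) * p / 100
    if c == int(c):
        c = int(c)
        return (ordered[c - 1] + ordered[c]) / 2
    return ordered[math.ceil(c) - 1]

scores = [2, 3, 5, 6, 8, 10, 12, 15, 18, 20]
print(round(percentile_rank(scores, 12), 1))  # 65.0
print(value_at_percentile(scores, 25))        # 5
print(value_at_percentile(scores, 60))        # 11.0
```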
Example 4–9: Find Q1, Q2, and Q3 for the data set 15, 13, 6, 5, 12, 50, 22, 18.
Solution
Step 1 Arrange the data in order.
5, 6, 12, 13, 15, 18, 22, 50
Step 2 Find the median (Q2).
5, 6, 12, 13, 15, 18, 22, 50
The median (MD) falls halfway between the two middle values, 13 and 15.
$$Q_2 = MD = \frac{13 + 15}{2} = 14$$
Step 3 Find the median of the data values less than 14.
5, 6, 12, 13
Q1 falls halfway between 6 and 12.
$$Q_1 = \frac{6 + 12}{2} = 9$$
So Q1 is 9.
Step 4 Find the median of the data values greater than 14.
15, 18, 22, 50
Q3 falls halfway between 18 and 22.
$$Q_3 = \frac{18 + 22}{2} = 20$$
Here Q3 is 20. Hence, Q1 = 9, Q2 = 14, and Q3 = 20.
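The four steps above can be sketched as a small Python function. The name `quartiles_by_halves` is illustrative. For an odd number of values the sketch leaves the median out of both halves, which is one common reading of "values less than" and "values greater than" the median; other conventions give slightly different quartiles.

```python
import statistics

def quartiles_by_halves(data):
    """Q1, Q2, Q3 as the medians of the lower half, the whole set, and the upper half."""
    ordered = sorted(data)
    n = len(ordered)
    q2 = statistics.median(ordered)
    lower_half = ordered[: n // 2]          # values below the median position
    upper_half = ordered[(n + 1) // 2 :]    # values above the median position
    return statistics.median(lower_half), q2, statistics.median(upper_half)

q1, q2, q3 = quartiles_by_halves([15, 13, 6, 5, 12, 50, 22, 18])
print(q1, q2, q3)  # 9.0 14.0 20.0
```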
Interquartile range (IQR)
IQR = Q3 - Q1
Midhinge:
$$\text{Midhinge} = \frac{Q_1 + Q_3}{2}$$
Deciles divide the distribution into 10 groups, as shown. They are denoted by D1, D2, etc.
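Using the same method as for the median of grouped data, Q1 and Q3 are given by:
$$Q_1 = L_{Q_1} + \frac{\frac{n}{4} - F}{f_{Q_1}} \cdot i, \qquad Q_3 = L_{Q_3} + \frac{\frac{3n}{4} - F}{f_{Q_3}} \cdot i$$
Here L is the lower boundary of the quartile class, F is the cumulative frequency before that class, f is the frequency of that class, i is the class width, and n is the total frequency.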
Example 4–10: Based on the grouped data below, find the interquartile range (IQR).
Time to travel to work f
1 – 10 8
11 – 20 14
21 – 30 12
31 – 40 9
41 – 50 7
Solution
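$$\text{Class } Q_1: \quad \frac{n}{4} = \frac{50}{4} = 12.5 \rightarrow \text{class } Q_1 \text{ is the 2nd class}$$
$$Q_1 = L_{Q_1} + \frac{\frac{n}{4} - F}{f_{Q_1}} \cdot i = 10.5 + \frac{12.5 - 8}{14} \cdot 10 = 13.7143$$
$$\text{Class } Q_3: \quad \frac{3n}{4} = \frac{3(50)}{4} = 37.5 \rightarrow \text{class } Q_3 \text{ is the 4th class}$$
$$Q_3 = L_{Q_3} + \frac{\frac{3n}{4} - F}{f_{Q_3}} \cdot i = 30.5 + \frac{37.5 - 34}{9} \cdot 10 = 34.3889$$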
IQR = Q3 - Q1 = 34.3889 - 13.7143 = 20.6746
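The grouped-data quartile calculation can be sketched in Python. The function name `grouped_quantile` and its argument layout are assumptions for illustration; the lower class boundaries (0.5, 10.5, ...) follow from the class limits 1–10, 11–20, and so on, matching the values 10.5 and 30.5 used in the solution.

```python
def grouped_quantile(lower_boundaries, frequencies, width, fraction):
    """Quantile for grouped data: L + ((fraction * n - F) / f) * i.

    L: lower boundary of the class containing the quantile,
    F: cumulative frequency before that class,
    f: frequency of that class, i: class width.
    """
    n = sum(frequencies)
    target = fraction * n
    cumulative = 0
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("fraction must be between 0 and 1")

boundaries = [0.5, 10.5, 20.5, 30.5, 40.5]   # lower boundaries of the five classes
freqs = [8, 14, 12, 9, 7]

q1_grouped = grouped_quantile(boundaries, freqs, 10, 0.25)
q3_grouped = grouped_quantile(boundaries, freqs, 10, 0.75)
iqr_grouped = q3_grouped - q1_grouped

print(round(q1_grouped, 4), round(q3_grouped, 4), round(iqr_grouped, 4))
# 13.7143 34.3889 20.6746
```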
Outliers
An outlier is an extremely high or an extremely low data value when compared with the rest of the data values.
Solution
The data value 50 is extremely suspect. These are the steps in checking for an outlier.
Step 1 Find Q1 and Q3. From the previous example, Q1 is 9 and Q3 is 20.
Step 2 Find the interquartile range (IQR), which is Q3 - Q1.
IQR = Q3 - Q1 = 20 - 9 = 11
Step 3 Multiply the IQR by 1.5.
1.5(11) = 16.5
Step 4 Subtract the value obtained in step 3 from Q1, and add the value obtained in step 3 to Q3.
9 - 16.5 = -7.5 and 20 + 16.5 = 36.5
Step 5 Check the data set for any data values that fall outside the interval from -7.5 to 36.5. The value 50 is outside
this interval; hence, it can be considered an outlier.
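The outlier check can be sketched in Python. The function name `iqr_outliers` is illustrative, and the factor 1.5 is inferred from the figures in the steps (16.5 = 1.5 × 11). The data set is the one from Example 4–9, whose quartiles the solution reuses.

```python
def iqr_outliers(data, q1, q3, k=1.5):
    """Return the fences (q1 - k*IQR, q3 + k*IQR) and the values outside them."""
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return low_fence, high_fence, outliers

data = [5, 6, 12, 13, 15, 18, 22, 50]
low_fence, high_fence, outliers = iqr_outliers(data, q1=9, q3=20)

print(low_fence, high_fence, outliers)  # -7.5 36.5 [50]
```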
3–4 Exploratory Data Analysis
A boxplot is a graph of a data set obtained by drawing a horizontal line from the minimum data value to Q1,
drawing a horizontal line from Q3 to the maximum data value, and drawing a box whose vertical sides pass through
Q1 and Q3 with a vertical line inside the box passing through the median or Q2.
Example 4–12: The number of meteorites found in 10 states of the United States is 89, 47, 164, 296, 30, 215, 138,
78, 48, 39. Construct a boxplot for the data.
Solution
Step 1 Arrange the data in order:
30, 39, 47, 48, 78, 89, 138, 164, 215, 296
Step 2 Find the median.
30, 39, 47, 48, 78, 89, 138, 164, 215, 296
The median falls halfway between the two middle values, 78 and 89.
$$Q_2 = MD = \frac{78 + 89}{2} = 83.5$$
Step 3 Find Q1.
30, 39, 47, 48, 78
Q1 is the middle value of the lower half: Q1 = 47.
Step 4 Find Q3.
89, 138, 164, 215, 296
Q3 is the middle value of the upper half: Q3 = 164.
Step 5 Draw a scale for the data on the x axis.
Step 6 Locate the lowest value, Q1, the median, Q3, and the highest value on the scale.
Step 7 Draw a box. Fi–