Lecture 3
Asmaa El-Toony
Chapter 3
Data Description (Summarizing)
The Mean
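The population mean of the N values X_1, X_2, ..., X_N is

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N},$$

which is usually unknown, so we use the sample mean

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

to estimate or approximate it.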
For grouped data (a frequency distribution), each class is represented by its midpoint Xm:

Class        Frequency f   Midpoint Xm   f · Xm
5.5–10.5     1             8             8
10.5–15.5    2             13            26
15.5–20.5    3             18            54
20.5–25.5    5             23            115
25.5–30.5    4             28            112
30.5–35.5    3             33            99
35.5–40.5    2             38            76
             n = 20                      ∑ f · Xm = 490
$$\bar{X} = \frac{\sum f \cdot X_m}{n} = \frac{490}{20} = 24.5$$
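The grouped-mean calculation above can be sketched in Python. This is a minimal illustration, assuming the class boundaries and frequencies of the table; the names `classes`, `freqs` and `grouped_mean` are our own, not part of the lecture.

```python
# Class boundaries and frequencies copied from the table above.
classes = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
           (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [1, 2, 3, 5, 4, 3, 2]


def grouped_mean(classes, freqs):
    """Mean of grouped data: every value is assumed to sit at its class midpoint Xm."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)                                          # n = sum of f
    total = sum(f * xm for f, xm in zip(freqs, midpoints))  # sum of f * Xm
    return total / n


print(grouped_mean(classes, freqs))  # 24.5
```

Computing the midpoints from the boundaries, rather than typing them in, keeps the Xm column consistent with the class limits.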
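The Median:
The median is the middle value of the ordered observations:

$$\text{Median} = X_{\left(\frac{n+1}{2}\right)},$$

that is, the ((n + 1)/2)th ordered observation. When n = 12, the median is the 6.5th observation, which is the value halfway between the 6th and 7th ordered observations.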
Example 2.11: For the same random sample, the ordered observations are:
23, 28, 28, 31, 32, 34, 37, 42, 50, 61.
Find the median.
Solution:
Since n = 10, the median is the 5.5th observation, i.e.,
M = (32 + 34)/2 = 33.
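The (n + 1)/2 rule can be sketched as a small Python function. This is an illustration only; the name `median` is our own, and Python's standard library also offers `statistics.median`.

```python
def median(values):
    """Median: the ((n + 1) / 2)th ordered observation."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 (1-based) is index n // 2 (0-based).
        return ordered[mid]
    # Even n: halfway between the (n / 2)th and (n / 2 + 1)th observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


sample = [23, 28, 28, 31, 32, 34, 37, 42, 50, 61]
print(median(sample))  # 33.0
```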
The Mode:
The mode of a set of values is that value which occurs with the highest
frequency.
If all values are different, there is no mode; sometimes there is more than one mode.
Example 2.13: Find the mode of the signing bonuses of eight NFL players for a
specific year. The bonuses in millions of dollars are:
18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10
Solution:
Since $10 million occurs 3 times, a frequency larger than that of any other value, the mode is $10 million.
Example 2.14: Find the mode for the number of branches that six banks have:
401, 344, 209, 201, 227, 353
Solution:
Since each value occurs only once, there is no mode.
Note: Do not say that the mode is zero. That would be incorrect, because in some
data, such as temperature, zero can be an actual value.
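A minimal Python sketch of the mode rules above, using `collections.Counter` to tally frequencies. The function name `modes` is our own; it returns a list so that bimodal data gives two values and data with all values different gives an empty list (no mode, never zero).

```python
from collections import Counter


def modes(values):
    """Return every value with the highest frequency; [] means there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # all values are different: no mode
    return sorted(v for v, c in counts.items() if c == highest)


bonuses = [18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10]  # Example 2.13
branches = [401, 344, 209, 201, 227, 353]              # Example 2.14
print(modes(bonuses))   # [10]
print(modes(branches))  # []
```

Python's `statistics.multimode` behaves differently when all values are distinct (it returns every value), which is why the convention of this lecture is coded explicitly here.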
Example 2.15:
The data show the number of licensed nuclear reactors in the United States for a
recent 15-year period. Find the mode.
Solution:
Since the values 104 and 109 both occur 5 times, the modes are 104 and 109. The
data set is said to be bimodal.
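Example 2.16: Find the modal class for the frequency distribution (in miles per week) used above for the grouped mean.
Solution:
The modal class is 20.5–25.5, since it has the largest frequency (5). Sometimes the midpoint of the class is used rather than the boundaries; hence, the mode could also be given as 23 miles per week.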
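Picking the modal class can be sketched in Python as below, assuming the same class boundaries and frequencies as the grouped-mean table. The name `modal_class` is hypothetical; on a tie it simply returns the first class with the largest frequency.

```python
classes = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
           (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [1, 2, 3, 5, 4, 3, 2]


def modal_class(classes, freqs):
    """Return the class with the largest frequency (the first one on a tie)."""
    best = max(range(len(freqs)), key=lambda i: freqs[i])
    return classes[best]


lower, upper = modal_class(classes, freqs)
print(lower, upper, (lower + upper) / 2)  # 20.5 25.5 23.0
```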
Measures of Dispersion:
Note:
1. If all the values are the same, there is no dispersion.
2. If all the values are different, there is dispersion.
3. If the values are close to each other, the amount of dispersion is small.
4. If the values are widely scattered, the dispersion is greater.
1. Range:
Range = largest value − smallest value
Note: The range is not a good measure of variation, since it considers only two of the values (the largest and the smallest).
Example 2.17: Find the range of 43, 66, 61, 64, 65, 38, 59, 57, 57, 50.
Range = 66 − 38 = 28
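The range and its weakness can be sketched in Python. The name `data_range` is our own (chosen to avoid shadowing Python's built-in `range`), and the second data set is hypothetical, made up only to show that values between the extremes do not affect the range.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


data = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]  # Example 2.17
print(data_range(data))  # 28

# Hypothetical data with the same extremes but a very different spread:
# the range cannot tell the two data sets apart.
other = [38, 52, 52, 52, 52, 52, 52, 52, 52, 66]
print(data_range(other))  # 28
```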
2. The Variance:
Example 2.18: A testing lab wishes to test two experimental brands of outdoor
paint to see how long each will last before fading. The testing lab makes 6
gallons of each paint to test. Since different chemical agents are added to each
group and only six cans are involved, these two groups constitute two small
populations. The results (in months) are shown. Find the variance and standard deviation of the data sets for brands A and B.
Statistical Methods DR. Asmaa El-Toony
Solution:
For Brand A:
DEFINITION
The sample standard deviation, S, is defined as the positive square root of the sample variance, S². Thus,

$$S = \sqrt{S^2}.$$
Solution:
For the sample 3, 2, 5, 1, 9 we have n = 5 and x̄ = 4, so

$$S^2 = \frac{\sum_{i=1}^{5}(x_i-\bar{x})^2}{n-1} = \frac{(3-4)^2+(2-4)^2+(5-4)^2+(1-4)^2+(9-4)^2}{4} = \frac{40}{4} = 10\ (\text{unit})^2.$$
xi        xi − x̄    (xi − x̄)²
3         −1         1
2         −2         4
5          1         1
1         −3         9
9          5         25
∑ = 20     0         40

$$S^2 = \frac{40}{4} = 10$$
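The deviation-based definition can be checked with a short Python sketch. The function name `sample_variance` is our own; `statistics.variance` from the standard library also uses the n − 1 divisor and is shown only as a cross-check.

```python
import math
import statistics


def sample_variance(values):
    """S^2 = sum((x_i - mean)^2) / (n - 1), computed from the deviations."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


sample = [3, 2, 5, 1, 9]
s2 = sample_variance(sample)
s = math.sqrt(s2)                    # S is the positive square root of S^2
print(s2)                            # 10.0
print(round(s, 2))                   # 3.16
print(statistics.variance(sample))   # cross-check with the standard library
```

As in the table, the deviations xi − x̄ sum to zero, which is why their squares are used.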
Example 2.20: Calculate the variance and standard deviation of the following
sample:
2, 3, 3, 3, 4
Solution:
The required sums are easily obtained from the following tabulation:
x     2   3   3   3   4     ∑x = 15
x²    4   9   9   9   16    ∑x² = 47
Then we use

$$S^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{47 - \frac{15^2}{5}}{5-1} = \frac{2}{4} = 0.5,$$

so S = √0.5 ≈ 0.71.
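The computational (shortcut) formula needs only n, ∑x and ∑x², as sketched below. The function name `sample_variance_shortcut` is our own.

```python
import math


def sample_variance_shortcut(values):
    """S^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)                  # sum of x
    sum_x2 = sum(x * x for x in values)  # sum of x^2
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


sample = [2, 3, 3, 3, 4]  # Example 2.20
s2 = sample_variance_shortcut(sample)
print(s2)                       # 0.5
print(round(math.sqrt(s2), 2))  # 0.71
```

The shortcut is convenient by hand, but with very large values and a small spread it can lose precision in floating point, because it subtracts two nearly equal numbers; the deviation form avoids that.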
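For grouped data, we assume that all values in a particular class interval are located at the midpoint xi of that interval. To calculate x̄ and S², we need

$$n = \sum f_i, \qquad \sum x_i f_i, \qquad \sum x_i^2 f_i.$$

Then

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i}$$

and

$$S^2 = \frac{\sum x_i^2 f_i - \left(\sum x_i f_i\right)^2/n}{\sum f_i - 1} \qquad\text{or}\qquad S^2 = \frac{\sum x_i^2 f_i - n\bar{x}^2}{\sum f_i - 1}.$$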
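Both numerators are the same quantity written two ways. Expanding the weighted sum of squared deviations and using n x̄ = ∑ xi fi with n = ∑ fi:

$$\sum f_i (x_i - \bar{x})^2 = \sum x_i^2 f_i - 2\bar{x}\sum x_i f_i + \bar{x}^2 \sum f_i = \sum x_i^2 f_i - 2n\bar{x}^2 + n\bar{x}^2 = \sum x_i^2 f_i - n\bar{x}^2,$$

$$\frac{\left(\sum x_i f_i\right)^2}{n} = \frac{(n\bar{x})^2}{n} = n\bar{x}^2.$$

So either form gives the same S², and both equal the definition ∑ fi (xi − x̄)² / (n − 1) applied to the midpoints.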
Example 2.21:
Classes   f     Midpoint xi   xi fi   xi² fi
15-19     8     17            136     2312
20-24     16    22            352     7744
25-29     32    27            864     23328
30-34     28    32            896     28672
35-39     12    37            444     16428
40-44     4     42            168     7056
Total     100                 2860    85540
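$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{2860}{100} = 28.6 \text{ years}$$

$$S^2 = \frac{85540 - (2860)^2/100}{100 - 1} = \frac{3744}{99} \approx 37.82$$

$$S = \sqrt{S^2} \approx 6.15 \text{ years}$$

$$\text{C.V.} = \frac{S}{\bar{x}} \times 100\% = \frac{6.15}{28.6} \times 100\% \approx 21.5\%.$$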
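The whole grouped-data calculation for Example 2.21 can be sketched in Python from the midpoint and frequency columns. The function name `grouped_summary` is our own, and it uses the (∑ xi fi)²/n form of the variance formula.

```python
import math

# Example 2.21: class midpoints x_i and frequencies f_i from the table above.
midpoints = [17, 22, 27, 32, 37, 42]
freqs = [8, 16, 32, 28, 12, 4]


def grouped_summary(midpoints, freqs):
    """Mean, sample variance, standard deviation and C.V. (%) of grouped data."""
    n = sum(freqs)                                              # n = sum f_i
    sum_xf = sum(x * f for x, f in zip(midpoints, freqs))       # sum x_i f_i
    sum_x2f = sum(x * x * f for x, f in zip(midpoints, freqs))  # sum x_i^2 f_i
    mean = sum_xf / n
    var = (sum_x2f - sum_xf ** 2 / n) / (n - 1)
    sd = math.sqrt(var)
    cv = sd / mean * 100
    return mean, var, sd, cv


mean, var, sd, cv = grouped_summary(midpoints, freqs)
print(mean)           # 28.6
print(round(var, 2))  # 37.82
print(round(sd, 2))   # 6.15
print(round(cv, 1))   # 21.5
```

Keeping S unrounded until the last step is what gives 21.5%; using the rounded value 6.1 would give about 21.3%.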
If we want to compare the variation of two variables, we cannot use the variance or the standard deviation, because:
1. The variables might have different units.
2. The variables might have different means.
We need a measure of relative variation that depends neither on the units nor on how large the values are. This measure is the coefficient of variation (C.V.), defined by

$$\text{C.V.} = \frac{S}{\bar{x}} \times 100\%.$$
Example 2.22: Suppose two samples of human males yield the following data:
                      Sample 1        Sample 2
Age                   25-year-olds    11-year-olds
Mean weight           145 pounds      80 pounds
Standard deviation    10 pounds       10 pounds
Solution:
C.V. (Sample 1) = (10/145) × 100 ≈ 6.9%
C.V. (Sample 2) = (10/80) × 100 = 12.5%
Hence the weights of the 11-year-olds (Sample 2) show more relative variation than those of the 25-year-olds (Sample 1).
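The comparison in Example 2.22 can be reproduced with a short Python sketch. The helper name `coefficient_of_variation` is our own; the inputs are the means and standard deviations from the table.

```python
def coefficient_of_variation(sd, mean):
    """C.V. = (S / mean) * 100, as a percentage."""
    return sd / mean * 100


cv1 = coefficient_of_variation(10, 145)  # Sample 1: 25-year-olds
cv2 = coefficient_of_variation(10, 80)   # Sample 2: 11-year-olds
print(round(cv1, 1), round(cv2, 1))      # 6.9 12.5
print("Sample 2" if cv2 > cv1 else "Sample 1", "has more relative variation")
```

Both samples have the same standard deviation (10 pounds), so only the C.V. reveals that the spread is larger relative to the mean in Sample 2.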