
Lecture4_slides


Probability and Statistics (PHM111s)-Lecture 4

Part I: Introduction to Statistical Methods.

Part II: Methods of Descriptive Statistics.


1- Collecting Data.
2- Organizing Data.
3- Presenting Data.
4- Summarizing Data.

Part III: Introduction to Probability.

Part IV: Methods of Inferential Statistics.


4. Summarizing Data (cont.)
Example: Consider two sets of 5 students each. Each set takes an exam with a maximum mark of 50, and the students' marks are as follows:

The set A: 29, 26, 35, 35, 35

The set B: 8, 35, 49, 35, 33

         Mean   Median   Mode
Set A    32     35       35
Set B    32     35       35
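
The table can be checked with a short Python sketch that uses the standard `statistics` module. The names `set_a` and `set_b` are illustrative. Both sets give the same mean, median, and mode even though set B is far more spread out, which is why measures of variation are needed alongside measures of central tendency.

```python
import statistics

# Marks of the two sets of students from the example.
set_a = [29, 26, 35, 35, 35]
set_b = [8, 35, 49, 35, 33]

for name, marks in (("Set A", set_a), ("Set B", set_b)):
    # Mean, median, and mode are identical for both sets.
    print(name,
          statistics.mean(marks),
          statistics.median(marks),
          statistics.mode(marks))
```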

[Diagram: Descriptive Statistics, for ungrouped and grouped data, divides into Measures of Central Tendency, Measures of Variation, and Measures of Position.]

1- Range:
Example 1: The salaries for the staff of the XYZ Manufacturing Co. are shown here. Find the range.
Staff                    Salary
Owner                    $100,000
Manager                    40,000
Sales representative       30,000
Workers                    25,000
                           15,000
                           18,000
Solution
The range is R = $100,000 - $15,000 = $85,000.

Example: The range of set X (55, 53, 57, 56, 54) is 57 - 53 = 4.

The range of set Y (67, 73, 41, 60, 34) is 73 - 34 = 39.
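
The range calculations above can be reproduced with a minimal Python sketch. The helper name `data_range` is an assumption made for illustration, not something from the lecture.

```python
def data_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)

# Salaries from Example 1 and the two sets X and Y.
salaries = [100_000, 40_000, 30_000, 25_000, 15_000, 18_000]
set_x = [55, 53, 57, 56, 54]
set_y = [67, 73, 41, 60, 34]

print(data_range(salaries), data_range(set_x), data_range(set_y))
```

Note that the range depends only on the two extreme values, so a single unusual value (such as the owner's salary) dominates it.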

2- A- Population Variance and Standard Deviation


The variance is the average of the squares of the distance each value is from the mean.
The symbol for the population variance is σ².

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

The standard deviation is the square root of the variance.

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
Example 2:
Find the variance and standard deviation for brand A paint.
10, 60, 50, 30, 40, 20
Solution:

$$\mu = \frac{\sum X}{N} = \frac{10 + 60 + 50 + 30 + 40 + 20}{6} = \frac{210}{6} = 35$$

A            B         C
Values X     X − µ     (X − µ)²
10           −25       625
60           +25       625
50           +15       225
30           −5        25
40           +5        25
20           −15       225
                       ∑ = 1750

Variance = 1750 ÷ 6 = 291.7
Standard deviation = √291.7 ≈ 17.1
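
As a check on Example 2, the definition of the population variance can be written out directly. This is a minimal sketch; the function name `population_variance` and the variable `brand_a` are illustrative.

```python
import math

def population_variance(values):
    """Average of the squared distances from the population mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

brand_a = [10, 60, 50, 30, 40, 20]
variance = population_variance(brand_a)
std_dev = math.sqrt(variance)

# Rounded to one decimal place, as in the worked solution.
print(round(variance, 1), round(std_dev, 1))
```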
B- Sample Variance and Standard Deviation

Case 1: Ungrouped Data

The formula for the sample variance, denoted by s², is

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

The symbol for the sample standard deviation is s.

$$s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

The shortcut formulas for computing the variance and standard deviation for data obtained from samples are as follows:

Variance:

$$s^2 = \frac{n\left(\sum X^2\right) - \left(\sum X\right)^2}{n(n - 1)}$$

Standard deviation:

$$s = \sqrt{\frac{n\left(\sum X^2\right) - \left(\sum X\right)^2}{n(n - 1)}}$$
Example 3: Find the sample variance and standard deviation for the amount of European auto sales for a sample of
6 years shown. The data are in millions of dollars.
11.2, 11.9, 12.0, 12.8, 13.4, 14.3
Solution

$$\sum X = 11.2 + 11.9 + 12.0 + 12.8 + 13.4 + 14.3 = 75.6$$

$$\sum X^2 = 11.2^2 + 11.9^2 + 12.0^2 + 12.8^2 + 13.4^2 + 14.3^2 = 958.94$$

$$s^2 = \frac{n\left(\sum X^2\right) - \left(\sum X\right)^2}{n(n - 1)} = \frac{6(958.94) - 75.6^2}{6(6 - 1)} = \frac{38.28}{30} = 1.276 \approx 1.28$$

$$s = \sqrt{1.28} \approx 1.13$$
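
The shortcut formula used in Example 3 can be sketched in Python and compared with `statistics.variance`, which computes the sample variance from the definition. The function name `sample_variance_shortcut` is an assumption for illustration.

```python
import math
import statistics

def sample_variance_shortcut(values):
    """Shortcut formula: [n * sum(X^2) - (sum X)^2] / [n * (n - 1)]."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]
s2 = sample_variance_shortcut(sales)
s = math.sqrt(s2)

# The shortcut and the definitional formula should agree up to rounding.
print(round(s2, 3), round(s, 2), round(statistics.variance(sales), 3))
```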

Case 2: Grouped Data


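$$s^2 = \frac{n\left(\sum f \cdot X_m^2\right) - \left(\sum f \cdot X_m\right)^2}{n(n - 1)}$$

where f is the class frequency, X_m is the class midpoint, and n = ∑f.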
Example 4 Find the sample variance and the standard deviation for the frequency distribution of the following
data:

Class Frequency
5.5–10.5 1
10.5–15.5 2
15.5–20.5 3
20.5–25.5 5
25.5–30.5 4
30.5–35.5 3
35.5–40.5 2
Solution
A             B            C           D           E
Class         Frequency    Midpoint    f·X_m       f·X_m²
5.5–10.5      1            8           8           64
10.5–15.5     2            13          26          338
15.5–20.5     3            18          54          972
20.5–25.5     5            23          115         2,645
25.5–30.5     4            28          112         3,136
30.5–35.5     3            33          99          3,267
35.5–40.5     2            38          76          2,888
              n = 20                   ∑ = 490     ∑ = 13,310

$$s^2 = \frac{n\left(\sum f \cdot X_m^2\right) - \left(\sum f \cdot X_m\right)^2}{n(n - 1)} = \frac{20(13{,}310) - 490^2}{20(20 - 1)} = \frac{266{,}200 - 240{,}100}{20(19)} = \frac{26{,}100}{380} = 68.7$$

$$s = \sqrt{68.7} \approx 8.3$$
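
The grouped-data calculation in Example 4 can be sketched as follows. Each class is stored as a hypothetical `(lower boundary, upper boundary, frequency)` tuple, and the midpoint is taken as the average of the two boundaries, matching the Midpoint column of the solution table.

```python
import math

# (lower boundary, upper boundary, frequency) for each class in Example 4.
classes = [
    (5.5, 10.5, 1), (10.5, 15.5, 2), (15.5, 20.5, 3), (20.5, 25.5, 5),
    (25.5, 30.5, 4), (30.5, 35.5, 3), (35.5, 40.5, 2),
]

def grouped_sample_variance(classes):
    """Shortcut formula for grouped data, using class midpoints X_m."""
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

s2_grouped = grouped_sample_variance(classes)
s_grouped = math.sqrt(s2_grouped)
print(round(s2_grouped, 1), round(s_grouped, 1))
```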
3- Coefficient of Variation

For samples,

$$\text{CVar} = \frac{s}{\bar{X}} \cdot 100\%$$

For populations,

$$\text{CVar} = \frac{\sigma}{\mu} \cdot 100\%$$

Example 5 The mean of the number of sales of cars over a 3-month period is 87, and the standard deviation is 5.
The mean of the commissions is $5225, and the standard deviation is $773. Compare the variations of
the two.

Solution
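The coefficients of variation are

$$\text{CVar} = \frac{s}{\bar{X}} \cdot 100\% = \frac{5}{87} \cdot 100\% = 5.7\% \quad \text{(sales)}$$

$$\text{CVar} = \frac{s}{\bar{X}} \cdot 100\% = \frac{773}{5225} \cdot 100\% = 14.8\% \quad \text{(commissions)}$$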

Since the coefficient of variation is larger for commissions, the commissions are more variable than the sales.

Example
            X̄           s
Height      65′′         3′′
Weight      175 lbs      4 lbs
Solution

The coefficients of variation are

$$(\text{CVar})_H = \frac{s}{\bar{X}} \cdot 100\% = \frac{3}{65} \cdot 100\% = 4.6\%$$

$$(\text{CVar})_W = \frac{s}{\bar{X}} \cdot 100\% = \frac{4}{175} \cdot 100\% = 2.3\%$$

Since the coefficient of variation is larger for height, height is more variable than weight (i.e. more spread out).
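
Both coefficient-of-variation examples can be verified with a small sketch. The function name `coefficient_of_variation` is illustrative. Because the result is a percentage with no units, it allows comparison of quantities measured in different units, such as inches and pounds.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100

# Example 5: car sales versus commissions.
cv_sales = coefficient_of_variation(5, 87)
cv_commissions = coefficient_of_variation(773, 5225)

# Height (inches) versus weight (lbs).
cv_height = coefficient_of_variation(3, 65)
cv_weight = coefficient_of_variation(4, 175)

print(round(cv_sales, 1), round(cv_commissions, 1))
print(round(cv_height, 1), round(cv_weight, 1))
```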
3-3 Measures of Position

[Diagram: Descriptive Statistics divides into Measures of Central Tendency, Measures of Variation, and Measures of Position.]

Standard Scores

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$

For samples, the formula is

$$z = \frac{X - \bar{X}}{s}$$

For populations, the formula is

$$z = \frac{X - \mu}{\sigma}$$

The z score represents the number of standard deviations that a data value falls above or below the mean.
Example 4–6: A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of 10; she
scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare her relative
positions on the two tests.
Solution
First, find the z scores. For calculus the z score is

$$z = \frac{X - \bar{X}}{s} = \frac{65 - 50}{10} = 1.5$$

For history the z score is

$$z = \frac{30 - 25}{5} = 1.0$$
Since the z score for calculus is larger, her relative position in the calculus class is higher than her relative position
in the history class.
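
The comparison in this example can be sketched in a few lines of Python. The helper name `z_score` is an assumption for illustration.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies above or below the mean."""
    return (value - mean) / std_dev

z_calculus = z_score(65, 50, 10)
z_history = z_score(30, 25, 5)

# The larger z score marks the better relative position.
better = "calculus" if z_calculus > z_history else "history"
print(z_calculus, z_history, better)
```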

When all data for a variable are transformed into z scores, the resulting distribution will have a mean of 0 and a
standard deviation of 1. A z score, then, is actually the number of standard deviations each value is from the mean
for a specific distribution.
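
The claim that z scores have mean 0 and standard deviation 1 can be checked numerically. The sketch below reuses the brand A paint values from Example 2 and treats them as a population, so it uses `statistics.pstdev`; the function name `standardize` is illustrative.

```python
import statistics

def standardize(values):
    """Convert every value to its z score using the population formulas."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]

paint = [10, 60, 50, 30, 40, 20]
z_values = standardize(paint)

# Up to floating-point rounding, the mean is 0 and the standard deviation is 1.
print(round(statistics.fmean(z_values), 10),
      round(statistics.pstdev(z_values), 10))
```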
Empirical Rule:

[Figure: distribution curve illustrating the empirical rule, with regions labeled “usual”, “unusual”, and “very rare”.]


Example
                          Population 1       Population 2
X (individual value)      76′′               86′′
µ (population mean)       71.5′′             80′′
σ (population S.D.)       2.1′′              3.3′′
z = (X − µ)/σ             2.14 (unusual)     1.82 (usual)
Percentiles
Percentiles divide the data set into 100 equal groups.
Percentile Formula
The percentile corresponding to a given value X is computed by using the following formula:

$$\text{Percentile rank} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100\%$$

and the order (position) of the value corresponding to a certain percentile is

$$c = \frac{n \cdot p}{100}$$

where
n = total number of values
p = percentile
Example 4–7: A teacher gives a 20-point test to 10 students. The scores are shown here. Find the percentile rank
of a score of 12. Also find the value corresponding to the 25th percentile.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10
Solution:
Arrange the data in order from lowest to highest.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Then substitute into the formula.

$$\text{Percentile rank} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100\%$$

Since there are six values below a score of 12, the solution is

$$\text{Percentile rank} = \frac{6 + 0.5}{10} \cdot 100\% = 65\text{th percentile}$$

Thus, a student whose score was 12 did better than 65% of the class.

For the 25th percentile,

$$c = \frac{10 \cdot 25}{100} = 2.5 \Rightarrow c = 3 \text{ (third value in order)}$$

Hence, the value 5 corresponds to the 25th percentile.

(Note: If c is not a whole number, round it up to the next whole number, as in this example.)
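
The percentile-rank formula from this example can be sketched in Python. The function name `percentile_rank` is illustrative; it counts only the values strictly below X, as the formula states.

```python
def percentile_rank(values, x):
    """Percentile rank = (number of values below x + 0.5) / n * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) / len(values) * 100

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
rank_of_12 = percentile_rank(scores, 12)
print(round(rank_of_12))
```

Sorting is not needed here because counting the values below X does not depend on their order.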
Example 4–8: Using the data set in the previous Example, find the value that corresponds to the 60th percentile.

Solution
Arrange the data in order from smallest to largest.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Substitute in the formula.

$$c = \frac{n \cdot p}{100} = \frac{10 \cdot 60}{100} = 6$$

If c is a whole number, use the value halfway between the c-th and (c + 1)-th values when counting up from the lowest value. In this case, these are the 6th and 7th values.

2, 3, 5, 6, 8, 10 (6th value), 12 (7th value), 15, 18, 20

The value halfway between 10 and 12 is 11. Find it by adding the two values and dividing by 2.

$$\frac{10 + 12}{2} = 11$$

Hence, 11 corresponds to the 60th percentile. Anyone scoring 11 would have done better than 60% of the class.
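
The two rules for locating the value at a given percentile (round c up when it is not a whole number, average the c-th and (c + 1)-th values when it is) can be sketched as below. The function name `value_at_percentile` is hypothetical, and the sketch assumes p is a whole number with 0 < p < 100.

```python
def value_at_percentile(values, p):
    """Value at the p-th percentile using the rule c = n * p / 100."""
    ordered = sorted(values)
    n = len(ordered)
    # c is the whole part of n*p/100; remainder is zero when c is a whole number.
    c, remainder = divmod(n * p, 100)
    if remainder == 0:
        # Whole number: halfway between the c-th and (c + 1)-th values.
        return (ordered[c - 1] + ordered[c]) / 2
    # Not a whole number: round up to position c + 1 (index c, zero-based).
    return ordered[c]

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
p25 = value_at_percentile(scores, 25)
p60 = value_at_percentile(scores, 60)
print(p25, p60)
```

Statistical libraries usually interpolate between neighbouring values using other conventions, so their percentiles may differ slightly from this hand rule.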

[Figure: percentile growth chart.]


Quartiles and Deciles
Case 1: Ungrouped Data
Quartiles divide the distribution into four groups, separated by Q1, Q2, Q3.

Example 4–9: Find Q1, Q2, and Q3 for the data set 15, 13, 6, 5, 12, 50, 22, 18.
Solution
Step 1 Arrange the data in order.
5, 6, 12, 13, 15, 18, 22, 50
Step 2 Find the median (Q2).
5, 6, 12, 13 | 15, 18, 22, 50 (the median MD lies between 13 and 15)

$$Q_2 = MD = \frac{13 + 15}{2} = 14$$

Step 3 Find the median of the data values less than 14.
5, 6 | 12, 13 (Q1 lies between 6 and 12)

$$Q_1 = \frac{6 + 12}{2} = 9$$

So Q1 is 9.
Step 4 Find the median of the data values greater than 14.
15, 18 | 22, 50 (Q3 lies between 18 and 22)

$$Q_3 = \frac{18 + 22}{2} = 20$$

Here Q3 is 20. Hence, Q1 = 9, Q2 = 14, and Q3 = 20.
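
The four steps of Example 4–9 can be sketched with the standard `statistics.median` function. The function name `quartiles_by_halves` is an assumption for illustration, and the sketch follows the wording of the steps literally: Q1 and Q3 are the medians of the values strictly below and strictly above Q2.

```python
import statistics

def quartiles_by_halves(values):
    """Q2 is the median; Q1 and Q3 are the medians of the lower and upper halves."""
    ordered = sorted(values)
    q2 = statistics.median(ordered)
    lower = [v for v in ordered if v < q2]   # values less than the median
    upper = [v for v in ordered if v > q2]   # values greater than the median
    return statistics.median(lower), q2, statistics.median(upper)

data = [15, 13, 6, 5, 12, 50, 22, 18]
q1, q2, q3 = quartiles_by_halves(data)
print(q1, q2, q3)
```

Library routines such as `statistics.quantiles` use interpolation conventions that differ from this hand method, so they can return somewhat different quartiles for the same data.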
Interquartile range (IQR)
IQR = Q3 - Q1

Midhinge:

$$\text{Midhinge} = \frac{Q_1 + Q_3}{2}$$
Deciles divide the distribution into 10 groups, as shown. They are denoted by D1, D2, etc.

Case 2: Grouped Data

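Using the same method of calculation as for the median, Q1 and Q3 are obtained as follows:

$$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F}{f_{Q_1}}\right) i, \qquad Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F}{f_{Q_3}}\right) i$$

where L is the lower boundary of the quartile class, F is the cumulative frequency of the classes before the quartile class, f is the frequency of the quartile class, and i is the class width.
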

Example 4–10: Based on the grouped data below, find the interquartile range (IQR).
Time to travel to work f

1 – 10 8
11 – 20 14
21 – 30 12
31 – 40 9
41 – 50 7
Solution

Construct the cumulative frequency distribution

Time to travel to work    f      cf
1 – 10                    8      8
11 – 20                   14     22
21 – 30                   12     34
31 – 40                   9      43
41 – 50                   7      50

Class Q1: n/4 = 50/4 = 12.5 → class Q1 is the 2nd class

$$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F}{f_{Q_1}}\right) i = 10.5 + \left(\frac{12.5 - 8}{14}\right) 10 = 13.7143$$

Class Q3: 3n/4 = 3(50)/4 = 37.5 → class Q3 is the 4th class

$$Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F}{f_{Q_3}}\right) i = 30.5 + \left(\frac{37.5 - 34}{9}\right) 10 = 34.3889$$

IQR = Q3 - Q1 = 34.3889 - 13.7143 = 20.6746
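
The grouped-data quartile calculation of Example 4–10 can be sketched as follows. The function name `grouped_quantile` and the tuple layout are assumptions for illustration. The sketch also assumes equal-width classes with whole-number limits, so that each lower class boundary is the lower limit minus 0.5.

```python
# (lower class limit, upper class limit, frequency) from Example 4-10.
classes = [(1, 10, 8), (11, 20, 14), (21, 30, 12), (31, 40, 9), (41, 50, 7)]

def grouped_quantile(classes, fraction):
    """L + ((fraction * n - F) / f) * i for the class containing the quantile."""
    n = sum(f for _, _, f in classes)
    target = n * fraction                      # n/4 for Q1, 3n/4 for Q3
    width = classes[1][0] - classes[0][0]      # class width i (equal widths assumed)
    cumulative = 0                             # F: cumulative frequency before the class
    for lower, _, f in classes:
        if cumulative + f >= target:
            boundary = lower - 0.5             # lower class boundary L
            return boundary + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("fraction must be between 0 and 1")

q1_grouped = grouped_quantile(classes, 0.25)
q3_grouped = grouped_quantile(classes, 0.75)
iqr_grouped = q3_grouped - q1_grouped
print(round(q1_grouped, 4), round(q3_grouped, 4), round(iqr_grouped, 4))
```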
Outliers

An outlier is an extremely high or an extremely low data value when compared with the rest of the data values.

Example 4–11: Check the following data set for outliers.


5, 6, 12, 13, 15, 18, 22, 50

Solution

The data value 50 is extremely suspect. These are the steps in checking for an outlier.
Step 1 Find Q1 and Q3. From the previous example, Q1 is 9 and Q3 is 20.
Step 2 Find the interquartile range (IQR), which is Q3 - Q1.
IQR = Q3 - Q1 = 20 - 9 = 11

Step 3 Multiply this value by 1.5.


1.5(11) = 16.5

Step 4 Subtract the value obtained in step 3 from Q1, and add the value obtained in step 3 to Q3.
9 - 16.5 = -7.5 and 20 + 16.5 = 36.5

Step 5 Check the data set for any data values that fall outside the interval from -7.5 to 36.5. The value 50 is outside
this interval; hence, it can be considered an outlier.
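
Steps 2 to 5 of the outlier check can be sketched in Python, taking Q1 = 9 and Q3 = 20 from the earlier example as given. The function name `iqr_fences` and the parameter `k` are illustrative.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the interval [Q1 - k*IQR, Q3 + k*IQR] used to flag outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [5, 6, 12, 13, 15, 18, 22, 50]
low, high = iqr_fences(9, 20)

# Any value outside the interval is considered an outlier.
outliers = [x for x in data if x < low or x > high]
print(low, high, outliers)
```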
3–4 Exploratory Data Analysis

A boxplot is a graph of a data set obtained by drawing a horizontal line from the minimum data value to Q1,
drawing a horizontal line from Q3 to the maximum data value, and drawing a box whose vertical sides pass through
Q1 and Q3 with a vertical line inside the box passing through the median or Q2.

Example 4–12: The number of meteorites found in 10 states of the United States is 89, 47, 164, 296, 30, 215, 138,
78, 48, 39. Construct a boxplot for the data.
Solution
Step 1 Arrange the data in order:
30, 39, 47, 48, 78, 89, 138, 164, 215, 296
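Step 2 Find the median.
30, 39, 47, 48, 78 | 89, 138, 164, 215, 296 (the median lies between 78 and 89)

$$Q_2 = MD = \frac{78 + 89}{2} = 83.5$$

Step 3 Find Q1.
30, 39, 47, 48, 78
The middle value of the lower half is 47, so Q1 = 47.
Step 4 Find Q3.
89, 138, 164, 215, 296
The middle value of the upper half is 164, so Q3 = 164.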
Step 5 Draw a scale for the data on the x axis.
Step 6 Locate the lowest value, Q1, the median, Q3, and the highest value on the scale.
Step 7 Draw a box. Fi–
