Basic 1

MEL761: Statistics for Decision Making
Measures of a Distribution: Central Tendency, Variability, Examples
Dr S G Deshmukh
Mechanical Department, Indian Institute of Technology
Learning Objectives
Distinguish between measures of central tendency, measures of variability, measures of shape, and measures of association.
Understand the meanings of mean, median, mode, quartile, percentile, and range.
Compute mean, median, mode, percentile, quartile, range, variance, standard deviation, and mean absolute deviation on ungrouped data.
Differentiate between sample and population variance and standard deviation.
Learning Objectives -- Continued

Understand the meaning of standard deviation as it is applied by using the empirical rule and Chebyshev's theorem.
Compute the mean, median, standard deviation, and variance on grouped data.
Understand box and whisker plots, skewness, and kurtosis.
Compute a coefficient of correlation and interpret it.
Measures of Central Tendency: Ungrouped Data

Measures of central tendency yield information about particular places or locations in a group of numbers.
Common Measures of Location: Mode, Median, Mean, Percentiles, Quartiles
Mode
The most frequently occurring value in a data set
Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
Bimodal -- Data sets that have two modes
Multimodal -- Data sets that contain more than two modes
Mode -- Example
The mode is 44. There are more 44s than any other value.
35 37 37 39 40 40 41 41 43 43 43 43 44 44 44 44 44 45 45 46 46 46 46 48
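The count behind this example can be checked with a short Python sketch. It assumes the standard-library `statistics.multimode` function (Python 3.8 or later); the variable names are illustrative only.

```python
from statistics import multimode

# Data from the mode example above.
data = [35, 37, 37, 39, 40, 40, 41, 41, 43, 43, 43, 43,
        44, 44, 44, 44, 44, 45, 45, 46, 46, 46, 46, 48]

# multimode returns every value tied for the highest frequency,
# so it also covers bimodal and multimodal data sets.
modes = multimode(data)
print(modes)
```

A single-element result means the data set has one mode; two or more elements would indicate a bimodal or multimodal data set.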
Median
Middle value in an ordered array of numbers
Applicable for ordinal, interval, and ratio data
Not applicable for nominal data
Unaffected by extremely large and extremely small values
Median: Computational Procedure

First Procedure
Arrange the observations in an ordered array.
If there is an odd number of terms, the median is the middle term of the ordered array.
If there is an even number of terms, the median is the average of the middle two terms.
Second Procedure
The median's position in an ordered array is given by (n+1)/2.
Median: Example with an Odd Number of Terms

Ordered Array: 3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
There are 17 terms in the ordered array.
Position of median = (n+1)/2 = (17+1)/2 = 9
The median is the 9th term, 15.
If the 22 is replaced by 100, the median is still 15.
If the 3 is replaced by -103, the median is still 15.
Median: Example with an Even Number of Terms

Ordered Array: 3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21
There are 16 terms in the ordered array.
Position of median = (n+1)/2 = (16+1)/2 = 8.5
The median is between the 8th and 9th terms, 14.5.
If the 21 is replaced by 100, the median is still 14.5.
If the 3 is replaced by -88, the median is still 14.5.
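The two median examples, including the claim that extreme values leave the median unchanged, can be reproduced with a minimal Python sketch. It assumes the standard-library `statistics.median`; the list names are illustrative.

```python
from statistics import median

# Ordered arrays from the odd (17 terms) and even (16 terms) examples.
odd = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even = odd[:-1]

median_odd = median(odd)      # middle (9th) term
median_even = median(even)    # average of the 8th and 9th terms

# Replace the largest or smallest value with an extreme one.
odd_high = odd[:-1] + [100]
odd_low = [-103] + odd[1:]
even_high = even[:-1] + [100]
even_low = [-88] + even[1:]

print(median_odd, median(odd_high), median(odd_low))
print(median_even, median(even_high), median(even_low))
```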
Arithmetic Mean
Commonly called the mean; it is the average of a group of numbers
Applicable for interval and ratio data
Not applicable for nominal or ordinal data
Affected by each value in the data set, including extreme values
Computed by summing all values in the data set and dividing the sum by the number of values in the data set
Population Mean
$\mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N}$
$= \frac{24 + 13 + 19 + 26 + 11}{5} = \frac{93}{5} = 18.6$
Sample Mean
$\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$
$= \frac{57 + 86 + 42 + 38 + 90 + 66}{6} = \frac{379}{6} = 63.167$
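Both mean calculations follow the same sum-and-divide rule; only the notation differs between a population and a sample. A small Python sketch, assuming the standard-library `statistics.mean` and illustrative variable names:

```python
from statistics import mean

population = [24, 13, 19, 26, 11]      # population mean example
sample = [57, 86, 42, 38, 90, 66]      # sample mean example

mu = mean(population)      # sum of values divided by N
x_bar = mean(sample)       # sum of values divided by n

print(mu, round(x_bar, 3))
```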
Percentiles
Measures of central tendency that divide a group of data into 100 parts
At least n% of the data lie below the nth percentile, and at most (100 - n)% of the data lie above the nth percentile
Example: 90th percentile indicates that at least 90% of the data lie below it, and at most 10% of the data lie above it
The median and the 50th percentile have the same value.
Applicable for ordinal, interval, and ratio data
Not applicable for nominal data
Percentiles: Computational Procedure

Organize the data into an ascending ordered array.
Calculate the Pth percentile location:
$i = \frac{P}{100}(n)$
Determine the percentile's location and its value.
If i is a whole number, the percentile is the average of the values at the i and (i+1) positions.
If i is not a whole number, the percentile is at the position given by the whole-number part of (i+1) in the ordered array.
Percentiles: Example
Raw Data: 14, 12, 19, 23, 5, 13, 28, 17
Ordered Array: 5, 12, 13, 14, 17, 19, 23, 28
Location of 30th percentile:
$i = \frac{30}{100}(8) = 2.4$
The location index, i, is not a whole number; i+1 = 2.4+1 = 3.4; the whole number portion is 3; the 30th percentile is at the 3rd location of the array; the 30th percentile is 13.
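The location procedure can be sketched in Python as follows. The function name `percentile` is hypothetical, and the sketch assumes P is a whole number strictly between 0 and 100; it uses integer arithmetic so that the whole-number test on i is exact. Statistical packages often use different interpolation rules, so their results may differ from this procedure.

```python
def percentile(data, p):
    """Pth percentile by the slide's location rule (assumes 0 < p < 100)."""
    ordered = sorted(data)
    n = len(ordered)
    # i = (p / 100) * n, split into whole part and remainder.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # i is a whole number: average the values at positions i and i + 1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not whole: take the value at the whole-number part of i + 1.
    return ordered[whole]

raw = [14, 12, 19, 23, 5, 13, 28, 17]
p30 = percentile(raw, 30)
print(p30)
```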
Quartiles
Measures of central tendency that divide a group of data into four subgroups
Q1: 25% of the data set is below the first quartile
Q2: 50% of the data set is below the second quartile
Q3: 75% of the data set is below the third quartile
Q1 is equal to the 25th percentile
Q2 is located at 50th percentile and equals the median
Q3 is equal to the 75th percentile
Quartile values are not necessarily members of the data set
Quartiles
[Figure: Q1, Q2, and Q3 divide the ordered data into four groups, each containing 25% of the values.]
Quartiles: Example
Ordered array: 106, 109, 114, 116, 121, 122, 125, 129
Q1: $i = \frac{25}{100}(8) = 2$, so $Q_1 = \frac{109 + 114}{2} = 111.5$
Q2: $i = \frac{50}{100}(8) = 4$, so $Q_2 = \frac{116 + 121}{2} = 118.5$
Q3: $i = \frac{75}{100}(8) = 6$, so $Q_3 = \frac{122 + 125}{2} = 123.5$
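Because the quartiles are the 25th, 50th, and 75th percentiles, the same location rule yields all three. The sketch below is one way to express that in Python; `quartile` is a hypothetical helper written for this example, not a library function.

```python
def quartile(ordered, k):
    """k-th quartile (k = 1, 2, 3) of an ascending list via the percentile location rule."""
    n = len(ordered)
    # Location index i = (25k / 100) * n, kept in integer arithmetic.
    whole, remainder = divmod(25 * k * n, 100)
    if remainder == 0:
        # Whole-number location: average positions i and i + 1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Otherwise use the whole-number part of i + 1.
    return ordered[whole]

ordered = [106, 109, 114, 116, 121, 122, 125, 129]
q1, q2, q3 = (quartile(ordered, k) for k in (1, 2, 3))
print(q1, q2, q3)
```

None of the three results is a member of the data set, which illustrates the point that quartile values need not be observed values.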
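Variability
[Figure: two cash-flow charts, each marked with its mean: one with no variability in cash flow and one with variability in cash flow.]
Variability
[Figure: two distributions, one showing variability and one showing no variability.]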
Measures of Variability: Ungrouped Data

Measures of variability describe the spread or the dispersion of a set of data.
Common Measures of Variability: Range, Interquartile Range, Mean Absolute Deviation, Variance, Standard Deviation, Z scores, Coefficient of Variation
Range
The difference between the largest and the smallest values in a set of data
Simple to compute
Ignores all data points except the two extremes
Example: Range = Largest - Smallest = 48 - 35 = 13

35   41   44   45
37   41   44   46
37   43   44   46
39   43   44   46
40   43   44   46
40   43   45   48
Interquartile Range
Range of values between the first and third quartiles
Range of the middle half
Less influenced by extremes
$\text{Interquartile Range} = Q_3 - Q_1$
Deviation from the Mean

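Data set: 5, 9, 16, 17, 18
Mean: $\mu = \frac{\sum X}{N} = \frac{65}{5} = 13$
Deviations from the mean: -8, -4, 3, 4, 5
[Figure: number line from 0 to 20 showing the deviations -8, -4, +3, +4, +5 of the data values from the mean of 13.]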
Mean Absolute Deviation

Average of the absolute deviations from the mean
X       X - μ    |X - μ|
5       -8       +8
9       -4       +4
16      +3       +3
17      +4       +4
18      +5       +5
Sum     0        24

$M.A.D. = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$
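The table can be reproduced in a few lines of Python. This is a minimal sketch with illustrative variable names; it also shows why absolute values are needed, since the raw deviations always sum to zero.

```python
data = [5, 9, 16, 17, 18]

mu = sum(data) / len(data)                 # population mean
deviations = [x - mu for x in data]        # X - mu
abs_deviations = [abs(d) for d in deviations]

# Mean absolute deviation: average of |X - mu|.
mad = sum(abs_deviations) / len(data)

print(mu, sum(deviations), sum(abs_deviations), mad)
```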
Population Variance
Average of the squared deviations from the arithmetic mean
X       X - μ    (X - μ)²
5       -8       64
9       -4       16
16      +3       9
17      +4       16
18      +5       25
Sum     0        130

$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$
Population Standard Deviation

Square root of the variance
X       X - μ    (X - μ)²
5       -8       64
9       -4       16
16      +3       9
17      +4       16
18      +5       25
Sum     0        130

$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$
$\sigma = \sqrt{\sigma^2} = \sqrt{26.0} = 5.1$
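The population variance and standard deviation for this data set can be checked with Python's standard library. The sketch assumes `statistics.pvariance` and `statistics.pstdev`, which divide by N; the variable names are illustrative.

```python
from statistics import pvariance, pstdev

data = [5, 9, 16, 17, 18]

# Sum of squared deviations from the population mean.
mu = sum(data) / len(data)
sum_sq = sum((x - mu) ** 2 for x in data)

sigma2 = pvariance(data)   # sum_sq / N
sigma = pstdev(data)       # square root of the population variance

print(sum_sq, sigma2, round(sigma, 1))
```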
Sample Variance
Average of the squared deviations from the arithmetic mean, using n - 1 rather than n as the divisor

X         X - X̄     (X - X̄)²
2,398     625       390,625
1,844     71        5,041
1,539     -234      54,756
1,311     -462      213,444
7,092     0         663,866   (totals)

$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663,866}{3} = 221,288.67$
Sample Standard Deviation

Square root of the sample variance
X         X - X̄     (X - X̄)²
2,398     625       390,625
1,844     71        5,041
1,539     -234      54,756
1,311     -462      213,444
7,092     0         663,866   (totals)

$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663,866}{3} = 221,288.67$
$S = \sqrt{S^2} = \sqrt{221,288.67} = 470.41$
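The sample versions differ from the population versions only in the divisor, n - 1 instead of N. A short Python sketch, assuming the standard-library `statistics.variance` and `statistics.stdev` (both use n - 1) and illustrative variable names:

```python
from statistics import variance, stdev

sample = [2398, 1844, 1539, 1311]

x_bar = sum(sample) / len(sample)
sum_sq = sum((x - x_bar) ** 2 for x in sample)   # sum of squared deviations

s2 = variance(sample)    # sum_sq / (n - 1)
s = stdev(sample)        # square root of the sample variance

print(x_bar, sum_sq, round(s2, 2), round(s, 2))
```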
Uses of Standard Deviation

Indicator of financial risk
Quality control
    construction of quality control charts
    process capability studies
Comparing populations
    household incomes in two cities
    employee absenteeism at two plants
Standard Deviation as an Indicator of Financial Risk

Annualized Rate of Return

Financial Security    μ      σ
A                     15%    3%
B                     15%    7%
Empirical Rule
Data are normally distributed (or approximately normal)

Distance from the Mean    Percentage of Values Falling Within Distance
μ ± 1σ                    68
μ ± 2σ                    95
μ ± 3σ                    99.7
Chebyshev's Theorem
Applies to all distributions
$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}$ for $k > 1$
Chebyshev's Theorem
Applies to all distributions

Number of Standard Deviations    Distance from the Mean    Minimum Proportion of Values Falling Within Distance
K = 2                            μ ± 2σ                    1 - 1/2² = 0.75
K = 3                            μ ± 3σ                    1 - 1/3² = 0.89
K = 4                            μ ± 4σ                    1 - 1/4² = 0.94
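The minimum proportions in the table come directly from the bound 1 - 1/k². A minimal Python sketch follows; `chebyshev_bound` is a hypothetical helper name chosen for this example.

```python
def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4):
    print(f"k = {k}: at least {chebyshev_bound(k):.4f} of the values")
```

For k = 2 the bound guarantees only 75%, compared with roughly 95% under the empirical rule, because the theorem makes no assumption about the shape of the distribution.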
Coefficient of Variation
Ratio of the standard deviation to the mean, expressed as a percentage
Measurement of relative dispersion
$C.V. = \frac{\sigma}{\mu}(100)$
Coefficient of Variation
$\mu_1 = 29$, $\sigma_1 = 4.6$
$C.V._1 = \frac{\sigma_1}{\mu_1}(100) = \frac{4.6}{29}(100) = 15.86$

$\mu_2 = 84$, $\sigma_2 = 10$
$C.V._2 = \frac{\sigma_2}{\mu_2}(100) = \frac{10}{84}(100) = 11.90$
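The comparison can be sketched in Python as below; `coefficient_of_variation` is a hypothetical helper name. It illustrates that the second data set has the larger standard deviation but the smaller relative dispersion.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100

cv1 = coefficient_of_variation(4.6, 29)   # first data set
cv2 = coefficient_of_variation(10, 84)    # second data set

print(round(cv1, 2), round(cv2, 2))
```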
Measures of Central Tendency and Variability: Grouped Data

Measures of Central Tendency: Mean, Median, Mode
Measures of Variability: Variance, Standard Deviation
Mean of Grouped Data

Weighted average of class midpoints
Class frequencies are the weights
$\mu_{grouped} = \frac{\sum fM}{\sum f} = \frac{\sum fM}{N} = \frac{f_1M_1 + f_2M_2 + f_3M_3 + \dots + f_iM_i}{f_1 + f_2 + f_3 + \dots + f_i}$
Calculation of Grouped Mean

Class Interval    Frequency (f)    Class Midpoint (M)    fM
20-under 30       6                25                    150
30-under 40       18               35                    630
40-under 50       11               45                    495
50-under 60       11               55                    605
60-under 70       3                65                    195
70-under 80       1                75                    75
Total             50                                     2150

$\mu = \frac{\sum fM}{\sum f} = \frac{2150}{50} = 43.0$
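The grouped mean is a frequency-weighted average of the class midpoints, which a few lines of Python make explicit. The data layout (a list of midpoint and frequency pairs) is an assumption made for this sketch.

```python
# (class midpoint M, frequency f) for each class interval.
classes = [(25, 6), (35, 18), (45, 11), (55, 11), (65, 3), (75, 1)]

n = sum(f for _, f in classes)               # total of frequencies
total_fm = sum(f * m for m, f in classes)    # sum of f * M

grouped_mean = total_fm / n
print(n, total_fm, grouped_mean)
```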
Median of Grouped Data

$\text{Median} = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W)$
Where:
L = the lower limit of the median class
cf_p = cumulative frequency of class preceding the median class
f_med = frequency of the median class
W = width of the median class
N = total of frequencies
Median of Grouped Data -- Example

Class Interval    Frequency    Cumulative Frequency
20-under 30       6            6
30-under 40       18           24
40-under 50       11           35
50-under 60       11           46
60-under 70       3            49
70-under 80       1            50
                  N = 50

$M_d = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W) = 40 + \frac{\frac{50}{2} - 24}{11}(10) = 40.909$
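The grouped-median formula can be sketched in Python as follows. The function name `grouped_median` and the tuple layout (lower limit, upper limit, frequency) are assumptions for this example; the median class is taken to be the first class whose cumulative frequency reaches N/2.

```python
def grouped_median(classes):
    """Median of grouped data; classes are (lower, upper, frequency) in ascending order."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("classes must contain at least one observation")
    cumulative = 0  # cumulative frequency of the classes before the current one (cf_p)
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:
            # L + ((N/2 - cf_p) / f_med) * W
            return lower + (n / 2 - cumulative) / f * (upper - lower)
        cumulative += f

classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]
md = grouped_median(classes)
print(round(md, 3))
```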
Mode of Grouped Data

Midpoint of the modal class
Modal class has the greatest frequency

Class Interval    Frequency
20-under 30       6
30-under 40       18
40-under 50       11
50-under 60       11
60-under 70       3
70-under 80       1

$\text{Mode} = \frac{30 + 40}{2} = 35$
Variance and Standard Deviation of Grouped Data

Population:
$\sigma^2 = \frac{\sum f(M - \mu)^2}{N}$, $\sigma = \sqrt{\sigma^2}$
Sample:
$S^2 = \frac{\sum f(M - \bar{X})^2}{n - 1}$, $S = \sqrt{S^2}$
Population Variance and Standard Deviation of Grouped Data

Class Interval    f     M     fM      M - μ    (M - μ)²    f(M - μ)²
20-under 30       6     25    150     -18      324         1944
30-under 40       18    35    630     -8       64          1152
40-under 50       11    45    495     2        4           44
50-under 60       11    55    605     12       144         1584
60-under 70       3     65    195     22       484         1452
70-under 80       1     75    75      32       1024        1024
Total             50          2150                         7200

$\sigma^2 = \frac{\sum f(M - \mu)^2}{N} = \frac{7200}{50} = 144$
$\sigma = \sqrt{\sigma^2} = \sqrt{144} = 12$
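The columns of this table can be recomputed with a brief Python sketch. The list of (midpoint, frequency) pairs and the variable names are assumptions made for illustration; the calculation treats the grouped data as a population, dividing by N.

```python
from math import sqrt

# (class midpoint M, frequency f) for each class interval.
classes = [(25, 6), (35, 18), (45, 11), (55, 11), (65, 3), (75, 1)]

n = sum(f for _, f in classes)
mu = sum(f * m for m, f in classes) / n                 # grouped mean

# Frequency-weighted squared deviations of the midpoints from the mean.
weighted_sq = [f * (m - mu) ** 2 for m, f in classes]

sigma2 = sum(weighted_sq) / n    # population variance of grouped data
sigma = sqrt(sigma2)             # population standard deviation

print(mu, sum(weighted_sq), sigma2, sigma)
```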
Measures of Shape
Skewness
    Absence of symmetry
    Extreme values in one side of a distribution
Kurtosis
    Peakedness of a distribution
    Leptokurtic: high and thin
    Mesokurtic: normal shape
    Platykurtic: flat and spread out
Box and Whisker Plots
    Graphic display of a distribution
    Reveals skewness
Skewness
[Figure: three distribution curves: negatively skewed, symmetric (not skewed), and positively skewed.]
Skewness
The skewness of a distribution is measured by comparing the relative positions of the mean, median and mode.
Distribution is symmetrical: Mean = Median = Mode
Distribution skewed right: Median lies between mode and mean, and mode is less than mean
Distribution skewed left: Median lies between mode and mean, and mode is greater than mean
Skewness
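[Figure: positions of the mean, median, and mode on three curves. Negatively skewed: mean, then median, then mode from left to right. Symmetric: mean, median, and mode coincide. Positively skewed: mode, then median, then mean from left to right.]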
Coefficient of Skewness
Summary measure for skewness
$S = \frac{3(\mu - M_d)}{\sigma}$
If S < 0, the distribution is negatively skewed (skewed to the left).
If S = 0, the distribution is symmetric (not skewed).
If S > 0, the distribution is positively skewed (skewed to the right).
Coefficient of Skewness
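$\mu_1 = 23$, $M_{d1} = 26$, $\sigma_1 = 12.3$
$S_1 = \frac{3(\mu_1 - M_{d1})}{\sigma_1} = \frac{3(23 - 26)}{12.3} = -0.73$

$\mu_2 = 26$, $M_{d2} = 26$, $\sigma_2 = 12.3$
$S_2 = \frac{3(\mu_2 - M_{d2})}{\sigma_2} = \frac{3(26 - 26)}{12.3} = 0$

$\mu_3 = 29$, $M_{d3} = 26$, $\sigma_3 = 12.3$
$S_3 = \frac{3(\mu_3 - M_{d3})}{\sigma_3} = \frac{3(29 - 26)}{12.3} = +0.73$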
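The three cases can be evaluated with a small Python sketch. The function names `skewness_coefficient` and `describe` are hypothetical, introduced only for this example.

```python
def skewness_coefficient(mean, median, sd):
    """Coefficient of skewness: S = 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / sd

def describe(s):
    """Interpret the sign of S."""
    if s < 0:
        return "negatively skewed"
    if s > 0:
        return "positively skewed"
    return "symmetric"

cases = [(23, 26, 12.3), (26, 26, 12.3), (29, 26, 12.3)]
results = [skewness_coefficient(*case) for case in cases]
for s in results:
    print(round(s, 2), describe(s))
```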
Kurtosis
Peakedness of a distribution
Leptokurtic: high and thin
Mesokurtic: normal in shape
Platykurtic: flat and spread out
[Figure: leptokurtic, mesokurtic, and platykurtic curves.]
Box and Whisker Plot

Five specific values are used:
    Median, Q2
    First quartile, Q1
    Third quartile, Q3
    Minimum value in the data set
    Maximum value in the data set
Inner Fences
    IQR = Q3 - Q1
    Lower inner fence = Q1 - 1.5 IQR
    Upper inner fence = Q3 + 1.5 IQR
Outer Fences
    Lower outer fence = Q1 - 3.0 IQR
    Upper outer fence = Q3 + 3.0 IQR
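The fence formulas can be sketched in Python as below. As an assumption for illustration, the sketch reuses Q1 = 111.5 and Q3 = 123.5 from the quartiles example (ordered array 106 to 129); the function name `fences` is hypothetical.

```python
def fences(q1, q3):
    """Inner and outer fences of a box and whisker plot from Q1 and Q3."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    return iqr, inner, outer

# Illustrative inputs: quartiles of 106, 109, 114, 116, 121, 122, 125, 129.
iqr, inner, outer = fences(111.5, 123.5)
print(iqr, inner, outer)
```

Every value of that array lies inside the inner fences, so under the usual reading of the fences none of them would be flagged as an outlier.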
Box and Whisker Plot
[Figure: box and whisker plot marking the minimum, Q1, Q2, Q3, and maximum.]
Skewness: Box and Whisker Plots, and Coefficient of Skewness

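[Figure: box and whisker plots for a negatively skewed distribution (S < 0), a symmetric distribution (S = 0), and a positively skewed distribution (S > 0).]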
Pearson Product-Moment Correlation Coefficient

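$r = \frac{SSXY}{\sqrt{(SSX)(SSY)}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}}$
$-1 \le r \le 1$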
Three Degrees of Correlation
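[Figure: three scatter plots showing negative correlation (r < 0), positive correlation (r > 0), and no correlation (r = 0).]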
Computation of r for an Example

Day           Interest X    Futures Index Y    X²         Y²         XY
1             7.43          221                55.205     48,841     1,642.03
2             7.48          222                55.950     49,284     1,660.56
3             8.00          226                64.000     51,076     1,808.00
4             7.75          225                60.063     50,625     1,743.75
5             7.60          224                57.760     50,176     1,702.40
6             7.63          223                58.217     49,729     1,701.49
7             7.68          223                58.982     49,729     1,712.64
8             7.67          226                58.829     51,076     1,733.42
9             7.59          226                57.608     51,076     1,715.34
10            8.07          235                65.125     55,225     1,896.45
11            8.03          233                64.481     54,289     1,870.99
12            8.00          241                64.000     58,081     1,928.00
Summations    92.93         2,725              720.220    619,207    21,115.07
Computation of r for the Example

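$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}} = \frac{21,115.07 - \frac{(92.93)(2725)}{12}}{\sqrt{\left[720.22 - \frac{(92.93)^2}{12}\right]\left[619,207 - \frac{(2725)^2}{12}\right]}} = .815$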
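The computational formula for r can be checked against the twelve days of data with a short Python sketch. The function name `pearson_r` and the list names are illustrative; the sketch works from the raw observations rather than the rounded column totals.

```python
from math import sqrt

interest = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63,
            7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
futures = [221, 222, 226, 225, 224, 223,
           223, 226, 226, 235, 233, 241]

def pearson_r(x, y):
    """Pearson product-moment correlation via the sums-of-squares formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    ss_x = sum(a * a for a in x) - sum_x ** 2 / n
    ss_y = sum(b * b for b in y) - sum_y ** 2 / n
    return ss_xy / sqrt(ss_x * ss_y)

r = pearson_r(interest, futures)
print(round(r, 3))
```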
Scatter Plot and Correlation Matrix for the Example

[Figure: scatter plot of Futures Index (vertical axis, 220 to 245) against Interest (horizontal axis, 7.40 to 8.20).]

Correlation matrix:
                 Interest    Futures Index
Interest         1
Futures Index    0.815254    1
