Measures of Variation Include
In addition to the central tendency of the data in a population or sample, a second important characteristic of the data is its variability about some center.
Measures of variation include: the range, the variance, the standard deviation, and the mean absolute deviation.
The standard deviation is the square root of the variance.
Measures of Variation
Standard Deviation of a Population
Suppose a student receives the following quiz grades:
{82, 68, 74, 86, 90, 88, 62, 75, 80, 55}
For this student, these grades are the total population of scores used to calculate her mean (average) grade. We obtain:
$$\mu = \frac{82 + 68 + 74 + 86 + 90 + 88 + 62 + 75 + 80 + 55}{10} = \frac{760}{10} = 76$$
The mean of this population is 76.
Having obtained the mean, we can now calculate the variance. With the data {82, 68, 74, 86, 90, 88, 62, 75, 80, 55} and $\mu = 76$:
$$\sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N}$$
$$= \frac{(82-76)^2 + (68-76)^2 + (74-76)^2 + (86-76)^2 + (90-76)^2 + (88-76)^2 + (62-76)^2 + (75-76)^2 + (80-76)^2 + (55-76)^2}{10}$$
$$= \frac{1198}{10} = 119.8$$
We find the standard deviation of this population by taking the square root of the variance.
$$\sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N} = 119.8$$
$$\sigma = \sqrt{119.8} \approx 10.94$$
If we display the data {82, 68, 74, 86, 90, 88, 62, 75, 80, 55} on a dot plot, we can visualize the use of the standard deviation as a measure of variation in the data.
[Dot plot: the ten quiz grades marked with x on a scale from 55 to 100, with the mean $\mu = 76$ indicated.]
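The population calculation above can be reproduced with a short Python sketch. The function name `population_variance` is an illustrative choice, not something from the lecture; the data are the ten quiz grades.

```python
import math

grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]


def population_variance(values):
    """Mean of the squared deviations from the mean (divides by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


mu = sum(grades) / len(grades)            # population mean
sigma_squared = population_variance(grades)
sigma = math.sqrt(sigma_squared)          # population standard deviation

print(mu, sigma_squared, round(sigma, 3))
```

The printed variance should match the 119.8 obtained by hand, and the standard deviation is its square root.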
Chebyshev's Theorem
The proportion of any set of data lying within K standard deviations of the mean is always at least $1 - 1/K^2$, for any K greater than 1.
Chebyshev's inequality tells us that in any statistical distribution at least 3/4 of the values will lie within 2 standard deviations of the mean, and at least 8/9 of all values will lie within 3 standard deviations of the mean.
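As a rough check of the theorem, the following Python sketch compares the Chebyshev lower bound with the proportion of the quiz grades that actually fall within K standard deviations of the mean. The helper names `chebyshev_bound` and `proportion_within` are assumptions made for this illustration.

```python
import statistics

grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]


def chebyshev_bound(k):
    """Guaranteed minimum proportion within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


def proportion_within(values, k):
    """Observed proportion of values within k population standard deviations."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    inside = [x for x in values if abs(x - mu) <= k * sigma]
    return len(inside) / len(values)


for k in (2, 3):
    print(k, chebyshev_bound(k), proportion_within(grades, k))
```

The observed proportion can never be smaller than the bound; for most data sets it is considerably larger, because the theorem makes no assumption about the shape of the distribution.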
The Sample Standard Deviation
The standard deviation of a sample is denoted by the letter s. The sample standard deviation is an estimate of the population standard deviation $\sigma$.
$$s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$$
where $\bar{x}$ denotes the sample mean. The sample standard deviation is obtained by taking the square root of the sample variance.
Note: to calculate the sample variance we divide by the number of degrees of freedom, $n - 1$, instead of the sample size n. The sample mean has already been calculated from the same sample data that we now use to obtain a second statistic. Only $n - 1$ of the values are therefore free to vary: the nth value is fixed, since the sum must equal n times the mean.
The formula for the sample standard deviation can be transformed into a form that slightly simplifies the computation:
$$s^2 = \frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}$$
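The transformation can be sketched as follows. This derivation is not part of the original slides; it uses only the definition of the sample mean, $\bar{x} = \sum x_i / n$.

```latex
\begin{align*}
\sum_i (x_i - \bar{x})^2
  &= \sum_i x_i^2 - 2\bar{x}\sum_i x_i + n\bar{x}^2 \\
  &= \sum_i x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
  &= \sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n} \\[6pt]
s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}
  &= \frac{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}{n(n - 1)}
\end{align*}
```

The final form needs only the sum of the values and the sum of their squares, so the deviations from the mean never have to be computed individually.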
Using the original formula and treating the previous data as sample data with a mean of 76, we get:
{82, 68, 74, 86, 90, 88, 62, 75, 80, 55}
$$s = \sqrt{\frac{(82-76)^2 + (68-76)^2 + (74-76)^2 + (86-76)^2 + (90-76)^2 + (88-76)^2 + (62-76)^2 + (75-76)^2 + (80-76)^2 + (55-76)^2}{n - 1}} = \sqrt{\frac{1198}{9}} = \sqrt{133.11} \approx 11.54$$
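The difference between dividing by n - 1 and dividing by N can be seen with Python's standard `statistics` module, which provides both versions. This is a minimal sketch; the variable names are illustrative.

```python
import statistics

grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]

# Sample statistics divide the sum of squared deviations by n - 1.
sample_variance = statistics.variance(grades)
s = statistics.stdev(grades)

# Population statistics divide by N.
population_variance = statistics.pvariance(grades)
sigma = statistics.pstdev(grades)

print(round(sample_variance, 2), round(s, 2))
print(round(population_variance, 2), round(sigma, 2))
```

Because the same sum of squared deviations (1198) is divided by 9 rather than 10, the sample standard deviation is slightly larger than the population value for the same data.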
To use the modified formula, we first construct the following table for the data {82, 68, 74, 86, 90, 88, 62, 75, 80, 55}:

         x      x^2
        82     6724
        68     4624
        74     5476
        86     7396
        90     8100
        88     7744
        62     3844
        75     5625
        80     6400
        55     3025
Sum    760    58958

n = 10
$$s^2 = \frac{(10)(58958) - 760^2}{(10)(9)} = \frac{589580 - 577600}{(10)(9)} = 133.11$$
$$s = \sqrt{133.11} \approx 11.54$$
In this second method we need only the total of the sample items and the total of the squares of these items.
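The second method can be sketched in Python as follows. The function name `sample_std_shortcut` is hypothetical; the formula is the one used in the worked example, with the sum of the values and the sum of their squares as the only inputs.

```python
import math

grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]


def sample_std_shortcut(values):
    """Sample standard deviation from the sum and the sum of squares."""
    n = len(values)
    total = sum(values)                      # sum of x
    total_sq = sum(x * x for x in values)    # sum of x^2
    variance = (n * total_sq - total ** 2) / (n * (n - 1))
    return math.sqrt(variance)


sum_x = sum(grades)
sum_x_squared = sum(x * x for x in grades)
s_shortcut = sample_std_shortcut(grades)

print(sum_x, sum_x_squared, round(s_shortcut, 2))
```

The two column totals agree with the table (760 and 58958), and the result agrees with the value obtained from the original formula.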
Finding the standard deviation for tabulated or weighted data
Recall the table we constructed for finding the mean of a sample of September temperature readings in the Central Tendency lecture notes.
Class            x     f     f*x     x^2     f*x^2
64.5 - 69.5     67     6     402    4489     26934
69.5 - 74.5     72    11     792    5184     57024
74.5 - 79.5     77    20    1540    5929    118580
79.5 - 84.5     82    13    1066    6724     87412
84.5 - 89.5     87     9     783    7569     68121
89.5 - 94.5     92     1      92    8464      8464
Total                 60    4675            366535

We have augmented the previous table by adding two additional columns (x^2 and f*x^2) that will be used for calculating the sample standard deviation of these grouped data.
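The formula for obtaining the sample standard deviation of weighted or tabulated data is:
$$s = \sqrt{\frac{n \sum (f \cdot x^2) - \left(\sum f \cdot x\right)^2}{n(n - 1)}}, \qquad n = \sum f$$
where x is the class midpoint and f is the class frequency.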
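A minimal Python sketch of the grouped-data calculation follows. The class midpoints and frequencies are recovered from the table's f*x and x^2 columns (for example, x = 67 from x^2 = 4489 and f = 402/67 = 6), and the function name `grouped_sample_std` is an assumption made for this illustration.

```python
import math

# (class midpoint x, frequency f) for the six temperature classes
classes = [(67, 6), (72, 11), (77, 20), (82, 13), (87, 9), (92, 1)]


def grouped_sample_std(pairs):
    """Return n, sum of f*x, sum of f*x^2, and the sample standard deviation."""
    n = sum(f for _, f in pairs)
    sum_fx = sum(f * x for x, f in pairs)
    sum_fx2 = sum(f * x * x for x, f in pairs)
    variance = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
    return n, sum_fx, sum_fx2, math.sqrt(variance)


n, sum_fx, sum_fx2, s_grouped = grouped_sample_std(classes)
print(n, sum_fx, sum_fx2, round(s_grouped, 2))
```

The three totals reproduce the bottom row of the table, and the resulting standard deviation agrees with the value s = 6.21 quoted for these data.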
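We construct an ogive from the previous table.
[Figure: ogive of cumulative frequency (0 to 60) against temperature, with class boundaries from 64.5 to 94.5. The mean is marked, together with the intervals one standard deviation (s = 6.21) and two standard deviations (2s = 12.42) on either side of it. The figure is labeled Mean = 79.183, although the table totals give 4675/60 ≈ 77.92.]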
The Normal Distribution
Continuous
Symmetric
Mean = Median = Mode (all the same value)
[Figure: bell-shaped normal curve with the mean marked at its center.]
Other measures of variation
Using the range to estimate the standard deviation:
$$s \approx \text{range}/4$$
On an earlier slide we found for a population of student grades {82, 68, 74, 86, 90, 88, 62, 75, 80, 55}: $\mu = 76$ and $\sigma = 10.94$.
The range of this population is 90 - 55 = 35. This gives us an estimate of $\sigma \approx 35/4 = 8.75$.
In the tabulated data for the temperature readings we have range = 92 - 65 = 27, so $s \approx 27/4 = 6.75$, which agrees fairly well with the calculated value of s = 6.21.
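The range rule of thumb is a one-line calculation. The sketch below applies it to the quiz grades and to the extreme temperature readings (65 and 92); the function name `range_rule_estimate` is illustrative.

```python
grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]


def range_rule_estimate(values):
    """Rough estimate of the standard deviation: (max - min) / 4."""
    return (max(values) - min(values)) / 4


grades_estimate = range_rule_estimate(grades)
# Only the lowest and highest temperature readings are needed for the range.
temperature_estimate = range_rule_estimate([65, 92])

print(grades_estimate, temperature_estimate)
```

The estimate is quick but crude: it depends only on the two extreme values, so a single outlier can move it substantially.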
The Coefficient of Variation (CV)
Definition: for either a population or a sample, the coefficient of variation is the ratio of the standard deviation to the mean.
$CV = \sigma/\mu$ for a population
$CV = s/\bar{x}$ for a sample
where $\bar{x}$ denotes the sample mean.
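As an illustration, the coefficient of variation of the quiz grades can be computed both ways with the standard `statistics` module. The variable names are assumptions made for this sketch.

```python
import statistics

grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]

mean = statistics.mean(grades)

# Treating the grades as a whole population: sigma / mu
cv_population = statistics.pstdev(grades) / mean

# Treating the grades as a sample: s / x-bar
cv_sample = statistics.stdev(grades) / mean

print(round(cv_population, 3), round(cv_sample, 3))
```

Because the CV is a ratio of two quantities in the same units, it is unit-free, which makes it useful for comparing the variability of data sets measured on different scales.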
Part 2
Relative Standing
A z score is the number of standard deviations that a raw score, x, lies above or below the mean. A raw score x taken from a population is converted to a standardized z score by the formula
$$z = \frac{x - \mu}{\sigma}$$
In a sample, the z score of a value x is given by
$$z = \frac{x - \bar{x}}{s}$$
where $\bar{x}$ denotes the sample mean.
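The conversion can be sketched in Python using the quiz grades from Part 1 as the population. The function name `z_score` and the choice of the grades 90 and 55 as examples are assumptions made for this illustration.

```python
import statistics

grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


mu = statistics.mean(grades)
sigma = statistics.pstdev(grades)

z_high = z_score(90, mu, sigma)   # highest grade, above the mean
z_low = z_score(55, mu, sigma)    # lowest grade, below the mean

print(round(z_high, 2), round(z_low, 2))
```

A positive z score indicates a value above the mean and a negative one a value below it; for a sample, pass the sample mean and the sample standard deviation instead.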
Percentiles
$$\text{percentile of value } x = \frac{\text{number of values less than } x}{\text{total number of values}} \times 100$$
(round the result to the nearest whole number)
Suppose that in a class of 25 people we have the following averages, listed in ascending order:
42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77, 78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98
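The percentile formula can be applied to this class with a few lines of Python. The function name `percentile_of` and the example values 84 and 74 are chosen for illustration only.

```python
averages = [42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77,
            78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98]


def percentile_of(value, data):
    """Percent of the data strictly below value, as a whole number."""
    below = sum(1 for v in data if v < value)
    # With 25 values the result is always a whole number; for other sizes,
    # note that Python's round() sends exact halves to the nearest even integer.
    return round(100 * below / len(data))


p84 = percentile_of(84, averages)   # 18 of the 25 values are below 84
p74 = percentile_of(74, averages)   # 9 of the 25 values are below 74

print(p84, p74)
```

Tied values (such as the three 74s) all receive the same percentile, because only values strictly less than x are counted.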
Quartiles
Instead of finding the percentile of a single data value, as we did on the previous page, it is often useful to divide the data into four or more (nearly) equal groups. When the data are divided into four equal groups, we call these groupings quartiles.
Let
n = number of items in the data set
k = percent desired (e.g. k = 25)
L = locator: the position of the value separating the first k percent of the data from the rest
$$L = \frac{k}{100} \cdot n$$
Let's separate the 25 class grades into four quartiles.
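The locator can be turned into a small Python sketch. One assumption is made that the slides do not state explicitly: when L is not a whole number it is rounded up to the next position, and when L is a whole number the Lth and (L+1)th values are averaged. This convention reproduces the quartile values used in these notes (Q1 = 70 and Q3 = 84 for the class grades). The function name `value_at_percent` is hypothetical.

```python
averages = [42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77,
            78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98]


def value_at_percent(sorted_data, k):
    """Value separating the first k percent of sorted data (0 < k < 100)."""
    n = len(sorted_data)
    # L = (k / 100) * n, split into whole part and remainder with integer math
    position, remainder = divmod(k * n, 100)
    if remainder == 0:
        # L is whole: average the Lth and (L + 1)th values (1-based positions)
        return (sorted_data[position - 1] + sorted_data[position]) / 2
    # L is fractional: round up to the next position
    return sorted_data[position]


q1 = value_at_percent(averages, 25)   # L = 6.25  -> 7th value
q2 = value_at_percent(averages, 50)   # L = 12.5  -> 13th value
q3 = value_at_percent(averages, 75)   # L = 18.75 -> 19th value

print(q1, q2, q3)
```

Other textbooks and software packages use slightly different conventions for locating quartiles, so results from library routines may differ from these by a small amount.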
Other measures of relative standing include:
Interquartile range (IQR) = Q3 - Q1
Semi-interquartile range = (Q3 - Q1)/2
Midquartile = (Q3 + Q1)/2
For the 25 class grades, with Q1 = 70 and Q3 = 84:
IQR = 84 - 70 = 14
Semi-IQR = (84 - 70)/2 = 7
Midquartile = (84 + 70)/2 = 77
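These three measures follow directly from the two quartiles. A minimal Python check, taking Q1 = 70 and Q3 = 84 as given for the class grades:

```python
q1, q3 = 70, 84   # first and third quartiles of the 25 class grades

iqr = q3 - q1                 # interquartile range
semi_iqr = (q3 - q1) / 2      # semi-interquartile range
midquartile = (q3 + q1) / 2   # midpoint between the quartiles

print(iqr, semi_iqr, midquartile)
```

The IQR measures the spread of the middle half of the data, so unlike the range it is not affected by the extreme values 42 and 98.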
Box Diagram
Recall the ordered high temperature readings from a previous lecture. The locators L25, the median, and L75 fall between the rows shown:
65, 67, 68, 68, 69, 69, 71, 71, 71, 72, 72, 72, 73, 73, 73,
(L25)
74, 74, 75, 75, 75, 75, 76, 76, 77, 77, 77, 77, 77, 77, 78,
(median)
78, 78, 78, 79, 79, 79, 79, 80, 81, 81, 81, 81, 81, 81, 81,
(L75)
81, 82, 82, 83, 84, 85, 85, 85, 86, 86, 87, 87, 88, 89, 92
To construct a box diagram that illustrates the extent to which the extreme data values lie beyond the interquartile range, draw a line with the low and high values highlighted at the two ends. Mark the gradations between these two extremes, then locate the quartile boundaries Q1, the median, and Q3 on this line. Construct a box about these values. For example, Q1 = (73 + 74)/2 = 73.5.
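[Box diagram: a number line from 65 to 92 with gradations at 69, 73, 77, 81, 85, and 89, and a box marking Q1, the median (M), and Q3.]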
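The five values needed to draw the box diagram (low, Q1, median, Q3, high) can be computed with the following Python sketch. It assumes the locator convention that matches the worked value Q1 = (73 + 74)/2 = 73.5: when L = (k/100) * n is a whole number, the Lth and (L+1)th values are averaged, otherwise L is rounded up. The function names are hypothetical.

```python
temperatures = [
    65, 67, 68, 68, 69, 69, 71, 71, 71, 72, 72, 72, 73, 73, 73,
    74, 74, 75, 75, 75, 75, 76, 76, 77, 77, 77, 77, 77, 77, 78,
    78, 78, 78, 79, 79, 79, 79, 80, 81, 81, 81, 81, 81, 81, 81,
    81, 82, 82, 83, 84, 85, 85, 85, 86, 86, 87, 87, 88, 89, 92,
]


def value_at_percent(sorted_data, k):
    """Value separating the first k percent of sorted data (0 < k < 100)."""
    position, remainder = divmod(k * len(sorted_data), 100)
    if remainder == 0:
        # Whole-number locator: average the Lth and (L + 1)th values
        return (sorted_data[position - 1] + sorted_data[position]) / 2
    # Fractional locator: round up to the next position
    return sorted_data[position]


def five_number_summary(data):
    """Low, Q1, median, Q3, and high of the data."""
    ordered = sorted(data)
    return (ordered[0],
            value_at_percent(ordered, 25),
            value_at_percent(ordered, 50),
            value_at_percent(ordered, 75),
            ordered[-1])


low, q1, median, q3, high = five_number_summary(temperatures)
print(low, q1, median, q3, high)
```

The box spans Q1 to Q3 with a mark at the median, and the line extends from the low value to the high value.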