
U3 IntroSummaryStatistics


Introduction to Summary Statistics

Statistics
• The collection, evaluation, and interpretation of data
• Statistical analysis of measurements can help verify the quality of a design or process
Summary Statistics
Central Tendency
• “Center” of a distribution
– Mean, median, mode
Variation
• Spread of values around the center
– Range, standard deviation, interquartile range
Distribution
• Summary of the frequency of values
– Frequency tables, histograms, normal distribution
Mean Central Tendency

• The mean is the sum of the values of a set of data divided by the number of values in that data set.

$\mu = \frac{\sum x_i}{N}$
Mean Central Tendency

$\mu = \frac{\sum x_i}{N}$

$\mu$ = mean value
$x_i$ = individual data value
$\sum$ = summation of all data values
N = # of data values in the data set
Mean Central Tendency

• Data Set
3 7 12 17 21 21 23 27 32 36 44
• Sum of the values = 243
• Number of values = 11

Mean = $\mu = \frac{\sum x_i}{N} = \frac{243}{11} = 22.09$
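The mean calculation above can be reproduced in a few lines of Python. This is only a minimal sketch; the variable names are illustrative and not part of the original slides.

```python
# Data set from the slide above
data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]

total = sum(data)     # sum of the values
count = len(data)     # N, the number of values
mean = total / count  # mu = (sum of x_i) / N

print(f"Sum = {total}, N = {count}, mean = {mean:.2f}")
```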
Mode Central Tendency

• Measure of central tendency
• The most frequently occurring value in a set of data is the mode
• Symbol is M

Data Set:
27 17 12 7 21 44 23 3 36 32 21
Mode Central Tendency

• The most frequently occurring value in a set of data is the mode

Data Set:
3 7 12 17 21 21 23 27 32 36 44

Mode = M = 21
Mode Central Tendency

• The most frequently occurring value in a set of data is the mode.
• Bimodal Data Set: Two numbers of equal frequency stand out
• Multimodal Data Set: More than two numbers of equal frequency stand out
Mode Central Tendency

Determine the mode of
48, 63, 62, 49, 58, 2, 63, 5, 60, 59, 55
Mode = 63

Determine the mode of
48, 63, 62, 59, 58, 2, 63, 5, 60, 59, 55
Mode = 63 and 59 (Bimodal)

Determine the mode of
48, 63, 62, 59, 48, 2, 63, 5, 60, 59, 55
Mode = 63, 59, and 48 (Multimodal)
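The three mode exercises above could be checked with Python's standard `statistics.multimode` (available from Python 3.8). The helper name `classify_modes` and its labels are assumptions made for this sketch; note that `multimode` returns every value when all values occur equally often, which this sketch does not treat specially.

```python
from statistics import multimode

def classify_modes(values):
    """Return the modes (sorted) and a label for how many there are."""
    modes = sorted(multimode(values))  # all values tied for the highest frequency
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label

# The three data sets from the slide above
first = [48, 63, 62, 49, 58, 2, 63, 5, 60, 59, 55]
second = [48, 63, 62, 59, 58, 2, 63, 5, 60, 59, 55]
third = [48, 63, 62, 59, 48, 2, 63, 5, 60, 59, 55]

for data_set in (first, second, third):
    print(classify_modes(data_set))
```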
Median Central Tendency

• Measure of central tendency
• The median is the value that occurs in the middle of a set of data that has been arranged in numerical order
• Symbol is $\tilde{x}$, pronounced “x-tilde”
Median Central Tendency

• The median is the value that occurs in the middle of a set of data that has been arranged in numerical order.

Data Set:
27 17 12 7 21 44 23 3 36 32 21
Median Central Tendency

• For a data set that contains an odd number of values, the median is the single middle value of the ordered data.

Data Set:
3 7 12 17 21 21 23 27 32 36 44

Median = 21 (the 6th of 11 values)
Median Central Tendency

• For a data set that contains an even number of values, the two middle values are averaged, and the result is the median.

Data Set:
3 7 12 17 21 21 23 27 31 32 36 44

Median = (21 + 23) / 2 = 22
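The odd and even cases for the median can be sketched as follows. The function name `middle_value` is hypothetical; the standard library's `statistics.median` applies the same rule and is printed alongside for comparison.

```python
from statistics import median

def middle_value(values):
    """Median: middle value if the count is odd, mean of the two middle values if even."""
    ordered = sorted(values)  # the data must be in numerical order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Data sets from the median slides (11 values and 12 values)
odd_set = [27, 17, 12, 7, 21, 44, 23, 3, 36, 32, 21]
even_set = [3, 7, 12, 17, 21, 21, 23, 27, 31, 32, 36, 44]

print(middle_value(odd_set), median(odd_set))
print(middle_value(even_set), median(even_set))
```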
Range Variation

• Measure of data variation.
• The range is the difference between the largest and smallest values that occur in a set of data.
• Symbol is R

Data Set:
3 7 12 17 21 21 23 27 32 36 44
Range = R = 44 – 3 = 41
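As a quick check of the range calculation, a minimal Python sketch (variable names are illustrative only):

```python
# Data set from the range slide
data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]

# R = largest value - smallest value
data_range = max(data) - min(data)

print(f"Range = {max(data)} - {min(data)} = {data_range}")
```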
Standard Deviation Variation

• Measure of data variation.
• The standard deviation is a measure of the spread of data values.
– A larger standard deviation indicates a wider spread in data values
Standard Deviation Variation

$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$

$\sigma$ = standard deviation
$x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)
$\mu$ = mean
N = size of population
Standard Deviation Variation

$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$

Procedure:
1. Calculate the mean, $\mu$.
2. Subtract the mean from each value and then square each difference.
3. Sum all squared differences.
4. Divide the summation by the size of the population (number of data values), N.
5. Calculate the square root of the result.
Standard Deviation

$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$

Calculate the standard deviation for the data array
2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63

1. Calculate the mean.
$\mu = \frac{\sum x_i}{N} = \frac{524}{11} = 47.64$

2. Subtract the mean from each data value and square each difference, $(x_i - \mu)^2$.
(2 - 47.64)² = 2083.01      (59 - 47.64)² = 129.05
(5 - 47.64)² = 1818.17      (60 - 47.64)² = 152.77
(48 - 47.64)² = 0.13        (62 - 47.64)² = 206.21
(49 - 47.64)² = 1.85        (63 - 47.64)² = 235.93
(55 - 47.64)² = 54.17       (63 - 47.64)² = 235.93
(58 - 47.64)² = 107.33
Standard Deviation Variation

3. Sum all squared differences.
$\sum (x_i - \mu)^2$ = 2083.01 + 1818.17 + 0.13 + 1.85 + 54.17 + 107.33 + 129.05 + 152.77 + 206.21 + 235.93 + 235.93 = 5,024.55

4. Divide the summation by the number of data values.
$\frac{\sum (x_i - \mu)^2}{N} = \frac{5{,}024.55}{11} = 456.78$

5. Calculate the square root of the result.
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{456.78} = 21.4$
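The five-step procedure for the population standard deviation can be sketched in Python as below. Variable names are illustrative. Because the code keeps the unrounded mean, intermediate values can differ from the slide's rounded figures in the last decimal place; the final result still rounds to 21.4. The standard library's `statistics.pstdev` is used as a cross-check.

```python
import math
from statistics import pstdev

# Data array from the worked example
data = [2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63]

N = len(data)
mu = sum(data) / N                            # step 1: mean
squared_diffs = [(x - mu) ** 2 for x in data] # step 2: squared differences
sum_sq = sum(squared_diffs)                   # step 3: sum of squared differences
variance = sum_sq / N                         # step 4: divide by N
sigma = math.sqrt(variance)                   # step 5: square root

print(f"mu = {mu:.2f}, sigma = {sigma:.1f}")
print(f"statistics.pstdev gives {pstdev(data):.1f}")
```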
A Note about Standard Deviation
• Two distinct calculations
– Population Standard Deviation
• The measure of the spread of data within a
population.
• Used when you have a data value for every
member of the entire population of interest.
– Sample Standard Deviation
• An estimate of the spread of data within a larger
population.
• Used when you do not have a data value for every
member of the entire population of interest.
• Uses a subset (sample) of the data to generalize
the results to the larger population.
A Note about Standard Deviation
Population Standard Deviation

$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$

$\sigma$ = population standard deviation
$x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)
$\mu$ = population mean
N = size of population

Sample Standard Deviation

$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

s = sample standard deviation
$x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)
$\bar{x}$ = sample mean
n = size of sample
Sample Standard Deviation Variation

$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

Procedure:
1. Calculate the sample mean, $\bar{x}$.
2. Subtract the mean from each value and then square each difference.
3. Sum all squared differences.
4. Divide the summation by the number of data values minus one, n - 1.
5. Calculate the square root of the result.
Sample Mean Central Tendency

$\bar{x} = \frac{\sum x_i}{n}$

Essentially the same calculation as population mean

$\bar{x}$ = sample mean
$x_i$ = individual data value
$\sum$ = summation of all data values
n = # of data values in the sample
Sample Standard Deviation
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

Estimate the standard deviation for a population for which the following data is a sample.
2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63

1. Calculate the sample mean.
$\bar{x} = \frac{\sum x_i}{n} = \frac{524}{11} = 47.64$

2. Subtract the sample mean from each data value and square the difference, $(x_i - \bar{x})^2$.
(2 - 47.64)² = 2083.01      (59 - 47.64)² = 129.05
(5 - 47.64)² = 1818.17      (60 - 47.64)² = 152.77
(48 - 47.64)² = 0.13        (62 - 47.64)² = 206.21
(49 - 47.64)² = 1.85        (63 - 47.64)² = 235.93
(55 - 47.64)² = 54.17       (63 - 47.64)² = 235.93
(58 - 47.64)² = 107.33
Sample Standard Deviation Variation

3. Sum all squared differences.
$\sum (x_i - \bar{x})^2$ = 2083.01 + 1818.17 + 0.13 + 1.85 + 54.17 + 107.33 + 129.05 + 152.77 + 206.21 + 235.93 + 235.93 = 5,024.55

4. Divide the summation by the number of sample data values minus one.
$\frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{5{,}024.55}{10} = 502.46$

5. Calculate the square root of the result.
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{502.46} = 22.4$
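The sample procedure differs from the population procedure only in the divisor, n - 1 instead of N. A minimal Python sketch follows, with illustrative variable names and `statistics.stdev` as a cross-check. The code keeps the unrounded mean, so intermediate values may differ from the slide's rounded figures in the last decimal place; the result still rounds to 22.4.

```python
import math
from statistics import stdev

# The same data array, now treated as a sample of a larger population
sample = [2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63]

n = len(sample)
x_bar = sum(sample) / n                          # step 1: sample mean
sum_sq = sum((x - x_bar) ** 2 for x in sample)   # steps 2 and 3
sample_variance = sum_sq / (n - 1)               # step 4: divide by n - 1
s = math.sqrt(sample_variance)                   # step 5: square root

print(f"x_bar = {x_bar:.2f}, s = {s:.1f}")
print(f"statistics.stdev gives {stdev(sample):.1f}")
```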
A Note about Standard Deviation
As n → N, s → σ
A Note about Standard Deviation
Given the ACT score of every student in your class, use the population standard deviation formula to find the standard deviation of ACT scores in the class.

Population Standard Deviation

$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$

$\sigma$ = population standard deviation
$x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)
$\mu$ = population mean
N = size of population
A Note about Standard Deviation
Given the ACT scores of every student in your class, use the sample standard deviation formula to estimate the standard deviation of the ACT scores of all students at your school.

Sample Standard Deviation

$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

s = sample standard deviation
$x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)
$\bar{x}$ = sample mean
n = size of sample
Histogram Distribution

• A histogram is a common data distribution chart that is used to show the frequency with which specific values, or values within ranges, occur in a set of data.
• An engineer might use a histogram to show the variation of a dimension that exists among a group of parts that are intended to be identical.
Histogram Distribution

• Large sets of data are often divided into a limited number of groups. These groups are called class intervals.

Class Intervals: -16 to -6, -5 to 5, 6 to 16
Histogram Distribution

• The number of data elements in each class interval is shown by the frequency, which occurs along the Y-axis of the graph

[Histogram: y-axis Frequency; class intervals -16 to -6, -5 to 5, 6 to 16]
Histogram Distribution

Example

Data set: 1, 7, 15, 4, 8, 8, 5, 12, 10

Sorted: 1, 4, 5, 7, 8, 8, 10, 12, 15

[Histogram: y-axis Frequency; class intervals 1 to 5, 6 to 10, 11 to 15]
Histogram Distribution

• The height of each bar in the chart indicates the number of data elements, or frequency of occurrence, within each range

1, 4, 5, 7, 8, 8, 10, 12, 15

[Histogram: y-axis Frequency; class intervals 1 to 5, 6 to 10, 11 to 15]
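The bar heights for the example data can be tallied with a short Python sketch. It assumes the class intervals are inclusive at both ends, as the integer labels on the slide suggest; the names and the text-based bars are illustrative only.

```python
# Example data and class intervals from the histogram slides
data = [1, 7, 15, 4, 8, 8, 5, 12, 10]
intervals = [(1, 5), (6, 10), (11, 15)]  # assumed inclusive on both ends

# Count how many data elements fall in each class interval
frequency = {interval: 0 for interval in intervals}
for x in data:
    for low, high in intervals:
        if low <= x <= high:
            frequency[(low, high)] += 1
            break

# Print a simple text histogram: one '#' per data element
for (low, high), count in frequency.items():
    print(f"{low:>2} to {high:<2} | {'#' * count} ({count})")
```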
Histogram Distribution

[Histogram: Cube Side Length. x-axis: Length (in.), class intervals from 0.745 to 0.760 in steps of 0.001; y-axis: Frequency. Minimum = 0.745 in., Maximum = 0.760 in.]
Dot Plot Distribution

0 3 -1 -3
3 2 1 0
-1 -1 2 1
0 1 -1 -2
1 2 1 0
-2 -4 0 0

[Dot plot of the values above; horizontal axis from -6 to 6]
Dot Plot Distribution

[Dot plot of the same data set with a Frequency axis added; horizontal axis from -6 to 6]
Normal Distribution Distribution

“Is the data distribution normal?”

• Translation: Is the histogram/dot plot bell-shaped?
– Does the greatest frequency of the data
values occur at about the mean value?
– Does the curve decrease on both sides away
from the mean?
– Is the curve symmetric about the mean?
Normal Distribution Distribution

[Figure: bell-shaped curve. x-axis: Data Elements, -6 to 6; y-axis: Frequency]
Normal Distribution Distribution
Does the greatest frequency of the data values occur at about the mean value?

[Figure: bell-shaped curve with the mean value marked. x-axis: Data Elements, -6 to 6; y-axis: Frequency]
Normal Distribution Distribution
Does the curve decrease on both sides away from the mean?

[Figure: bell-shaped curve with the mean value marked. x-axis: Data Elements, -6 to 6; y-axis: Frequency]
Normal Distribution Distribution
Is the curve symmetric about the mean?

[Figure: bell-shaped curve with the mean value marked. x-axis: Data Elements, -6 to 6; y-axis: Frequency]
What if things are not equal?

Histogram Interpretation: Skewed (Non-Normal) Right


Normal Distribution Distribution

If the data are normally distributed:

• 68% of the observations fall within 1 standard deviation of the mean.
• 95% of the observations fall within 2 standard deviations of the mean.
• 99.7% of the observations fall within 3 standard deviations of the mean.
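The 68%, 95%, and 99.7% figures are rounded values for an ideal normal distribution. They can be recovered from the standard normal cumulative distribution function, here via Python's `statistics.NormalDist`. The helper name `fraction_within` is an assumption made for this sketch.

```python
from statistics import NormalDist

# Ideal normal distribution with mean 0 and standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

coverage = {k: fraction_within(k) for k in (1, 2, 3)}

for k, fraction in coverage.items():
    print(f"within {k} standard deviation(s): {fraction:.1%}")
```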
Normal Distribution Example
Data from a sample of a larger population

Mean = $\bar{x}$ = 0.083
Standard Deviation = s = 1.77 (sample)
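The slide does not say which data set produced these values. As an assumption, the sketch below uses the 24 values listed on the Dot Plot slides; that data set does reproduce the stated mean and sample standard deviation.

```python
from statistics import mean, stdev

# Assumption: the example uses the 24 values from the Dot Plot slides
data = [
     0,  3, -1, -3,
     3,  2,  1,  0,
    -1, -1,  2,  1,
     0,  1, -1, -2,
     1,  2,  1,  0,
    -2, -4,  0,  0,
]

x_bar = mean(data)  # sample mean
s = stdev(data)     # sample standard deviation (divides by n - 1)

print(f"n = {len(data)}, mean = {x_bar:.3f}, s = {s:.2f}")
```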
Normal Distribution Distribution

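68% of the observations fall within 1 standard deviation of the mean:

0.08 - 1.77 = -1.69 (mean - s)
0.08 + 1.77 = 1.85 (mean + s)

[Figure: bell-shaped curve centered on 0.08 with the band from -1.69 to 1.85 marked 68%. x-axis: Data Elements]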
Normal Distribution Distribution

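95% of the observations fall within 2 standard deviations of the mean:

0.08 - 3.54 = -3.46 (mean - 2s)
0.08 + 3.54 = 3.62 (mean + 2s)

[Figure: bell-shaped curve centered on 0.08 with the band from -3.46 to 3.62 marked 95%. x-axis: Data Elements]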
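To see how closely a small sample follows the 68% and 95% guideline, the share of observations inside each band can be counted directly. As an assumption, this sketch reuses the 24 values from the Dot Plot slides, which match the stated mean and sample standard deviation; the helper name `count_within` is hypothetical.

```python
from statistics import stdev

# Assumption: the 24 values from the Dot Plot slides
data = [0, 3, -1, -3, 3, 2, 1, 0, -1, -1, 2, 1,
        0, 1, -1, -2, 1, 2, 1, 0, -2, -4, 0, 0]

x_bar = sum(data) / len(data)  # sample mean
s = stdev(data)                # sample standard deviation

def count_within(k):
    """Number of observations within k sample standard deviations of the mean."""
    low, high = x_bar - k * s, x_bar + k * s
    return sum(1 for x in data if low <= x <= high)

within_1 = count_within(1)
within_2 = count_within(2)

print(f"within 1s: {within_1}/{len(data)} = {within_1 / len(data):.1%}")
print(f"within 2s: {within_2}/{len(data)} = {within_2 / len(data):.1%}")
```

With only 24 discrete values, the observed shares are close to, but not exactly, the ideal percentages.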
