Chapter One - Introduction

Uploaded by

mariasharaiyra

Engineering Statistics I

Second Semester 2023/2024

Lecturer: Dr Mohannad Jreissat

Department of Industrial Engineering


School of Engineering

Textbook:
Applied Statistics and Probability for Engineers, by D. Montgomery
and G. Runger, 6th edition, Wiley.

Email: Drjreissat@gmail.com
Office: IE
Chapter 1

Introduction to Statistics and Data Analysis


What Do Engineers Do?

An engineer
is someone who solves problems of interest
to society with the efficient application of
scientific principles by:
• Refining existing products
• Designing new products or processes

1-3
The engineering method

Engineering Problem solving

1-4
Probability and Statistics in
Engineering

1-5
What is Probability?

• Probability is
the measure of how likely it is that a random event will occur. Given
knowledge of an underlying model, it lets us work out the chance that
different outcomes will occur.
• By definition, probability values are between 0 and 1.
• If we flip a fair coin 3 times,
what is the probability of obtaining 3 heads?
• If we throw a die 2 times,
what is the probability that the sum of the faces is 10?
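
The two questions above can be answered by listing every equally likely outcome and counting the favourable ones. The sketch below assumes a fair coin and a fair six-sided die; the helper name `probability` is illustrative and not part of the course material.

```python
from fractions import Fraction
from itertools import product


def probability(outcomes, event):
    """Classical probability: favourable outcomes / all equally likely outcomes."""
    outcomes = list(outcomes)
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


# Three flips of a fair coin: 2**3 = 8 equally likely sequences.
three_heads = probability(product("HT", repeat=3),
                          lambda flips: flips == ("H", "H", "H"))

# Two throws of a fair die: 6**2 = 36 equally likely pairs.
sum_is_ten = probability(product(range(1, 7), repeat=2),
                         lambda faces: sum(faces) == 10)

print(three_heads)  # 1/8
print(sum_is_ten)   # 1/12
```

Only the pairs (4, 6), (5, 5), and (6, 4) sum to 10, which gives 3/36 = 1/12.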

1-6
What is Statistics?
◼ Statistics is a tool to get information from data.

Data → Statistics / Probability → Information
• Data: facts (mostly numerical) collected from a certain population.
• Information: knowledge about the population concerning some particular facts.
◼ Statistics is used because the underlying model that governs a
certain experiment is not known.
◼ All that is available is a sample of some outcomes of the experiment.
◼ The sample is used to make inference about the probability model
that governs the experiment.
◼ So, a thorough understanding of probability is essential to
understand statistics.
1-7
Why do we need to use
Probability and statistics ?!

1-8
Experiments & Processes Are Not
Deterministic
• Statistical techniques are useful for describing and
understanding variability.
• By variability, we mean successive observations of a system or
phenomenon do not produce exactly the same result.
• Statistics gives us a framework for describing this variability
and for learning about potential sources of variability.

1-9
Models Can Also Reflect Uncertainty

• Probability models help quantify the risks involved in


statistical inference, that is, risks involved in decisions
made every day.
• Probability provides the framework for the study and
application of statistics.
•Probability concepts will be introduced in the next
lecture.

1 - 10
STATISTICS

[Figure: bar chart of series A, B, and C over categories 1 to 4, vertical scale 0 to 90]

Forecasting:
• Expectations for the future?
• How will the stock markets behave?

Analysis of sales:
• How much do we sell, and when?
• Should we change our sales strategy?

Quality control:
• What is my rate of defective products?
• How can I best manage my production?
• What is the best way to sample?
1 - 11
Statistics in Engineering

• As engineers perform experiments, they


collect data that can be used to explain
relationships better and to reveal information
about the quality of products and services
they provide.

1 - 12
Statistics: Basic Ideas
• Statistics is
the area of science that deals with collection, organization, analysis, and
interpretation of data to:
• Make decisions
• Solve problems e.g. Identify sources of variability
• Design products and processes
• It is the science of learning information from data.

• It also deals with methods and techniques that can be used to draw
conclusions about the characteristics of a large number of data points,
commonly called a population, by using a smaller subset of the entire data.

1 - 13
Probability: Basic Ideas
• Terminology:
– Trial
Each time you repeat an experiment
– Outcome
Result of an experiment
– Random experiment
One with random outcomes (cannot be predicted
exactly)
– Relative frequency
The number of times a specific outcome occurs divided by the
total number of trials in the experiment.
1 - 14
For Example…
• You work in a cell phone factory and are asked to
remove cell phones at random from the assembly line
and turn each one on and off.
- Each time you remove a cell phone and turn it on and off,
you are conducting a Random Experiment

- Each time you pick up a phone is a Trial and the result is


called an Outcome

- If you check 200 phones, and you find 5 bad phones, then
Relative Frequency of failure = 5/200 = 0.025
1 - 15
Overview: Statistical Inference,
Samples, Populations, and the
Role of Probability
Two Directions of Reasoning

Statistical inference is one type of reasoning.


1 - 17
Populations and Samples:

Population: has some unknown parameters.
Example: JU students (mean height). N = population size.

Sample = observations: from it we calculate some statistics.
Example: 20 students from JU (sample mean). n = sample size.

1 - 18
• Let X1,X2,…,XN be the population values (in general, they are
unknown)
• Let x1,x2,…,xn be the sample values (these values are known)
• Statistics obtained from the sample are used to estimate
(approximate) the parameters of the population.
✓ Scientific Data
✓ Statistical Inference
(1) Estimation:
➢ Point Estimation
➢ Interval Estimation (Confidence Interval)
(2) Hypotheses Testing

1 - 19
Some Terminology
◼ Data: result of observation that consists of information, in the form
of counts, measurements, or responses.
◼ Parameter: numerical description of a population characteristic.
◼ Statistic: numerical description of a sample characteristic.
◼ Population: the collection of all outcomes, counts, measurements,
or responses that are of interest.
◼ Sample: a subset of a population.
◼ Experiment: any process that generates a set of data.
◼ Sample space: the set of all possible outcomes of a statistical
experiment. It is represented by the symbol S.
◼ Element or member: each outcome in a sample space. Sometimes
simply called a sample point.
1 - 20
Hypothesis Tests

Hypothesis Test

• A statement about a process behavior value.


• Compared to a claim about another process value.
• Data is gathered to support or refute the claim.

One-sample hypothesis test:


• Example: Ford AVG mpg = 30 vs. AVG mpg < 30

Two-sample hypothesis test:


• Example: Ford AVG mpg - Chevy AVG mpg = 0 vs. > 0.

1 - 21
Example
A quality control engineer at an integrated circuit
manufacturing plant takes a sample of 100 RAM chips
from the assembly line and finds that 10 are defective.
The company can tolerate 5% defective production in
the long run. The quality control engineer has to
determine whether the long-run defective rate is within
the tolerable range.

• Population :
All possible RAM chips coming out of the manufacturing process.
• Sample :
The RAM chips taken from the assembly line. In this case, the sample
size is 100.
1 - 22
• Using probability, the engineer computes that the chance of obtaining
10 out of 100 defective chips is 0.0167 if the long–term defective rate is
5%. Similarly, he can compute that the chance of obtaining 10 or more
defective chips is 0.0282.

• These small probabilities suggest that if the process indeed has a


defective rate of 5% (or less), the particular sample collected by the
engineer would rarely occur. However, it did occur! Therefore, the
engineer determines that the process is very likely unacceptable. (Or,
equivalently, the engineer says that he rejects the hypothesis that the
process is acceptable at a certain confidence level)
• This is called Statistical inference
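
The two probabilities quoted in this example can be reproduced under one explicit assumption that the slides do not state: each chip is defective independently with probability 0.05, so the number of defectives in the sample follows a binomial distribution. The function name `binom_pmf` is illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): exactly k defectives among n chips."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 100, 0.05  # sample size and tolerated long-run defective rate

# Chance of exactly 10 defective chips in the sample.
p_exactly_10 = binom_pmf(10, n, p)

# Chance of 10 or more: complement of "9 or fewer".
p_10_or_more = 1 - sum(binom_pmf(k, n, p) for k in range(10))

print(round(p_exactly_10, 4))
print(round(p_10_or_more, 4))
```

Under that assumption the printed values should agree with the 0.0167 and 0.0282 stated above.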

1 - 23
Fundamental relationship between probability and
inferential statistics
✓ For a statistical problem, the sample along with inferential statistics
allows us to draw conclusions about the population.
✓ Problems in probability allow us to draw conclusions about characteristics
of hypothetical data taken from the population, based on known features of
the population.
1 - 24
Sampling Procedures;
Collection of Data
Data Collection

• Observational study

– Observe the system

– Historical data

• Design of experiment

• Plays key role in engineering design

• The objective is to build a system model, usually called an
empirical model

1 - 26
Statistics

• Divided into :

– Descriptive Statistics

– Inferential Statistics

1 - 27
Branches of Statistics
◼ Descriptive statistics
is the branch of statistics that involves the organization, summarization, and display of
data when the population can be enumerated completely.

◼ Inferential statistics
is the branch of statistics that involves using a sample of a population to draw
conclusions about the whole population.
A basic tool in the study of inferential statistics is probability.

◼ Descriptive statistics:
There are 45 students in the Probability and Statistics class. Twenty are younger than
24 years old. 16 are older than 36 years old.
What can be concluded?
◼ Inferential statistics:
As many as 860 people in Amman were questioned. People who ride bicycles daily
have an average age of 31 years. For people who ride a motorcycle, the average
age is 21.
What can be concluded?
1 - 28
Steps in Inferential Statistics
◼ Design the experiments and collect the data.

◼ Organize and arrange the data to aid understanding.

◼ Analyze the data and draw general conclusions from data.

◼ Estimate the present and predict the future.

◼ In conducting the steps mentioned above,
statistics relies on the support of probability, which can model chance mathematically
and enables calculations of chance in complicated cases.

1 - 29
Forms of Data Description

• Point summary
• Tabular format
• Graphical format
• Diagrams

1 - 30
Point Summary

• Central tendency measures
– Mean --- $\bar{x} = \sum x_i / n$
– Median --- Middle value
– Mode --- Most frequent value

1 - 31
Point Summary

• Variability measures
– Range = Max $x_i$ - Min $x_i$
– Variance = $V = \sum (x_i - \bar{x})^2 / (n - 1)$
– Standard deviation = $S = \sqrt{V}$
– Coefficient of variation = $S / \bar{x}$

1 - 32
Dot Diagram

• A diagram that has the data points plotted along the x-axis.
Given the following grades of a class:
50, 23, 40, 90, 95, 10, 80, 50, 75, 55, 60, 40.

[Figure: dot diagram of the grades on an axis from 0 to 100]

1 - 33
Time Frequency Plot

[Figure: y (scale 5 to 15) plotted against observation number (0 to 50)]

1 - 35
Control Charts

[Figure: control chart of concentration (scale 75 to 105) against observation number (0 to 30)]
Upper control limit = 100.5
Center line: $\bar{x}$ = 91.50
Lower control limit = 82.54

1 - 37
Table: Data Example (A)

1 - 38
Figure: Corrosion results for Example (A)

1 - 39
Measures of Location:

The Sample Mean and Median


Definition 1.1: Sample Mean

1 - 41
Definition 1.2 : Sample Median

1 - 42
Figure: Sample mean as a centroid of the
with-nitrogen stem weight

1 - 43
1.3 Measures of Location (Central Tendency):

* The data (observations) often tend to be concentrated around the


center of the data.
* Some measures of location are the mean, mode, and median.
* These measures are considered as representatives (or typical
values) of the data. They are designed to give some quantitative
measures of where the center of the data is in the sample.

The sample mean of the observations ($\bar{x}$):

If $x_1, x_2, \ldots, x_n$ are the sample values, then the sample mean is

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{(unit)}$$

1 - 44
Example:
Suppose that the following sample represents the ages (in year)
of a sample of 3 men:

$x_1 = 30,\ x_2 = 35,\ x_3 = 27 \quad (n = 3)$

Then, the sample mean is:

$$\bar{x} = \frac{30 + 35 + 27}{3} = \frac{92}{3} = 30.67$$

Note: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$

1 - 45
Other Measures of Locations

• Trimmed Mean
✓ A trimmed mean is computed by “trimming away” a certain percent of both
the largest and the smallest set of values.
✓ For example, the 10% trimmed mean is found by eliminating the largest 10%
and smallest 10% and computing the average of the remaining values.

1 - 46
For example, in the case of the stem weight data,
we would eliminate the largest and smallest since
the sample size is 10 for each sample.
➢ For the without-nitrogen group the 10% trimmed mean is given by

➢ For the with-nitrogen group the 10% trimmed mean is given by
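
The trimming rule described above can be sketched in a few lines. The stem weight values are not reproduced in this text, so the sample below is hypothetical and chosen only to show the effect of two extreme observations; `trimmed_mean` is an illustrative name.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent` % of the values from each end."""
    data = sorted(values)
    drop = int(len(data) * percent / 100)  # observations removed per end
    kept = data[drop:len(data) - drop] if drop else data
    return sum(kept) / len(kept)


# Hypothetical sample of size 10 (not the stem weight data).
sample = [12, 15, 11, 40, 14, 13, 16, 2, 15, 12]

plain_mean = sum(sample) / len(sample)
trimmed_10 = trimmed_mean(sample, 10)  # drops the smallest (2) and largest (40)

print(plain_mean)  # 15.0
print(trimmed_10)  # 13.5
```

With a sample size of 10, a 10% trim removes exactly one value from each end, as in the stem weight example.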

1 - 47
Measures of Variability
Definition 1.3: Sample Standard Deviation

1 - 49
1.4 Measures of Variability (Dispersion or Variation):
•The variation or dispersion in a set of data refers to how spread out
the observations are from each other.

- The variation is small when the observations are close together.


- There is no variation if the observations are the same.
•Some measures of dispersion are Range, Variance, and Standard Deviation

•These measures are designed to give some quantitative measures of


the variability in the data.
1 - 50
The Sample Variance ($S^2$):
Let $x_1, x_2, \ldots, x_n$ be the observations of the sample.
The sample variance is denoted by $S^2$ and is defined by:

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} \quad \text{(unit)}^2$$

where $\bar{x} = \sum_{i=1}^{n} x_i / n$ is the sample mean.

Note:
$(n - 1)$ is called the degrees of freedom (df) associated with the
sample variance $S^2$.

1 - 51
The Standard Deviation (S):

The standard deviation is another measure of variation. It is the
square root of the variance, i.e., it is:

$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \quad \text{(unit)}$$

1 - 52
Example:
Compute the sample variance and standard deviation of the
following observations (ages in years): 10, 21, 33, 53, 54.
Solution:
$n = 5$

$$\bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{10 + 21 + 33 + 53 + 54}{5} = \frac{171}{5} = 34.2 \text{ (years)}$$

$$S^2 = \frac{\sum_{i=1}^{5} (x_i - 34.2)^2}{5 - 1} = \frac{(10 - 34.2)^2 + (21 - 34.2)^2 + (33 - 34.2)^2 + (53 - 34.2)^2 + (54 - 34.2)^2}{4} = \frac{1506.8}{4} = 376.7 \text{ (years)}^2$$
1 - 53
The sample standard deviation is:

$$S = \sqrt{S^2} = \sqrt{376.7} = 19.41 \text{ (years)}$$

* Another Formula for Calculating $S^2$:

$$S^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$$

(It is simpler for hand calculation.)

For the previous Example,

$x_i$      10    21    33     53     54     $\sum x_i = 171$
$x_i^2$    100   441   1089   2809   2916   $\sum x_i^2 = 7355$

$$S^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1} = \frac{7355 - (5)(34.2)^2}{5 - 1} = \frac{1506.8}{4} = 376.7 \text{ (years)}^2$$
1 - 54
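
As a cross-check of the worked example, the sketch below computes the sample variance with both the defining formula and the shortcut formula, then takes the square root. The variable names are illustrative; the only data used are the five ages from the example.

```python
import math
import statistics

ages = [10, 21, 33, 53, 54]  # observations from the example (years)
n = len(ages)
mean = sum(ages) / n

# Defining formula: squared deviations from the mean, divided by n - 1.
var_definition = sum((x - mean) ** 2 for x in ages) / (n - 1)

# Shortcut formula: (sum of squares - n * mean^2) / (n - 1).
var_shortcut = (sum(x * x for x in ages) - n * mean**2) / (n - 1)

std_dev = math.sqrt(var_definition)

print(round(mean, 1), round(var_definition, 1),
      round(var_shortcut, 1), round(std_dev, 2))
```

The standard library's `statistics.variance(ages)` uses the same n - 1 divisor and can serve as an independent check.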
Discrete and Continuous Data
Statistical Modeling, Scientific,
Inspection, and Graphical
Diagnostics
Table 1.1 Data Set for Example1.2

1 - 57
Figure 1.1 A dot plot of stem weight data

1 - 58
Table 1.3 Tensile strength

1 - 59
Figure 1.5 Scatter plot of tensile strength
and cotton percentages

1 - 60
Table 1.4 Car Battery Life

1 - 61
Table 1.5 Stem-and-Leaf Plot of Battery
Life

1 - 62
Stem and Leaf Diagrams
35 23 18 25 20
16 22 27 33 41
27 37 17 25 27
29 28 31 32 40

Unordered leaves:
1 | 8 6 7
2 | 3 5 0 2 7 7 5 7 9 8
3 | 5 3 7 1 2
4 | 1 0

Ordered leaves:
1 | 6 7 8
2 | 0 2 3 5 5 7 7 7 8 9
3 | 1 2 3 5 7
4 | 0 1

Key: 1 | 8 means 18
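
A stem-and-leaf display for two-digit data can be built by splitting each value into its tens digit (stem) and units digit (leaf). This is a minimal sketch that assumes non-negative integers; `stem_and_leaf` is an illustrative name.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return ordered stem-and-leaf rows: stem = tens digit, leaf = units digit."""
    groups = defaultdict(list)
    for value in sorted(values):
        groups[value // 10].append(value % 10)
    rows = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows


data = [35, 23, 18, 25, 20,
        16, 22, 27, 33, 41,
        27, 37, 17, 25, 27,
        29, 28, 31, 32, 40]

for row in stem_and_leaf(data):
    print(row)
```

Each of the 20 observations contributes exactly one leaf, so counting leaves is a quick check that no value was dropped.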
1 - 63
Stem and Leaf Diagrams
5.2 6.6 4.3 8.3 5.1
7.5 8.6 7.1 7.8 2.2
6.6 5.8 3.5 7.5 6.1
3.8 2.5 2.7 8.8 4.8

1 - 64
Raw data

The following data were collected on the ages


of cyclists involved in road accidents

1 - 65
66 6 62 19 20 15 21 8 21 63 44 10 44

26 35 26 61 13 61 28 21 7 10 52 13 52

19 22 64 11 39 22 9 13 9 17 64 32 8

62 28 36 37 18 138 16 67 45 10 55 14 66

49 9 23 12 9 37 7 36 9 88 46 12 59

18 20 11 25 7 42 29 6 60 60 16 50 16

18 15 18 17 31 14 22 14 34 20 9 67 61

34

Total 92
1 - 66
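Ages of cyclists in road accidents (Always include a title)
0 | 6 6 7 7 7 8 8 9 9 9 9 9 9
1 | 0 0 0 1 1 2 2 3 3 3 4 4 4 5 5 6 6 6 7 7 8 8 8 8 9 9
2 | 0 0 0 1 1 1 2 2 2 3 5 6 6 8 8 9
3 | 1 2 4 4 5 6 6 7 7 9
4 | 2 4 4 5 6 9
5 | 0 2 2 5 9
6 | 0 0 1 1 1 2 2 3 4 4 6 6 7 7
7 |
8 | 8
Key: 6 | 7 means 67 years (Always include a key)
The outlying value 138 is not shown on this plot.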
1 - 67
Table 1.6 Double-Stem-and-Leaf Plot of
Battery Life

1 - 68
Table 1.7 Relative Frequency Distribution
of Battery Life

1 - 69
Figure 1.6 Relative frequency histogram

1 - 70
Figure 1.7 Estimating frequency
distribution

1 - 71
Table 1.8 Nicotine Data for Example 1.5

1 - 72
Figure 1.9 Box-and-whisker plot for
Example 1.5

1 - 73
Figure 1.10 Stem-and-Leaf plot for the
nicotine data

1 - 74
Table 1.9 Data for Example 1.6

1 - 75
Figure 1.11 Box-and-whisker plot for
thickness of paint can “ears”

1 - 76
Develop your own Stem and Leaf Plot with the
following temperatures for June.

77 80 82 68 65 59 61
57 50 62 61 70 69 64
67 70 62 65 65 73 76
87 80 82 83 79 79 71
80 77

1 - 77
Answer:

5 079
6 11224555789

7 001367799
8 0002237

1 - 78
Frequency
Frequency is how often something occurs.

Example: Sam played football on…


• Saturday morning
• Saturday afternoon
• Sunday afternoon

The frequency is:

1 - 79
Categorical Frequency Distribution

By counting frequencies, we can make a


frequency distribution table.

A categorical frequency distribution is used


for data that can be placed into specific
categories.

1 - 80
Creating a Categorical Frequency Distribution

Step 1:
Make a table with the following columns in
order: class, tally, and frequency

1 - 81
Step 2:
Tally the data and
place the results in the tally column.

1 - 82
Step 3:
Count the tallies and place the results in the
frequency column.

1 - 83
Example Categorical Frequency Distribution

These are the favorite colors of fifteen 2nd graders.


Red Blue Green
Yellow Red Yellow
Green Red Red
Red Green Blue
Blue Red Green

Class    Tally     Frequency
Red      |||| |    6
Blue     |||       3
Green    ||||      4
Yellow   ||        2
Total = 15
1 - 84
What about if the categories of
data are numbers?

1 - 85
Grouped Frequency Distribution

A frequency distribution with classes that are


more than one unit in width

When the range of the data is large, the data must be


grouped into classes.
41 104 112 118 87 95
105 57 107 67 78 125
109 99 105 99 101 92

1 - 86
Key Concept

1 - 87
Class Width

The class width is the range of the class.

This can be found by:


Subtracting the lower-class limit of one class
from the lower-class limit of the next class.

1 - 88
Rules For Grouped Data
Rule #1: Choose the classes
You will normally be told how many classes you need.

Rule #2: Choose Class Width


ALWAYS round up to the next whole number

Rule #3: Mutually Exclusive


This means the class limits cannot overlap or be contained in
more than one class.

1 - 89
Rules For Grouped Data
Rule #4: Continuous
Even if there are no values in a class, the class must be
included in the frequency distribution. There should be no
gaps in a frequency distribution.
(with the exception of a class with zero frequency)

Rule #5: Exhaustive


There should be enough classes to accommodate all of the
data.

Rule #6: Equal Width


This avoids a distorted view of the data.
1 - 90
Creating a Frequency Distribution

Step 1:
Determine the minimum and maximum
values, and how many classes you need

1 - 91
Step 2:
Find the class width

Class Width = Range / (number of classes)

*ALWAYS round up to the next whole number

1 - 92
Step 3:
Write your minimum value as your
lowest lower limit

Class Limits
Minimum value 2

1 - 93
Step 4: Add the class width to
your lower limit to find the next lower limit;
WRITE BELOW, NOT BESIDE!
(do all lower limits first)

Ex: Class width = 9


Class Limits
2
11
20
29
Go until you have the amount of classes needed (in this case 4)

1 - 94
Step 5:
To find each upper limit, subtract one
from the next lower limit
Class Limits
2 - 10
11 - 19
20 - 28
29

1 - 95
Step 6:
To find last upper limit, add class width
to the 2nd to last upper limit

Class Limits
2 - 10
11 - 19
20 - 28
29 - 37
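
Steps 3 to 6 can be written as a short routine. The sketch assumes whole-number data, so each upper limit is one less than the next lower limit; `class_limits` is an illustrative name.

```python
def class_limits(minimum, width, num_classes):
    """Build (lower, upper) class limits for whole-number data."""
    # Steps 3 and 4: start at the minimum and keep adding the class width.
    lowers = [minimum + i * width for i in range(num_classes)]
    # Steps 5 and 6: each upper limit is one less than the next lower limit.
    return [(lower, lower + width - 1) for lower in lowers]


print(class_limits(2, 9, 4))  # [(2, 10), (11, 19), (20, 28), (29, 37)]
```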

1 - 96
Frequency Distributions
Minutes Spent on the Phone
102 124 108 86 103 82
71 104 112 118 87 95
103 116 85 122 87 100
105 97 107 67 78 125
109 99 105 99 101 92

Make a frequency distribution table with five classes.


Minimum value = 67
Maximum value = 125
1 - 97
Steps to Construct a Frequency Distribution
1. Choose the number of classes
For this problem use 5

2. Calculate the Class Width


Find the range = maximum value – minimum. Then divide this by the number of
classes. Finally, round up to the next whole number.
(125 - 67) / 5 = 11.6 Round up to 12

3. Determine All Class Limits


The lower class limit is the lowest data value that belongs in a class and the upper
class limit is the highest. Use the minimum value as the lower class limit in the first
class. (67)

4. Mark a tally | in appropriate class for each data value.


After all data values are tallied, count the tallies in each class for the class frequencies.

1 - 98
Construct a Frequency Distribution Table

Minimum = 67, Maximum = 125


Number of classes = 5
Class width = 12

Class Limits   Tally          f
67 - 78        |||            3
79 - 90        ||||           5
91 - 102       |||| |||       8
103 - 114      |||| ||||      9
115 - 126      ||||           5
Total = 30
Do all lower-class limits first.
1 - 99
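
The whole construction can be sketched as one function. It assumes whole-number data and applies the "always round up to the next whole number" rule literally, which also keeps the maximum inside the last class when the range divides evenly; `frequency_table` is an illustrative name.

```python
def frequency_table(values, num_classes):
    """Return (lower limit, upper limit, frequency) for each class."""
    low, high = min(values), max(values)
    # Range / number of classes, rounded up to the NEXT whole number.
    width = (high - low) // num_classes + 1
    table = []
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width - 1
        frequency = sum(1 for v in values if lower <= v <= upper)
        table.append((lower, upper, frequency))
    return table


minutes = [102, 124, 108, 86, 103, 82,
           71, 104, 112, 118, 87, 95,
           103, 116, 85, 122, 87, 100,
           105, 97, 107, 67, 78, 125,
           109, 99, 105, 99, 101, 92]

for lower, upper, frequency in frequency_table(minutes, 5):
    print(f"{lower} - {upper}: {frequency}")
```

The frequencies should add up to the number of observations, which is a useful check that the classes are exhaustive and mutually exclusive.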
1 - 99
Example: Try it!
After conducting a survey of 30 of your coworkers,
you are left with the following set of data on how
many days off each employee has taken this year:

7, 8, 9, 4, 10, 36, 19, 9, 26, 5, 11, 6, 2, 9, 10,


8, 16, 29, 7, 9, 8, 25, 4, 27, 8, 7, 6, 10, 34, 8
Construct a Frequency Table. Assume you want to
divide the data into 5 different classes.

1 - 100
Answer
Class Limits Tally Frequency
2-8 14
9-15 8
16-22 2
23-29 4
30-36 2
Total: 30

1 - 101
Box and Whisker Plots
A box plot summarizes data using the
median, upper and lower quartiles, and the
extreme (least and greatest) values.
It allows you to see important
characteristics of the data at a glance.

1 - 102
The 5 Number Summary

• The box and whisker plot is the visual representation of the
five number summary.

• The five number summary consists of:


– The median ( 2nd quartile)
– The 1st quartile
– The 3rd quartile
– The maximum value in a data set
– The minimum value in a data set
1 - 103
Box and Whisker Diagrams.

Anatomy of a Box and Whisker Diagram.

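[Figure: a box and whisker diagram on a scale from 4 to 12. The box spans the
lower quartile to the upper quartile with the median marked inside; the
whiskers extend from the box to the lowest value and to the highest value.]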

1 - 104
Constructing a box and whisker plot

Step 1 - take the set of numbers given…


34, 18, 100, 27, 54, 52, 93, 59, 61, 87, 68, 85, 78, 82, 91

Place the numbers in order from least to greatest:


18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100

1 - 105
Constructing a box and whisker plot

• Step 2 - Find the median.


• Remember, the median is the middle value in a data set.

18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100

68 is the median of this data set.

1 - 106
Constructing a box and whisker plot

• Step 3 – Find the lower quartile.


• The lower quartile is the median of the data set to the left of 68.

(18, 27, 34, 52, 54, 59, 61,) 68, 78, 82, 85, 87, 91, 93, 100

52 is the lower quartile

1 - 107
Constructing a box and whisker plot

• Step 4 – Find the upper quartile.


• The upper quartile is the median of the data set to the right of 68.

18, 27, 34, 52, 54, 59, 61, 68, (78, 82, 85, 87, 91, 93, 100)

87 is the upper quartile

1 - 108
Constructing a box and whisker plot

• Step 5 – Find the maximum and minimum values in the set.


• The maximum is the greatest value in the data set.
• The minimum is the least value in the data set.
18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100

18 is the minimum and 100 is the maximum.

1 - 109
Constructing a box and whisker plot

• Step 6 – Find the inter-quartile range (IQR).

• The inter-quartile range (IQR) is the difference between the
upper and lower quartiles.
➢Upper Quartile = 87
➢Lower Quartile = 52
➢87 – 52 = 35
➢35 = IQR

1 - 110
The 5 Number Summary

• Organize the 5 number summary


– Median – 68
– Lower Quartile – 52
– Upper Quartile – 87
– Max – 100
– Min – 18

1 - 111
Even Numbered Data Sets
If the data set has an even number of pieces of data, we
find the mean of the two middle numbers to find the
median of the set
2, 4, 5, 6, 7, 8, 9, 11, 19, 20
7 + 8 = 15
15 divided by 2 = 7.5

The median is 7.5

1 - 112
Even Numbered Data Sets

• The median splits the data set in half.


[ 2, 4, 5, 6, 7] 7.5 [8, 9, 11, 19, 20]

• From here we can then find the upper and


lower quartiles as well as the upper and lower
extremes.

1 - 113
Lower Quartile

• The lower quartile is the median of the


bottom half of the data (to the left of the
median).
[ 2, 4, 5, 6, 7] 7.5 [8, 9, 11, 19, 20]

Lower Quartile for this data = 5

1 - 114
Upper Quartile

• The upper quartile is the median of the top


half of the data (to the right of the median).
[ 2, 4, 5, 6, 7] 7.5 [8, 9, 11, 19, 20]

The upper quartile for this data set = 11

1 - 115
Interquartile Range
– To find the interquartile range, subtract the lower
quartile from the upper quartile.
Upper Quartile – Lower Quartile = _____

[ 2, 4, 5, 6, 7] 7.5 [8, 9, 11, 19, 20]


11 – 5 =6
The interquartile range for this data = 6

1 - 116
Lower Extreme

– The lower extreme is the lowest number in the


data set.
[ 2, 4, 5, 6, 7] 7.5 [8, 9, 11, 19, 20]

The lower extreme for this data set = 2

1 - 117
Upper Extreme

– The upper extreme is the highest number in the


data set.
[ 2, 4, 5, 6, 7] 7.5 [8, 9, 11, 19, 20]

The upper extreme for this data set = 20

1 - 118
Range

• The range of the data can be found by


subtracting the lower extreme from the upper
extreme.
[ 2, 4, 5, 6, 7] 7.5 [8, 9, 11, 19, 20]

20 – 2 = 18
The range for this data set = 18

1 - 119
Even Numbered Data Sets

[ 2, 4, 5, 6, 7] 7.5 [8, 9, 11, 19, 20]

– Median = 7.5
– Lower Quartile = 5
– Upper Quartile = 11
– Upper Extreme = 20
– Lower Extreme = 2
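
The five number summary for both the odd-sized and the even-sized example can be reproduced with the rule used on these slides: the quartiles are the medians of the lower and upper halves, and the median itself is left out of both halves when the number of values is odd. Other quartile conventions exist and can give slightly different values; `five_number_summary` is an illustrative name.

```python
from statistics import median


def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves rule."""
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]
    # Skip the middle value when the number of observations is odd.
    upper_half = data[half + 1:] if len(data) % 2 else data[half:]
    return {
        "min": data[0],
        "q1": median(lower_half),
        "median": median(data),
        "q3": median(upper_half),
        "max": data[-1],
    }


odd_set = [34, 18, 100, 27, 54, 52, 93, 59, 61, 87, 68, 85, 78, 82, 91]
even_set = [2, 4, 5, 6, 7, 8, 9, 11, 19, 20]

odd_summary = five_number_summary(odd_set)
even_summary = five_number_summary(even_set)
print(odd_summary)
print(even_summary)
print("IQR:", odd_summary["q3"] - odd_summary["q1"],
      even_summary["q3"] - even_summary["q1"])
```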

1 - 120
Graphing The Data
• Notice, the Box includes the lower quartile,
median, and upper quartile.
• The Whiskers extend from the Box to the max
and min.

1 - 121
Interpreting the Box Plot:
Study your Box and Whisker Plot to determine what
it is telling you.

Make a statement about what it is saying, then

support the statement with facts from your graph.

1 - 122
You should include the following in
your interpretation:
• Range or spread of the data and what it means to your graph
• Quartiles- compare them.
What are they telling you about the data?
• Median- this is an important part of the graph and should
be an important part of the interpretation.
• Percentages should be used to interpret the data, where
relevant.

1 - 123
Analyzing The Graph
• The data values found inside the box represent the middle
half ( 50%) of the data.
• The line segment inside the box represents the median

1 - 124
Practice

• Use the following set of data to create the 5 number


summary.

3, 7, 11, 11, 15, 21, 23, 39, 41, 45, 50, 61, 87, 99, 220

1 - 125
Median

• What is the median or 2nd quartile?

3, 7, 11, 11, 15, 21, 23, 39, 41, 45, 50, 61, 87, 99, 220

• The median is 39

1 - 126
Lower Quartile ( 1st Quartile )

• What is the lower or 1st quartile?

(3, 7, 11, 11, 15, 21, 23), 39, 41, 45, 50, 61, 87, 99, 220

• The lower quartile is 11

1 - 127
Upper Quartile ( 3rd Quartile )

• What is the upper or 3rd quartile?

3, 7, 11, 11, 15, 21, 23, 39, (41, 45, 50, 61, 87, 99, 220)

• The upper quartile is 61

1 - 128
Maximum

• What is the maximum?

3, 7, 11, 11, 15, 21, 23, 39, 41, 45, 50, 61, 87, 99, 220

• The max is 220

1 - 129
Minimum

• What is the minimum?

3, 7, 11, 11, 15, 21, 23, 39, 41, 45, 50, 61, 87, 99, 220

• The min is 3

1 - 130
The 5 Number Summary

• Median - 39
• Lower Quartile - 11
• Upper Quartile - 61
• Max - 220
• Min - 3

1 - 131
Graphing The Data

Take out your graph paper so we can practice


graphing the data.

1 - 132
Discuss the calculations below.

Battery Life: The life of 12 batteries recorded in hours is:


2, 5, 6, 6, 7, 8, 8, 8, 9, 9, 10, 15
Mean = 93/12 = 7.75 hours and the range = 15 – 2 = 13 hours.

2, 5, 6, 6, 7, 8, 8, 8, 9, 9, 10, 15
Median = 8 hours and the inter-quartile range = 9 – 6 = 3 hours.

The averages are similar but the measures of spread are


significantly different since the extreme values of 2 and 15
are not included in the inter-quartile range.

1 - 133
Box and Whisker Diagrams.

Box plots are useful for comparing two or more sets of data like that shown below for
heights of boys and girls in a class.
[Figure: two box plots, one for boys and one for girls, drawn on a common
height scale from 130 cm to 190 cm]

1 - 134
Drawing a Box Plot.

Example 1: Draw a Box plot for the data below

4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12

Lower Quartile (Q1) = 5½
Median (Q2) = 8
Upper Quartile (Q3) = 9

[Figure: box plot of the data on a scale from 4 to 12]
1 - 135
Drawing a Box Plot.

Example 2: Draw a Box plot for the data below

3, 4, 4, 6, 8, 8, 8, 9, 10, 10, 15

Lower Quartile (Q1) = 4
Median (Q2) = 8
Upper Quartile (Q3) = 10

[Figure: box plot of the data on a scale from 3 to 15]

1 - 136
Drawing a Box Plot.

Question: Stuart recorded the heights in cm of boys in


his class as shown below. Draw a box plot for this data.
137, 148, 155, 158, 165, 166, 166, 171, 171, 173, 175, 180, 184, 186, 186

Lower Quartile (QL) = 158
Median (Q2) = 171
Upper Quartile (Qu) = 180

[Figure: box plot of the heights on a scale from 130 cm to 190 cm]

1 - 137
Quartiles, Deciles and Percentiles

• The median splits the data into equal sized halves

• Quartiles split the data into quarters


• Deciles into tenths
• And percentiles can be any split of our choosing

• These measures include quartiles, deciles, a

1 - 138
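[Figure: a line from the lowest data value to the highest data value, divided
three ways. The median (the 50% value) splits it into two halves of 50% each;
the quartiles Q1, Q2, Q3 split it into four parts of 25% each; the deciles
split it into ten parts of 10% (1/10) each.]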
1 - 139
Percentile Computation
• To formalize the computational procedure, let Lp refer to the
location of a desired percentile. So if we wanted to find the
33rd percentile we would use L33 and if we wanted the median,
the 50th percentile, then L50.

• The number of observations is n, so if we want to locate the


median, its position is at (n + 1)/2, or we could write this as
(n + 1)(P/100), where P is the desired percentile.

1 - 140
Percentiles - Example

Listed below are the commissions earned last month by a


sample of 15 brokers at Salomon Smith Barney’s Oakland,
California, office.

$2,038 $1,758 $1,721 $1,637


$2,097 $2,047 $2,205 $1,787
$2,287 $1,940 $2,311 $2,054
$2,406 $1,471 $1,460

Locate the median, the first quartile, and the third quartile
for the commissions earned.

1 - 141
Percentiles – Example (cont.)

Step 1: Organize the data from lowest to


largest value

$1,460 $1,471 $1,637 $1,721


$1,758 $1,787 $1,940 $2,038
$2,047 $2,054 $2,097 $2,205
$2,287 $2,311 $2,406

1 - 142
Percentiles – Example (cont.)

Step 2: Compute the first and third quartiles.


Locate L25 and L75 using:

L25 = (15 + 1)(25/100) = 4
L75 = (15 + 1)(75/100) = 12

Therefore, the first and third quartiles are located at the 4th and 12th
positions, respectively:
First quartile (value at position L25) = $1,721
Third quartile (value at position L75) = $2,205

1 - 143
Boxplots

A box plot is a graphical display, based on quartiles, that


helps us picture a set of data.

To construct a box plot, we need only five statistics:

1. the minimum value,


2. Q1(the first quartile),
3. the median,
4. Q3 (the third quartile), and
5. the maximum value.

1 - 144
Boxplot - Example

1 - 145
Boxplot Example
Step1: Create an appropriate scale along the horizontal axis.
Step 2: Draw a box that starts at Q1 (15 minutes) and
ends at Q3 (22 minutes). Inside the box we place a vertical line to represent
the median (18 minutes).
Step 3: Extend horizontal lines from the box out to
the minimum value (13 minutes) and the maximum value (30 minutes).

1 - 146
Example: Draw a Box & Whisker for

Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22

(n = 9)
Q1 (L25) is in the (9+1)*25/100 = 2.5 position of the ranked data
so use the value half way between the 2nd and 3rd values,

so Q1 = 12.5

Q1 and Q3 are measures of non-central location


Q2 = median, is a measure of central tendency
1 - 147
Quartile Measures
Calculating The Quartiles: Example
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22

(n = 9)
Q1 is in the (9+1)*25/100 = 2.5 position of the ranked data,
so Q1 = (12+13)/2 = 12.5

Q2 is in the (9+1)*50/100 = 5th position of the ranked data,


so Q2 = median = 16

Q3 is in the (9+1)*75/100 = 7.5 position of the ranked data,


so Q3 = (18+21)/2 = 19.5
Q1 and Q3 are measures of non-central location
Q2 = median, is a measure of central tendency
1 - 148
Quartile Measures- Calculation Rules
• When calculating the ranked position, use the following rules:

- If the result is a whole number, then

it is the ranked position to use.

- If the result is a fractional half (e.g., 2.5, 7.5, 8.5, etc.), then
average the two corresponding data values.

- If the result is NOT a whole number or a fractional half, then

interpolate between the data points.

1 - 149
Quartile Measures:
The Interquartile Range (IQR)
― The IQR is Q3 – Q1 and measures the spread in the middle
50% of the data

― The IQR is a measure of variability that is not influenced by


outliers or extreme values

― Measures like Q1, Q3, and IQR that are not influenced by
outliers are called resistant measures

1 - 150
The Interquartile Range

Example:
[Figure: box plot with X minimum = 11, Q1 = 12.5, Median (Q2) = 16, Q3 = 19.5,
X maximum = 22; each of the four segments contains 25% of the data]

Interquartile range
= 19.5 – 12.5 = 7

1 - 151
Interpolation

If you found that the first quartile was the
13.75th value, then you interpolate like this:
Take the 13th and 14th data values
Find the difference |14th - 13th|
Multiply the difference by 0.75
Add the calculated value to the 13th value
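
The location formula and the interpolation steps can be combined into one routine. The sketch below computes the position (n + 1)(P/100), then moves the fractional part of the way from the value at the lower rank to the next value; a whole-number position gives the value at that rank, and a position ending in .5 gives the average of the two neighbours. Clamping positions that fall outside 1..n to the extreme values is an assumption of this sketch, and `percentile` is an illustrative name.

```python
def percentile(values, p):
    """Value at location (n + 1) * p / 100, with linear interpolation."""
    data = sorted(values)
    position = (len(data) + 1) * p / 100  # 1-based location L_p
    rank = int(position)                  # whole part of the position
    fraction = position - rank            # e.g. 0.75 for the 13.75th value
    if rank < 1:                          # assumption: clamp to the minimum
        return data[0]
    if rank >= len(data):                 # assumption: clamp to the maximum
        return data[-1]
    lower_value = data[rank - 1]
    upper_value = data[rank]
    return lower_value + fraction * (upper_value - lower_value)


ordered_array = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(percentile(ordered_array, 25),
      percentile(ordered_array, 50),
      percentile(ordered_array, 75))  # 12.5 16.0 19.5

commissions = [2038, 1758, 1721, 1637, 2097, 2047, 2205, 1787,
               2287, 1940, 2311, 2054, 2406, 1471, 1460]
print(percentile(commissions, 25),
      percentile(commissions, 50),
      percentile(commissions, 75))
```

For the commissions data the positions are 4, 8, and 12, all whole numbers, so no interpolation is needed and the second call also locates the median.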

1 - 152
Figure 1.8 Skewness of data

1 - 153
Distribution Shape and
The Boxplot
[Figure: three box plots, each marked with Q1, Q2, and Q3, showing a
negatively-skewed, a symmetrical, and a positively-skewed distribution]

1 - 154
General Types of Statistical Studies:
Designed Experiment, Observational
Study, and Retrospective Study
Basic Types of Studies

Three basic methods for collecting data:


– A retrospective study using historical data
• Data collected in the past for other purposes.
– An observational study
• Data, presently collected, by a passive observer.
– A designed experiment
• Data collected in response to process input changes.

1 - 156
