Chapter One - Introduction
Textbook:
Applied Statistics and Probability for Engineers, by D. Montgomery
and G. Runger, 6th edition, Wiley.
Email: Drjreissat@gmail.com
Office: IE
Chapter 1
An engineer
is someone who solves problems of interest
to society with the efficient application of
scientific principles by:
• Refining existing products
• Designing new products or processes
1-3
The engineering method
1-4
Probability and Statistics in
Engineering
1-5
What is Probability?
• Probability is
the measure of how likely it is that a random event will occur; it uses
knowledge of an underlying model to work out the chance that
different outcomes will occur.
• By definition, probability values are between 0 and 1.
• If we flip a fair coin 3 times,
what is the probability of obtaining 3 heads?
• If we throw a die 2 times,
what is the probability that the sum of the faces is 10?
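Both questions can be answered by counting equally likely outcomes. The sketch below is only an illustration; it assumes a fair coin and a fair six-sided die (the slide does not state the number of faces), and the variable names are chosen here for the example.

```python
from fractions import Fraction
from itertools import product

# Every equally likely outcome of three flips of a fair coin (2^3 = 8 outcomes).
coin_outcomes = list(product("HT", repeat=3))
p_three_heads = Fraction(
    sum(1 for flips in coin_outcomes if flips == ("H", "H", "H")),
    len(coin_outcomes),
)

# Every equally likely outcome of two throws of a fair six-sided die (36 outcomes).
# Assumption: the die has faces 1 to 6.
dice_outcomes = list(product(range(1, 7), repeat=2))
p_sum_ten = Fraction(
    sum(1 for first, second in dice_outcomes if first + second == 10),
    len(dice_outcomes),
)

print(p_three_heads)  # 1/8
print(p_sum_ten)      # 1/12
```

Only (4, 6), (5, 5), and (6, 4) give a sum of 10, which is why the second answer is 3/36 = 1/12.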
1-6
What is Statistics?
◼ Statistics is a tool to get information from data.
Data -→ Statistics (built on Probability) -→ Information
• Data: facts (mostly numerical) collected from a certain population.
• Information: knowledge about the population concerning some particular facts.
◼ Statistics is used because the underlying model that governs a
certain experiment is not known.
◼ All that is available is a sample of some outcomes of the experiment.
◼ The sample is used to make inferences about the probability model
that governs the experiment.
◼ So, a thorough understanding of probability is essential to
understand statistics.
1-7
Why do we need to use
probability and statistics?
1-8
Experiments & Processes Are Not
Deterministic
• Statistical techniques are useful for describing and
understanding variability.
• By variability, we mean successive observations of a system or
phenomenon do not produce exactly the same result.
• Statistics gives us a framework for describing this variability
and for learning about potential sources of variability.
1-9
Models Can Also Reflect Uncertainty
1 - 10
STATISTICS
[Figure: bar chart comparing series A, B, and C across categories 1 to 4, vertical axis 0 to 90]
Quality control:
• What is my rate of defective products?
• How can I best manage my production?
• What is the best way to sample?
Forecasting:
• Expectations for the future?
• How will the stock markets behave?
Analysis of sales:
• How much do we sell, and when?
• Should we change our sales strategy?
1 - 11
Statistics in Engineering
1 - 12
Statistics: Basic Ideas
• Statistics is
the area of science that deals with collection, organization, analysis, and
interpretation of data to:
• Make decisions
• Solve problems e.g. Identify sources of variability
• Design products and processes
• It is the science of learning information from data.
• It also deals with methods and techniques that can be used to draw
conclusions about the characteristics of a large number of data points,
commonly called a population, by using a smaller subset of the entire
data, called a sample.
1 - 13
Probability: Basic Ideas
• Terminology:
– Trial
Each time you repeat an experiment
– Outcome
Result of an experiment
– Random experiment
One with random outcomes (cannot be predicted
exactly)
– Relative frequency
The number of times a specific outcome occurs divided by
the total number of trials in the experiment.
1 - 14
For Example…
• You work in a cell phone factory and are asked to
remove cell phones at random from the assembly line
and turn each one on and off.
- Each time you remove a cell phone and turn it on and off,
you are conducting a Random Experiment
- If you check 200 phones, and you find 5 bad phones, then
Relative Frequency of failure = 5/200 = 0.025
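The relative frequency in this example is a single division. A minimal sketch follows; the function name `relative_frequency` is hypothetical and introduced here only for illustration.

```python
def relative_frequency(occurrences: int, trials: int) -> float:
    """Share of trials in which the outcome of interest occurred."""
    if trials <= 0:
        raise ValueError("the number of trials must be positive")
    return occurrences / trials


# 5 bad phones found among 200 checked phones.
failure_rate = relative_frequency(5, 200)
print(failure_rate)  # 0.025
```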
1 - 15
Overview: Statistical Inference,
Samples, Populations, and the
Role of Probability
Two Directions of Reasoning
Example (population): JU students (mean height); N = population size
Example (sample): 20 students from JU (sample mean); n = sample size
1 - 18
• Let X1,X2,…,XN be the population values (in general, they are
unknown)
• Let x1,x2,…,xn be the sample values (these values are known)
• Statistics obtained from the sample are used to estimate
(approximate) the parameters of the population.
✓ Scientific Data
✓ Statistical Inference
(1) Estimation:
➢ Point Estimation
➢ Interval Estimation (Confidence Interval)
(2) Hypothesis Testing
1 - 19
Some Terminology
◼ Data: the result of observation, consisting of information in the form
of counts, measurements, or responses.
◼ Parameter: a numerical description of a population characteristic.
◼ Statistic: a numerical description of a sample characteristic.
◼ Population: the collection of all outcomes, counts, measurements,
or responses that are of interest.
◼ Sample: a subset of a population.
◼ Experiment: any process that generates a set of data.
◼ Sample space: the set of all possible outcomes of a statistical
experiment. It is represented by the symbol S.
◼ Element or member: each outcome in a sample space. Sometimes
simply called a sample point.
1 - 20
Hypothesis Tests
1 - 21
Example
A quality control engineer at an integrated circuit
manufacturing plant takes a sample of 100 RAM chips
from the assembly line and finds that 10 are defective.
The company can tolerate 5% defective production in
the long run. The quality control engineer has to
determine whether the long-run defective rate is within
the tolerable range.
• Population :
All possible RAM chips coming out of the manufacturing process.
• Sample :
The RAM chips taken from the assembly line. In this case, the sample
size is 100.
1 - 22
• Using probability, the engineer computes that the chance of obtaining
exactly 10 defective chips out of 100 is 0.0167 if the long-term defective
rate is 5%. Similarly, he can compute that the chance of obtaining 10 or
more defective chips is 0.0282.
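The two probabilities quoted above can be reproduced with a binomial model. This is a sketch under an assumption the slide does not state explicitly: each of the 100 chips is defective independently with probability 0.05. The helper `binomial_pmf` is a name introduced here for illustration.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) when X counts successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_chips = 100          # sample size from the example
defective_rate = 0.05  # tolerated long-run defective rate

# Chance of exactly 10 defective chips.
p_exactly_10 = binomial_pmf(10, n_chips, defective_rate)

# Chance of 10 or more defective chips: 1 minus the chance of 0 through 9.
p_at_least_10 = 1.0 - sum(binomial_pmf(k, n_chips, defective_rate) for k in range(10))

print(round(p_exactly_10, 4))   # 0.0167
print(round(p_at_least_10, 4))  # 0.0282
```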
1 - 23
Fundamental relationship between probability and
inferential statistics
✓ For a statistical problem, the
sample along with inferential
statistics allows us to draw
conclusions about the population.
✓ Problems in probability allow us
to draw conclusions about
characteristics of hypothetical
data taken from the population
based on known features of the
population.
1 - 24
Sampling Procedures;
Collection of Data
Data Collection
• Observational study
– Historical data
• Design of experiment
1 - 26
Statistics
• Divided into :
– Descriptive Statistics
– Inferential Statistics
1 - 27
Branches of Statistics
◼ Descriptive statistics
is the branch of statistics that involves the organization, summarization, and display of
data when the population can be enumerated completely.
◼ Inferential statistics
is the branch of statistics that involves using a sample of a population to draw
conclusions about the whole population.
A basic tool in the study of inferential statistics is probability.
◼ Descriptive statistics:
There are 45 students in the Probability and Statistics class. Twenty are younger than
24 years old. 16 are older than 36 years old.
What can be concluded?
◼ Inferential statistics:
A total of 860 people in Amman were questioned. People who ride bicycles daily
have an average age of 31 years. For people who ride a motorcycle, the average
age is 21.
What can be concluded?
1 - 28
Steps in Inferential Statistics
◼ Design the experiments and collect the data.
1 - 29
Forms of Data Description
• Point summary
• Tabular format
• Graphical format
• Diagrams
1 - 30
Point Summary
1 - 31
Point Summary
• Variability measures
– Range = \max_i x_i - \min_i x_i
– Variance: V = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)
– Standard deviation: S = \sqrt{V}
– Coefficient of variation = S / \bar{x}
1 - 32
Dot Diagram
[Figure: dot diagram on a horizontal scale from 0 to 100]
1 - 34
Time Frequency Plot
[Figure: time frequency plot of y (vertical axis, 5 to 15) against observation number (0 to 50)]
1 - 36
Control Charts
[Figure: control chart of concentration (vertical axis, 75 to 105) against observation number (0 to 30), with center line x = 91.50 and upper control limit = 100.5]
1 - 37
Table: Data Example (A)
1 - 38
Figure: Corrosion results for Example (A)
1 - 39
Measures of Location:
1 - 41
Definition 1.2 : Sample Median
1 - 42
Figure: Sample mean as a centroid of the
with-nitrogen stem weight
1 - 43
1.3 Measures of Location (Central Tendency):
1 - 44
Example:
Suppose that the following sample represents the ages (in years)
of a sample of 3 men:
x_1 = 30, x_2 = 35, x_3 = 27 (n = 3)
\bar{x} = \frac{30 + 35 + 27}{3} = \frac{92}{3} = 30.67
Note: \sum_{i=1}^{n} (x_i - \bar{x}) = 0
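The sample mean and the note that deviations from the mean sum to zero can be checked numerically. A minimal sketch, using the three ages from the example; the variable names are illustrative.

```python
ages = [30, 35, 27]  # ages (in years) of the 3 men in the example

# Sample mean: sum of the observations divided by the sample size.
mean_age = sum(ages) / len(ages)

# Deviations from the mean always sum to zero (up to floating-point rounding).
deviation_sum = sum(x - mean_age for x in ages)

print(round(mean_age, 2))         # 30.67
print(abs(deviation_sum) < 1e-9)  # True
```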
1 - 45
Other Measures of Locations
• Trimmed Mean
✓ A trimmed mean is computed by “trimming away” a certain percent of
both the largest and the smallest values.
✓ For example, the 10% trimmed mean is found by eliminating the largest
10% and smallest 10% and computing the average of the remaining values.
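The trimming rule described above can be sketched as follows. The function name `trimmed_mean` and the sample values are hypothetical and serve only as an illustration. When the trimming percentage does not give a whole number of observations, this sketch rounds down, which is one common convention and an assumption here.

```python
def trimmed_mean(values, percent):
    """Average after dropping `percent` % of the observations from each end."""
    ordered = sorted(values)
    # Number of observations removed from EACH end (rounded down; an assumption).
    cut = int(len(ordered) * percent / 100)
    kept = ordered[cut:len(ordered) - cut]
    if not kept:
        raise ValueError("trimming removed every observation")
    return sum(kept) / len(kept)


# Hypothetical sample of size 10: a 10% trim removes one value from each end.
sample = [2, 4, 5, 6, 7, 8, 9, 11, 19, 20]
print(trimmed_mean(sample, 10))  # 8.625
```

With 10 observations, the 10% trimmed mean drops only the smallest and the largest value, so it averages the remaining 8.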
1 - 46
For example, in the case of the stem weight data,
we would eliminate the largest and smallest since
the sample size is 10 for each sample.
➢ For the without-nitrogen group the 10% trimmed mean is given by
1 - 47
Measures of Variability
Definition 1.3: Sample Standard Deviation
1 - 49
1.4 Measures of Variability (Dispersion or Variation):
•The variation or dispersion in a set of data refers to how spread out
the observations are from each other.
Note:
(n − 1) is called the degrees of freedom (df) associated with the
sample variance S^2.
1 - 51
The Standard Deviation (S):
S = \sqrt{S^2} = \sqrt{ \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} }   (unit)
1 - 52
Example:
Compute the sample variance and standard deviation of the
following observations (ages in years): 10, 21, 33, 53, 54.
Solution:
n = 5
\bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{10 + 21 + 33 + 53 + 54}{5} = \frac{171}{5} = 34.2 (years)
S^2 = \frac{\sum_{i=1}^{5} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{5} (x_i - 34.2)^2}{5 - 1}
= \frac{(10 - 34.2)^2 + (21 - 34.2)^2 + (33 - 34.2)^2 + (53 - 34.2)^2 + (54 - 34.2)^2}{4}
= \frac{1506.8}{4} = 376.7 (years)^2
1 - 53
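The sample standard deviation is:
S = \sqrt{S^2} = \sqrt{376.7} \approx 19.41 (years)
The same variance follows from the shortcut formula:
x_i   : 10   21   33    53    54     \sum x_i = 171
x_i^2 : 100  441  1089  2809  2916   \sum x_i^2 = 7355
S^2 = \frac{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}{n - 1} = \frac{7355 - (5)(34.2)^2}{5 - 1} = \frac{1506.8}{4} = 376.7 (years)^2
1 - 54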
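The worked example can be verified with a few lines of code. This sketch computes the sample variance both from the definition and from the shortcut formula, using the five ages from the example; the variable names are illustrative.

```python
ages = [10, 21, 33, 53, 54]  # observations (ages in years) from the example
n = len(ages)
mean_age = sum(ages) / n

# Definition: squared deviations from the mean, divided by n - 1.
variance_definition = sum((x - mean_age) ** 2 for x in ages) / (n - 1)

# Shortcut: (sum of squares - n * mean^2) / (n - 1).
variance_shortcut = (sum(x * x for x in ages) - n * mean_age**2) / (n - 1)

# Standard deviation: square root of the sample variance.
std_dev = variance_definition**0.5

print(round(mean_age, 1))             # 34.2
print(round(variance_definition, 1))  # 376.7
print(round(variance_shortcut, 1))    # 376.7
print(round(std_dev, 2))              # 19.41
```

Both forms agree; the shortcut is convenient by hand, while the definition is usually the numerically safer choice in code.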
Discrete and Continuous Data
Statistical Modeling, Scientific,
Inspection, and Graphical
Diagnostics
Table 1.1 Data Set for Example 1.2
1 - 57
Figure 1.1 A dot plot of stem weight data
1 - 58
Table 1.3 Tensile strength
1 - 59
Figure 1.5 Scatter plot of tensile strength
and cotton percentages
1 - 60
Table 1.4 Car Battery Life
1 - 61
Table 1.5 Stem-and-Leaf Plot of Battery
Life
1 - 62
Stem and Leaf Diagrams
Data:
35 23 18 25 20
16 22 27 33 41
27 37 17 25 27
29 28 31 32 40
Unordered leaves          Ordered leaves
1 | 8 6 7                 1 | 6 7 8
2 | 3 5 0 2 7 7 5 7 9 8   2 | 0 2 3 5 5 7 7 7 8 9
3 | 5 3 7 1 2             3 | 1 2 3 5 7
4 | 1 0                   4 | 0 1
Key: 1 | 8 means 18
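An ordered stem-and-leaf display can be built by splitting each value into a tens digit (stem) and a ones digit (leaf). The sketch below assumes non-negative whole numbers and a non-empty data set; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted leaves (value % 10)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    # Keep empty stems between the smallest and largest so gaps stay visible.
    return {stem: stems.get(stem, []) for stem in range(min(stems), max(stems) + 1)}


data = [35, 23, 18, 25, 20,
        16, 22, 27, 33, 41,
        27, 37, 17, 25, 27,
        29, 28, 31, 32, 40]

plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 1 | 8 means 18")
```

Every observation contributes exactly one leaf, so the number of leaves should equal the sample size (20 here), which is a quick check on a hand-drawn display.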
1 - 63
Stem and Leaf Diagrams
5.2 6.6 4.3 8.3 5.1
7.5 8.6 7.1 7.8 2.2
6.6 5.8 3.5 7.5 6.1
3.8 2.5 2.7 8.8 4.8
1 - 64
Raw data
1 - 65
66 6 62 19 20 15 21 8 21 63 44 10 44
26 35 26 61 13 61 28 21 7 10 52 13 52
19 22 64 11 39 22 9 13 9 17 64 32 8
62 28 36 37 18 138 16 67 45 10 55 14 66
49 9 23 12 9 37 7 36 9 88 46 12 59
18 20 11 25 7 42 29 6 60 60 16 50 16
18 15 18 17 31 14 22 14 34 20 9 67 61
34
Total 92
1 - 66
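Ages of cyclists in road accidents   (always include a title)
0 | 6 6 7 7 7 8 8 9 9 9 9 9
1 | 0 0 0 1 1 2 2 3 3 3 4 4 4 5 5 6 6 6 7 7 8 8 8 8 9 9
2 | 0 0 0 1 1 1 2 2 2 3 5 6 6 8 8 9
3 | 1 2 4 4 5 6 6 7 7 9
4 | 2 4 4 5 6 9
5 | 0 2 2 5 9
6 | 0 0 1 1 1 2 2 3 4 4 6 6 7 7
7 |
8 | 8
Key: 6|7 means 67 years   (always include a key)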
1 - 67
Table 1.6 Double-Stem-and-Leaf Plot of
Battery Life
1 - 68
Table 1.7 Relative Frequency Distribution
of Battery Life
1 - 69
Figure 1.6 Relative frequency histogram
1 - 70
Figure 1.7 Estimating frequency
distribution
1 - 71
Table 1.8 Nicotine Data for Example 1.5
1 - 72
Figure 1.9 Box-and-whisker plot for
Example 1.5
1 - 73
Figure 1.10 Stem-and-Leaf plot for the
nicotine data
1 - 74
Table 1.9 Data for Example 1.6
1 - 75
Figure 1.11 Box-and-whisker plot for
thickness of paint can “ears”
1 - 76
Develop your own Stem and Leaf Plot with the
following temperatures for June.
77 80 82 68 65 59 61
57 50 62 61 70 69 64
67 70 62 65 65 73 76
87 80 82 83 79 79 71
80 77
1 - 77
Answer:
5 | 0 7 9
6 | 1 1 2 2 4 5 5 5 7 8 9
7 | 0 0 1 3 6 7 7 9 9
8 | 0 0 0 2 2 3 7
1 - 78
Frequency
Frequency is how often something occurs.
1 - 79
Categorical Frequency Distribution
1 - 80
Creating a Categorical Frequency Distribution
Step 1:
Make a table with the following columns in
order: class, tally, and frequency
1 - 81
Step 2:
Tally the data and
place the results in the tally column.
1 - 82
Step 3:
Count the tallies and place the results in the
frequency column.
1 - 83
Example Categorical Frequency Distribution
Total=
1 - 84
What about if the categories of
data are numbers?
1 - 85
Grouped Frequency Distribution
1 - 86
Key Concept
1 - 87
Class Width
1 - 88
Rules For Grouped Data
Rule #1: Choose the classes
You will normally be told how many classes you need.
1 - 89
Rules For Grouped Data
Rule #4: Continuous
Even if there are no values in a class, the class must be
included in the frequency distribution. There should be no
gaps in a frequency distribution.
(with the exception of a class with zero frequency)
Step 1:
Determine the minimum and maximum
values, and how many classes you need
1 - 91
Step 2:
Find the class width
1 - 92
Step 3:
Write your minimum value as your
lowest lower limit
Class Limits
Minimum value 2
1 - 93
Step 4: Add the class width to
your lower limit to find the next lower limit;
WRITE BELOW, NOT BESIDE!
(do all lower limits first)
1 - 94
Step 5:
To find each upper limit, subtract one
from the next lower limit
Class Limits
2 - 10
11 - 19
20 - 28
29
1 - 95
Step 6:
To find last upper limit, add class width
to the 2nd to last upper limit
Class Limits
2 - 10
11 - 19
20 - 28
29 - 37
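Steps 3 to 6 can be sketched in code. This is an illustration only: it assumes whole-number data, and the class width of 9 is inferred from the limits shown on the slide (2, 11, 20, 29) rather than stated there. The function name `class_limits` is hypothetical.

```python
def class_limits(minimum, width, num_classes):
    """Build (lower, upper) class limits for whole-number data."""
    if num_classes < 1:
        raise ValueError("at least one class is required")
    # Steps 3 and 4: start at the minimum, then keep adding the class width.
    lowers = [minimum + i * width for i in range(num_classes)]
    # Step 5: each upper limit is one less than the next lower limit.
    uppers = [lowers[i + 1] - 1 for i in range(num_classes - 1)]
    # Step 6: the last upper limit is the previous upper limit plus the width.
    last_upper = uppers[-1] + width if uppers else minimum + width - 1
    uppers.append(last_upper)
    return list(zip(lowers, uppers))


for lower, upper in class_limits(minimum=2, width=9, num_classes=4):
    print(f"{lower} - {upper}")
```

For a minimum of 2, a width of 9, and 4 classes this gives the limits 2 - 10, 11 - 19, 20 - 28, and 29 - 37.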
1 - 96
Frequency Distributions
Minutes Spent on the Phone
102 124 108 86 103 82
71 104 112 118 87 95
103 116 85 122 87 100
105 97 107 67 78 125
109 99 105 99 101 92
1 - 98
Construct a Frequency Distribution Table
1 - 100
Answer
Class Limits Tally Frequency
2-8 14
9-15 8
16-22 2
23-29 4
30-36 2
Total: 30
1 - 101
Box and Whisker Plots
A box plot summarizes data using the
median, upper and lower quartiles, and the
extreme (least and greatest) values.
It allows you to see important
characteristics of the data at a glance.
1 - 102
The 5 Number Summary
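[Figure: box-and-whisker diagram on a scale from 4 to 12. From left to right it marks the lowest value, lower quartile, median, upper quartile, and highest value; the box spans the quartiles and a whisker runs from each end of the box to the extreme value.]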
1 - 104
Constructing a box and whisker plot
1 - 105
Constructing a box and whisker plot
18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100
1 - 106
Constructing a box and whisker plot
(18, 27, 34, 52, 54, 59, 61,) 68, 78, 82, 85, 87, 91, 93, 100
1 - 107
Constructing a box and whisker plot
18, 27, 34, 52, 54, 59, 61, 68, (78, 82, 85, 87, 91, 93, 100)
1 - 108
Constructing a box and whisker plot
1 - 109
Constructing a box and whisker plot
1 - 110
The 5 Number Summary
1 - 111
Even Numbered Data Sets
If the data set has an even number of pieces of data, we
find the mean of the two middle numbers to find the
median of the set
2, 4, 5, 6, 7, 8, 9, 11, 19, 20
7 + 8 = 15
15 divided by 2 = 7.5
1 - 112
Even Numbered Data Sets
1 - 113
Lower Quartile
1 - 114
Upper Quartile
1 - 115
Interquartile Range
– To find the interquartile range, subtract the lower
quartile from the upper quartile.
Upper Quartile – Lower Quartile = _____
1 - 116
Lower Extreme
1 - 117
Upper Extreme
1 - 118
Range
20 – 2 = 18
The range for this data set = 18
1 - 119
Even Numbered Data Sets
– Median = 7.5
– Lower Quartile = 5
– Upper Quartile = 11
– Upper Extreme = 20
– Lower Extreme = 2
1 - 120
Graphing The Data
• Notice, the Box includes the lower quartile,
median, and upper quartile.
• The Whiskers extend from the Box to the max
and min.
1 - 121
Interpreting the Box Plot:
Study your Box and Whisker Plot to determine what
it is telling you.
1 - 122
You should include the following in
your interpretation:
• Range or spread of the data and what it means to your graph
• Quartiles- compare them.
What are they telling you about the data?
• Median- this is an important part of the graph and should
be an important part of the interpretation.
• Percentages should be used to interpret the data, where
relevant.
1 - 123
Analyzing The Graph
• The data values found inside the box represent the middle
half ( 50%) of the data.
• The line segment inside the box represents the median
1 - 124
Practice
3, 7, 11, 11, 15, 21, 23, 39, 41, 45, 50, 61, 87, 99, 220
1 - 125
Median
3, 7, 11, 11, 15, 21, 23, 39, 41, 45, 50, 61, 87, 99, 220
• The median is 39
1 - 126
Lower Quartile ( 1st Quartile )
(3, 7, 11, 11, 15, 21, 23), 39, 41, 45, 50, 61, 87, 99, 220
1 - 127
Upper Quartile ( 3rd Quartile )
3, 7, 11, 11, 15, 21, 23, 39, (41, 45, 50, 61, 87, 99, 220)
1 - 128
Maximum
3, 7, 11, 11, 15, 21, 23, 39, 41, 45, 50, 61, 87, 99, 220
1 - 129
Minimum
3, 7, 11, 11, 15, 21, 23, 39, 41, 45, 50, 61, 87, 99, 220
• The min is 3
1 - 130
The 5 Number Summary
• Median - 39
• Lower Quartile - 11
• Upper Quartile - 61
• Max - 220
• Min - 3
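The five-number summaries in these slides follow a median-of-halves rule: find the median, then take the median of the values below it and the median of the values above it. A minimal sketch of that rule follows; the function name `five_number_summary` is hypothetical. Other quartile conventions exist and can give slightly different values.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # For an odd count the middle value belongs to neither half.
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


odd_set = [18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100]
even_set = [2, 4, 5, 6, 7, 8, 9, 11, 19, 20]
practice = [3, 7, 11, 11, 15, 21, 23, 39, 41, 45, 50, 61, 87, 99, 220]

print(five_number_summary(odd_set))   # (18, 52, 68, 87, 100)
print(five_number_summary(even_set))  # (2, 5, 7.5, 11, 20)
print(five_number_summary(practice))  # (3, 11, 39, 61, 220)
```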
1 - 131
Graphing The Data
1 - 132
Discuss the calculations below.
2, 5, 6, 6, 7, 8, 8, 8, 9, 9, 10, 15
Median = 8 hours and the inter-quartile range = 9 – 6 = 3 hours.
1 - 133
Box and Whisker Diagrams.
Box plots are useful for comparing two or more sets of data like that shown below for
heights of boys and girls in a class.
Anatomy of a Box and Whisker Diagram.
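[Figure: anatomy of a box-and-whisker diagram for the boys' data on a scale from 4 to 12, marking the lowest value, lower quartile, median, upper quartile, and highest value; the box spans the quartiles and the whiskers reach the extremes.]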
1 - 134
Drawing a Box Plot.
Data: 4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12
Lower quartile (Q1) = 5½, Median (Q2) = 8, Upper quartile (Q3) = 9
[Figure: box plot of the data on a scale from 4 to 12]
1 - 135
Drawing a Box Plot.
Data: 3, 4, 4, 6, 8, 8, 8, 9, 10, 10, 15
Lower quartile (Q1) = 4, Median (Q2) = 8, Upper quartile (Q3) = 10
[Figure: box plot of the data on a scale from 3 to 15]
1 - 136
Drawing a Box Plot.
137, 148, 155, 158, 165, 166, 166, 171, 171, 173, 175, 180, 184, 186, 186
Lower quartile = 158, Median = 171, Upper quartile = 180
1 - 137
Quartiles, Deciles and Percentiles
1 - 138
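Quartiles: Q1, Q2, and Q3 split the ordered data into four parts; Q2 (the median) leaves 50% of the data on each side.
Deciles: nine cut points split the ordered data into ten parts of 10% (1/10) each.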
1 - 139
Percentile Computation
• To formalize the computational procedure, let Lp refer to the
location of a desired percentile. So if we wanted to find the
33rd percentile we would use L33 and if we wanted the median,
the 50th percentile, then L50.
1 - 140
Percentiles - Example
Locate the median, the first quartile, and the third quartile
for the commissions earned.
1 - 141
Percentiles – Example (cont.)
1 - 142
Percentiles – Example (cont.)
L25 = (15 + 1)(25/100) = 4        L75 = (15 + 1)(75/100) = 12
Therefore, the first and third quartiles are located at the 4th and 12th
positions, respectively
L25 = $1,721
L75 = $2,205
1 - 143
Boxplots
1 - 144
Boxplot - Example
1 - 145
Boxplot Example
Step 1: Create an appropriate scale along the horizontal axis.
Step 2: Draw a box that starts at Q1 (15 minutes) and
ends at Q3 (22 minutes). Inside the box we place a vertical line to represent
the median (18 minutes).
Step 3: Extend horizontal lines from the box out to
the minimum value (13 minutes) and the maximum value (30 minutes).
1 - 146
Example: Draw a Box & Whisker for
(n = 9)
Q1 (L25) is in the (9+1)*25/100 = 2.5 position of the ranked data,
so use the value halfway between the 2nd and 3rd values:
Q1 = (12+13)/2 = 12.5
- If the result is a fractional half (e.g., 2.5, 7.5, 8.5, etc.), then
average the two corresponding data values.
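The location rule Lp = (n + 1) * P / 100 and the averaging rule for half positions can be sketched as below. Two points are assumptions, not taken from the slides: the function names are hypothetical, and fractional positions are handled by linear interpolation, which reduces to averaging the two neighbors when the position ends in .5. The nine-value data set is also hypothetical, chosen only so that its quartiles match the values quoted in this example.

```python
def percentile_location(n, p):
    """Position Lp = (n + 1) * p / 100 in the ranked data (1-based)."""
    return (n + 1) * p / 100


def percentile_value(ranked, p):
    """Value at the p-th percentile of already ranked data."""
    n = len(ranked)
    # Keep the position inside the data (assumption for extreme percentiles).
    location = min(max(percentile_location(n, p), 1), n)
    whole = int(location)
    fraction = location - whole
    if fraction == 0:
        return ranked[whole - 1]
    below, above = ranked[whole - 1], ranked[whole]
    # Linear interpolation; a fraction of 0.5 is the average of the neighbors.
    return below + fraction * (above - below)


# Hypothetical ranked sample with n = 9.
ranked = [11, 12, 13, 16, 16, 17, 18, 21, 22]

print(percentile_location(9, 25))    # 2.5
print(percentile_value(ranked, 25))  # 12.5
print(percentile_value(ranked, 50))  # 16
print(percentile_value(ranked, 75))  # 19.5
```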
1 - 149
Quartile Measures:
The Interquartile Range (IQR)
― The IQR is Q3 – Q1 and measures the spread in the middle
50% of the data
― Measures like Q1, Q3, and IQR that are not influenced by
outliers are called resistant measures
1 - 150
The Interquartile Range
Example:
minimum = 11, Q1 = 12.5, median (Q2) = 16, Q3 = 19.5, maximum = 22
Each of the four intervals between these values holds 25% of the data.
Interquartile range
= 19.5 – 12.5 = 7
1 - 151
Interpolation
1 - 152
Figure 1.8 Skewness of data
1 - 153
Distribution Shape and
The Boxplot
[Figure: three boxplots marked with Q1, Q2, and Q3, for negatively skewed, symmetrical, and positively skewed distributions]
1 - 154
General Types of Statistical Studies:
Designed Experiment, Observational
Study, and Retrospective Study
Basic Types of Studies
1 - 156