IE 211 - Chapter 1
IE 211
Statistical Analysis for Industrial
Engineering 1
Espadera, Andre Paul V.
College of Engineering
Chapter 1
Review of the Basic
Concepts of
Probability and
Statistics
Definition of Statistical Concepts of
Probability and Statistics
Variables
• A variable is a characteristic or condition that can change
or take on different values.
• Most research begins with a general question about the
relationship between two variables for a specific group of
individuals.
Population
• The entire group of individuals is called the population.
• For example, a researcher may be interested in the
relation between class size (variable 1) and academic
performance (variable 2) for the population of third-grade
children.
Sample
• Usually populations are so large that a researcher cannot
examine the entire group. Therefore, a sample is
selected to represent the population in a research study.
The goal is to use the results obtained from the sample to
help answer questions about the population.
Types of Variables
• Variables can be classified as discrete or continuous.
• Discrete variables (such as class size) consist of
indivisible categories, and continuous variables (such
as time or weight) are infinitely divisible into whatever
units a researcher may choose. For example, time can
be measured to the nearest minute, second, half-second,
etc.
Real Limits
• To define the units for a continuous variable, a researcher
must use real limits which are boundaries located exactly
half-way between adjacent categories.
Measuring Variables
4 Types of Measurement Scales
1. A nominal scale is an unordered set of categories
identified only by name. Nominal measurements only
permit you to determine whether two individuals are the
same or different.
2. An ordinal scale is an ordered set of categories.
Ordinal measurements tell you the direction of difference
between two individuals.
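4 Types of Measurement Scales (cont.)
3. An interval scale is an ordered series of equal-sized
categories. Interval measurements identify the direction and
magnitude of a difference. The zero point is located
arbitrarily on an interval scale.
4. A ratio scale is an interval scale where a value of zero
indicates none of the variable. Ratio measurements identify
the direction and magnitude of differences and allow ratio
comparisons of measurements.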
Correlational Studies
• The goal of a correlational study is to determine whether
there is a relationship between two variables and to
describe the relationship.
• A correlational study simply observes the two variables
as they exist naturally.
Experiments
Experiments (cont.)
• In an experiment, one variable is manipulated to create treatment
conditions. A second variable is observed and measured to
obtain scores for a group of individuals in each of the treatment
conditions. The measurements are then compared to see if there
are differences between treatment conditions. All other variables
are controlled to prevent them from influencing the results.
• In an experiment, the manipulated variable is called the
independent variable and the observed variable is the
dependent variable.
Other Types of Studies
• Other types of research studies, known as non-experimental
or quasi-experimental, are similar to
experiments because they also compare groups of
scores.
• These studies do not use a manipulated variable to
differentiate the groups. Instead, the variable that
differentiates the groups is usually a pre-existing
participant variable (such as male/female) or a time
variable (such as before/after).
Other Types of Studies (cont.)
• Because these studies do not use the manipulation and
control of true experiments, they cannot demonstrate
cause and effect relationships. As a result, they are
similar to correlational research because they simply
demonstrate and describe relationships.
Data
• The measurements obtained in a research study are
called the data.
• The goal of statistics is to help researchers organize and
interpret the data.
Descriptive Statistics
• Descriptive statistics are methods for organizing and
summarizing data.
• For example, tables or graphs are used to organize data,
and descriptive values such as the average score are
used to summarize data.
• A descriptive value for a population is called a parameter
and a descriptive value for a sample is called a statistic.
Inferential Statistics
• Inferential statistics are methods for using sample data to make
general conclusions (inferences) about populations.
• Because a sample is typically only a part of the whole population,
sample data provide only limited information about the population.
As a result, sample statistics are generally imperfect
representatives of the corresponding population parameters.
Sampling Error
• The discrepancy between a sample statistic and its
population parameter is called sampling error.
• Defining and measuring sampling error is a large part of
inferential statistics.
Notation
• The individual measurements or scores obtained for a research
participant will be identified by the letter X (or X and Y if there are
multiple scores for each individual).
• The number of scores in a data set will be identified by N for a
population or n for a sample.
• Summing a set of values is a common operation in statistics and
has its own notation. The Greek letter sigma, Σ, will be used to
stand for "the sum of." For example, ΣX identifies the sum of the
scores.
Order of Operations
1. All calculations within parentheses are done first.
2. Squaring or raising to other exponents is done second.
3. Multiplying and dividing are done third, and should be
completed in order from left to right.
4. Summation with the Σ notation is done next.
5. Any additional adding and subtracting is done last and should be
completed in order from left to right.
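The order of operations matters most when Σ is combined with squaring or adding. The following is a minimal Python sketch, using hypothetical scores that do not come from the slides, of how ΣX², (ΣX)², ΣX + 1 and Σ(X + 1) differ:

```python
# Hypothetical scores, chosen only for illustration.
scores = [1, 2, 3, 4]

sum_x = sum(scores)                            # ΣX
sum_x_squared = sum(x ** 2 for x in scores)    # ΣX²: square each score, then sum
squared_sum_x = sum(scores) ** 2               # (ΣX)²: parentheses first, then square
sum_x_plus_1 = sum(scores) + 1                 # ΣX + 1: summation before addition
sum_of_x_plus_1 = sum(x + 1 for x in scores)   # Σ(X + 1): parentheses first, then sum

print(sum_x, sum_x_squared, squared_sum_x, sum_x_plus_1, sum_of_x_plus_1)
# 10 30 100 11 14
```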
Uses and Importance of
Statistics and Statistical
Analysis
Importance of Statistics
• It is important for researchers, and also for consumers
of research, to understand statistics so that
they can be informed, evaluate the credibility and
usefulness of information, and make appropriate
decisions.
Uses of Statistics
• Weather forecast
• Emergency preparedness
• Predicting disease
• Medical studies
• Genetics
• Political campaigns
• Insurance
• Consumer goods
• Quality Testing
• Stock Market
Why now? What’s new about statistics?
• Statistics is an emergent discipline that has rapidly
adapted to current challenges.
• In today’s era of big data, where computers and
networks are everywhere and everything can be
measured, you need statistics to make that data useful.
Do we really trust statistics? Different
statistics say different things
The International Year of Statistics (2013) is an occasion to
remind us of the value of:
• Statistical methods
• Learning how to use them responsibly
• Statistical software as the tools of analysis
• Using statistical professionals to help us out when
needed.
Statistics plays a vital role in every field of
human activity
Statisticians know how to:
• Design studies
• Collect trustworthy data
• Analyze the data appropriately
• Check assumptions
• Draw reliable conclusions
A few potential statistical mishaps that
commonly lead to misuse
• P-value
• Faulty questions
• Biased sample
• Data fishing
• Overgeneralization
• False causality
• Incorrect analysis choices
• Violation of the assumptions for an analysis
• Data dredging
• Data manipulation
Central Tendency and Variability of Data
Questions
• Define
– Mean
– Median
– Mode
• What is the effect of distribution shape on measures of
central tendency?
• When might we prefer one measure of central tendency to
another?
Questions (2)
• Define
– Range
– Average Deviation
– Variance
– Standard Deviation
• When might we prefer one measure of variability to another?
• What is a z score?
• What is the point of Tchebycheff’s inequality?
Variables have distributions
• A variable is something that changes or has different
values (e.g., anger).
• A distribution is a collection of measures, usually across
people.
• Distributions of numbers can be summarized with
numbers (called statistics or parameters).
Central Tendency refers to the Middle
of the Distribution
Variability is about the Spread
Central Tendency: Mode, Median, &
Mean
• The mode is the most frequently occurring score. For grouped
data, it is the midpoint of the most populous class interval.
A distribution can be bimodal or multimodal.
Median
• Score that separates top 50% from bottom 50%
• With an even number of scores, the median is halfway between the
two middle scores.
– 1 2 3 4 | 5 6 7 8: the median is 4.5
• With an odd number of scores, the median is the middle score.
– 1 2 3 4 5 6 7: the median is 4
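The two median cases above can be checked with Python's standard `statistics` module. This is only a small sketch using the two score lists from the slide:

```python
from statistics import median

even_scores = [1, 2, 3, 4, 5, 6, 7, 8]
odd_scores = [1, 2, 3, 4, 5, 6, 7]

# Even count: the median is halfway between the two middle scores (4 and 5).
median_even = median(even_scores)
# Odd count: the median is the middle score.
median_odd = median(odd_scores)

print(median_even, median_odd)  # 4.5 4
```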
Mean
• Sum of scores divided by the number of people.
Population mean is $\mu$ (mu) and sample mean is $\bar{X}$ (X-bar).
• We calculate the sample mean by:
$$\bar{X} = \frac{\sum X}{N}$$
• We calculate the population mean by:
$$\mu = \frac{\sum X}{N}$$
Deviation from the mean
• $x = X - \bar{X}$. Deviations sum to zero.
• Deviation score: deviation from the mean
• Raw scores (three example sets, each with mean 9):
9
8 9 10
7 8 9 10 11
• Deviation scores:
0
-1 0 1
-2 -1 0 1 2
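As a quick check that deviations from the mean sum to zero, here is a minimal Python sketch using the largest raw-score set from the slide (7 through 11):

```python
raw_scores = [7, 8, 9, 10, 11]

mean = sum(raw_scores) / len(raw_scores)       # sum of scores divided by N
deviations = [x - mean for x in raw_scores]    # x = X - mean

print(mean)             # 9.0
print(deviations)       # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(sum(deviations))  # 0.0
```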
Comparison of mean, median and mode
• Mode
– Good for nominal variables
– Good if you need to know most frequent observation
– Quick and easy
• Median
– Good for “bad” distributions
– Good for distributions with arbitrary ceiling or floor
Comparison of mean, median & mode
• Mean
– Used for inference as well as description; best estimator of the
parameter
– Based on all data in the distribution
– Generally preferred except for “bad” distribution. Most
commonly used statistic for central tendency.
Best Guess interpretations
• Mean – average of signed error will be zero.
• Mode – will be absolutely right with greatest frequency
• Median – smallest absolute error
Expectation
Influence of Distribution Shape
Review
What is central tendency?
• Mode
• Median
• Mean
2. Variability aka Dispersion
• 4 Statistics: Range, Average Deviation, Variance, &
Standard Deviation
• Range = high score minus low score.
– 12 14 14 16 16 18 20 – range=20-12=8
• Average Deviation: mean of absolute deviations from the
median:
$$AD = \frac{\sum |X - Md|}{N}$$
Note the difference between this definition and the
undergrad text: deviation from the median vs. the mean.
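The range and the average deviation can be computed directly from their definitions. A minimal sketch in Python, reusing the seven scores from the range example and taking deviations from the median as the slide specifies:

```python
from statistics import median

scores = [12, 14, 14, 16, 16, 18, 20]

score_range = max(scores) - min(scores)   # high score minus low score
md = median(scores)                       # 16
# Mean of the absolute deviations from the median.
average_deviation = sum(abs(x - md) for x in scores) / len(scores)

print(score_range, md, average_deviation)  # 8 16 2.0
```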
Variance
• Population Variance:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
• Where $\sigma^2$ means population variance,
$\mu$ means population mean, and the other terms have their usual
meaning.
• The variance is equal to the average squared deviation from the
mean.
• To compute, take each score and subtract the mean. Square the
result. Find the average over scores. The result is the variance.
Computing the Variance
(N = 5)
         X     X-bar   X − X-bar   (X − X-bar)²
         5     15      -10         100
         10    15      -5          25
         15    15      0           0
         20    15      5           25
         25    15      10          100
Total:   75            0           250
Mean:                              Variance is → 50
Standard Deviation
• Variance is average squared deviation from the mean.
• To return to original, unsquared units, we just take the
square root of the variance. This is the standard
deviation.
• Population formula:
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
Standard Deviation
• Sometimes called the root-mean-square deviation from
the mean. This name says how to compute it from the
inside out.
• Find the deviation (difference between the score and the
mean).
• Find the deviations squared.
• Find their mean.
• Take the square root.
Computing the Standard Deviation
(N = 5)
         X     X-bar   X − X-bar   (X − X-bar)²
         5     15      -10         100
         10    15      -5          25
         15    15      0           0
         20    15      5           25
         25    15      10          100
Total:   75            0           250
Mean:                              Variance is → 50
Sqrt:                              SD is → $\sqrt{50} = 7.07$
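The table can be reproduced step by step, from the inside out, as the root-mean-square recipe describes. This is a minimal Python sketch of the population formulas (dividing by N), using the five scores from the table:

```python
from math import sqrt

scores = [5, 10, 15, 20, 25]
n = len(scores)

mean = sum(scores) / n                                   # 75 / 5 = 15
squared_deviations = [(x - mean) ** 2 for x in scores]   # 100, 25, 0, 25, 100
variance = sum(squared_deviations) / n                   # 250 / 5 = 50
sd = sqrt(variance)                                      # back to original units

print(variance, round(sd, 2))  # 50.0 7.07
```

Python's `statistics.pvariance` and `statistics.pstdev` compute the same population quantities.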
Example: Age Distribution
[Figure: Distribution of Age, illustrating central tendency, variability, and shape. Horizontal axis: age (10 to 50). Median = 23, Mean = 25.73.]
Review
• Range
• Average deviation
• Variance
• Standard Deviation
Standard or z score
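• A z score indicates distance from the mean in standard
deviation units. Formula:
$$z = \frac{X - \bar{X}}{S} \quad \text{(sample)} \qquad z = \frac{X - \mu}{\sigma} \quad \text{(population)}$$
• Converting to standard or z scores does not change the
shape of the distribution. Z-scores are not normalized: a skewed
distribution of raw scores is still skewed after conversion.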
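A short Python sketch of the population z-score formula follows. The five scores are borrowed from the variance example only for illustration. After conversion the scores have mean 0 and standard deviation 1, but their relative spacing (the shape) is unchanged:

```python
from statistics import mean, pstdev

scores = [5, 10, 15, 20, 25]
mu = mean(scores)        # population mean
sigma = pstdev(scores)   # population standard deviation

# z = (X - mu) / sigma for every score
z_scores = [(x - mu) / sigma for x in scores]

print([round(z, 3) for z in z_scores])
```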
Tchebycheff’s Inequality (1)
• General form:
$$p(|X - \mu| \ge b) \le \frac{\sigma^2}{b^2}$$
Suppose we know mean height in inches is 66 and SD
is 4 inches. We assume nothing about the shape of the
distribution of height. What is the probability of
finding people taller than 74 inches? (Note that b is a
deviation from the mean; in this case 74 − 66 = 8.) Also
74 inches is 2 SDs above the mean; therefore, z = 2.
$$p \le \frac{4^2}{8^2} = \frac{16}{64} = .25$$
[If we assume height is normally distributed, p is much
smaller. But we will get to that later.]
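The height example can be sketched in Python as below. The helper name `chebyshev_bound` is hypothetical. The normal-distribution comparison is an added illustration of the bracketed remark, using `statistics.NormalDist` from the standard library (Python 3.8+):

```python
from statistics import NormalDist

def chebyshev_bound(sigma, b):
    """Upper bound on p(|X - mu| >= b) for any distribution with SD sigma."""
    return sigma ** 2 / b ** 2

mu, sigma, cutoff = 66, 4, 74
b = cutoff - mu                      # deviation from the mean: 8 inches

bound = chebyshev_bound(sigma, b)    # 16 / 64
# For comparison only: the upper-tail probability IF height were normal.
normal_tail = 1 - NormalDist(mu, sigma).cdf(cutoff)

print(bound)                   # 0.25
print(normal_tail < bound)     # True
```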
Tchebycheff (2)
• Z-score form:
$$p\left(\frac{|X - \mu|}{\sigma} \ge k\right) \le \frac{1}{k^2}$$
• Probability of z score from any distribution being more
than k SDs from mean is at most $1/k^2$.
For the problem in the previous slide:
$$p(|z| \ge 2) \le \frac{1}{k^2} = \frac{1}{2^2} = .25$$
• Z-scores from the worst distributions are rarely more
than 5 or less than -5.
• For symmetric, unimodal distributions, |z| is rarely more
than 3.
$$p(|z| \ge k) \le \frac{4}{9} \cdot \frac{1}{k^2}$$
$$p(|z| \ge 3) \le \frac{4}{9} \cdot \frac{1}{3^2} \approx .05$$
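Both bounds are simple enough to evaluate exactly. A minimal sketch with Python's `fractions` module (the function names are hypothetical):

```python
from fractions import Fraction

def general_bound(k):
    """At most 1/k^2 of any distribution lies k or more SDs from the mean."""
    return Fraction(1, k ** 2)

def symmetric_unimodal_bound(k):
    """Tighter bound (4/9)(1/k^2) for symmetric, unimodal distributions."""
    return Fraction(4, 9) * Fraction(1, k ** 2)

print(general_bound(2))                                # 1/4
print(symmetric_unimodal_bound(3))                     # 4/81
print(round(float(symmetric_unimodal_bound(3)), 3))    # 0.049
```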
Basic Concepts of Probability
Probability Experiments
A probability experiment is an action through which specific results (counts, measurements
or responses) are obtained.
Example:
Rolling a die and observing the number that is rolled is a probability
experiment.
The set of all possible outcomes for an experiment is the sample space.
Example:
The sample space when rolling a die has six outcomes.
{1, 2, 3, 4, 5, 6}
Events
An event consists of one or more outcomes and is a subset of the sample space.
Example:
A die is rolled. Event A is rolling an even number.
This is not a simple event because the outcomes of event A are {2, 4, 6}.
Classical Probability
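Classical (or theoretical) probability is used when each outcome in a sample
space is equally likely to occur:
$$P(E) = \frac{\text{Number of outcomes in } E}{\text{Total number of outcomes in sample space}}$$
Example:
A die is rolled.
Find the probability of Event A: rolling a 5.
$$P(A) = \frac{1}{6} \approx 0.167$$
P(A) is read “Probability of Event A.”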
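For equally likely outcomes, a probability is a ratio of counts. A minimal Python sketch for the die, where the helper `classical_probability` is a hypothetical name:

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def classical_probability(event, space):
    """Outcomes in the event divided by outcomes in the sample space.

    Assumes every outcome in the sample space is equally likely.
    """
    return Fraction(len(event & space), len(space))

p_five = classical_probability({5}, sample_space)         # a simple event
p_even = classical_probability({2, 4, 6}, sample_space)   # not a simple event

print(p_five, round(float(p_five), 3))  # 1/6 0.167
print(p_even)                           # 1/2
```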
Empirical Probability
Empirical (or statistical) probability is based on observations obtained from
probability experiments. The empirical probability of an event E is the relative
frequency of event E.
$$P(E) = \frac{\text{Frequency of Event } E}{\text{Total frequency}} = \frac{f}{n}$$
Example:
A travel agent determines that in every 50 reservations she makes, 12 will be for
a cruise.
What is the probability that the next reservation she makes will be for a cruise?
$$P(\text{cruise}) = \frac{12}{50} = 0.24$$
Law of Large Numbers
Example:
Sally flips a coin 20 times and gets 3 heads. The empirical probability is $\frac{3}{20}$.
This is not representative of the theoretical probability which is $\frac{1}{2}$. As the
number of times Sally tosses the coin increases, the law of large numbers
indicates that the empirical probability will get closer and closer to the
theoretical probability.
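The law of large numbers can be illustrated by simulation. The sketch below assumes a fair coin and a fixed random seed (both are choices made here, not part of the slide). The exact values printed depend on the seed, so none are listed:

```python
import random

def empirical_probability_of_heads(n_flips, rng):
    """Relative frequency of heads in n_flips simulated fair-coin tosses."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

rng = random.Random(0)   # fixed seed so the run is repeatable
for n_flips in (20, 200, 2000, 20000, 200000):
    print(n_flips, empirical_probability_of_heads(n_flips, rng))
```

With few flips the relative frequency can be far from 0.5, as in Sally's 3 heads out of 20. With many flips it tends to settle near 0.5.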
Probabilities with Frequency Distributions
Example:
The following frequency distribution represents the ages of 30 students in a
statistics class. What is the probability that a student is between 26 and 33
years old?
Ages       Frequency, f
18 – 25    13
26 – 33    8
34 – 41    4
42 – 49    3
50 – 57    2
           $\sum f = 30$
$$P(\text{age 26 to 33}) = \frac{8}{30} \approx 0.267$$
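The same relative-frequency calculation can be written as a small Python sketch, with the class intervals stored as dictionary keys (a representation chosen here for illustration):

```python
# Frequency distribution of the ages of the 30 students.
frequencies = {
    "18-25": 13,
    "26-33": 8,
    "34-41": 4,
    "42-49": 3,
    "50-57": 2,
}

total = sum(frequencies.values())            # sum of f
p_26_to_33 = frequencies["26-33"] / total    # f / n

print(total, round(p_26_to_33, 3))  # 30 0.267
```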
Subjective Probability
Example:
A business analyst predicts that the probability of a certain union going on
strike is 0.15.
Example:
There are 5 red chips, 4 blue chips, and 6 white chips in a basket. Find the
probability of randomly selecting a chip that is not blue.
$$P(\text{selecting a blue chip}) = \frac{4}{15} \approx 0.267$$
$$P(\text{not selecting a blue chip}) = 1 - \frac{4}{15} = \frac{11}{15} \approx 0.733$$
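The chip example uses the complement: the probability that an event does not occur is one minus the probability that it does. A minimal Python sketch that also confirms the result by counting the non-blue chips directly:

```python
from fractions import Fraction

chips = {"red": 5, "blue": 4, "white": 6}
total = sum(chips.values())                 # 15 chips

p_blue = Fraction(chips["blue"], total)     # 4/15
p_not_blue = 1 - p_blue                     # complement
# Check by counting the red and white chips directly.
p_not_blue_direct = Fraction(chips["red"] + chips["white"], total)

print(p_not_blue, round(float(p_not_blue), 3))  # 11/15 0.733
```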
UNIVERSITY OF SOUTHERN MINDANAO
KIDAPAWAN CITY CAMPUS
Sudapin, Kidapawan City
Conditional
Probability and the
Multiplication Rule
Conditional Probability
A conditional probability is the probability of an event occurring, given that another
event has already occurred.
Example:
There are 5 red chips, 4 blue chips, and 6 white chips in a basket. Two chips are
randomly selected. Find the probability that the second chip is red given that
the first chip is blue. (Assume that the first chip is not replaced.)
Because the first chip is selected and not replaced, there are only 14
chips remaining.
$$P(\text{selecting a red chip} \mid \text{first chip is blue}) = \frac{5}{14} \approx 0.357$$
Conditional Probability
Example:
100 college students were surveyed and asked how many hours a week they spent
studying. The results are in the table below. Find the probability that a student spends
more than 10 hours studying given that the student is a male.
          Less than 5   5 to 10   More than 10   Total
Male      11            22        16             49
Female    13            24        14             51
Total     24            46        30             100
The sample space consists of the 49 male students. Of these 49, 16 spend
more than 10 hours a week studying.
$$P(\text{more than 10 hours} \mid \text{male}) = \frac{16}{49} \approx 0.327$$
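The survey result can be obtained two ways: by restricting the sample space to the male students, as above, or from the ratio P(A and B) / P(B). The Python sketch below shows that both give the same value. The nested-dictionary layout is an assumption made for illustration:

```python
from fractions import Fraction

hours_table = {
    "Male":   {"Less than 5": 11, "5 to 10": 22, "More than 10": 16},
    "Female": {"Less than 5": 13, "5 to 10": 24, "More than 10": 14},
}

n_total = sum(sum(row.values()) for row in hours_table.values())   # 100
n_male = sum(hours_table["Male"].values())                         # 49
n_male_and_more = hours_table["Male"]["More than 10"]              # 16

# Restrict the sample space to the 49 male students.
p_restricted = Fraction(n_male_and_more, n_male)
# Same result from P(more than 10 and male) / P(male).
p_ratio = Fraction(n_male_and_more, n_total) / Fraction(n_male, n_total)

print(p_restricted, round(float(p_restricted), 3))  # 16/49 0.327
```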
Independent Events
Two events are independent if the occurrence of one of the events does not
affect the probability of the other event. Two events A and B are independent if
P (B |A) = P (B) or if P (A |B) = P (A).
Events that are not independent are dependent.
Example:
Two cards are selected, without replacement, from a deck. Find the probability
of selecting a diamond, and then selecting a spade.
Example:
Decide if the events are independent or dependent.
$$P(\text{rolling a 5}) = \frac{1}{6}.$$
Whether or not the roll is a 5, $P(\text{Tail}) = \frac{1}{2}$,
so the events are independent.
$$\frac{1}{6} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{24} \approx 0.042$$
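The card example above is stated without a worked answer. The sketch below applies the standard multiplication rule, P(A and B) = P(A) · P(B | A), which this section's title refers to but which does not survive in the text here. It assumes a standard 52-card deck with 13 cards per suit. It also reproduces the product 1/6 · 1/2 · 1/2 from the die-and-coin calculation:

```python
from fractions import Fraction

# Dependent events: the first card is not replaced, so 51 cards remain.
p_diamond_first = Fraction(13, 52)
p_spade_given_diamond = Fraction(13, 51)
p_diamond_then_spade = p_diamond_first * p_spade_given_diamond

# Independent events: each probability is unaffected by the others.
p_five_and_two_tails = Fraction(1, 6) * Fraction(1, 2) * Fraction(1, 2)

print(p_diamond_then_spade, round(float(p_diamond_then_spade), 3))  # 13/204 0.064
print(p_five_and_two_tails, round(float(p_five_and_two_tails), 3))  # 1/24 0.042
```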
Mutually Exclusive Events
[Figure: Venn diagrams of events A and B, including the overlap region "A and B"; the outcomes 1, 2, and 4 are marked.]
These events cannot happen at the same time, so the events are
mutually exclusive.
Example:
Decide if the two events are mutually exclusive.
[Figure: Venn diagram of playing cards in which event A (the Jacks) and event B (the hearts) overlap.]
Because the card can be a Jack and a heart at the same time, the
events are not mutually exclusive.
The Addition Rule
The probability that event A or B will occur is given by
P (A or B) = P (A) + P (B) – P (A and B ).
If events A and B are mutually exclusive, then the rule can be simplified to
P (A or B) = P (A) + P (B).
Example:
You roll a die. Find the probability that you roll a number less than 3 or a 4.
          Less than 5   5 to 10   More than 10   Total
Male      11            22        16             49
Female    13            24        14             51
Total     24            46        30             100
The events are mutually exclusive.
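The addition rule can be checked by enumerating outcomes. The Python sketch below works the die example (mutually exclusive events) and, for contrast, the Jack-or-heart card case (not mutually exclusive). The helper names and the card encoding are assumptions made for illustration:

```python
from fractions import Fraction

def p(event, space):
    """Classical probability: equally likely outcomes assumed."""
    return Fraction(len(event), len(space))

def p_a_or_b(a, b, space):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p(a, space) + p(b, space) - p(a & b, space)

die = set(range(1, 7))
less_than_3, a_four = {1, 2}, {4}          # no outcomes in common
p_die = p_a_or_b(less_than_3, a_four, die)

deck = {(rank, suit) for rank in "A23456789TJQK" for suit in "SHDC"}
jacks = {card for card in deck if card[0] == "J"}
hearts = {card for card in deck if card[1] == "H"}
p_cards = p_a_or_b(jacks, hearts, deck)    # the Jack of hearts is counted once

print(p_die)    # 1/2
print(p_cards)  # 4/13
```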
Counting Principles
Fundamental Counting Principle
If one event can occur in m ways and a second event can occur in n ways,
the number of ways the two events can occur in sequence is m · n. This rule
can be extended for any number of events occurring in a sequence.
Example:
A meal consists of a main dish, a side dish, and a dessert. How many different
meals can be selected if there are 4 main dishes, 2 side dishes and 5 desserts
available?
4 · 2 · 5 = 40
There are 40 meals available.
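The counting principle can be confirmed by listing every sequence. A minimal Python sketch using `itertools.product`, where the dish labels are placeholders rather than anything from the slide:

```python
from itertools import product

# Placeholder labels for 4 main dishes, 2 side dishes and 5 desserts.
mains = [f"main {i}" for i in range(1, 5)]
sides = [f"side {i}" for i in range(1, 3)]
desserts = [f"dessert {i}" for i in range(1, 6)]

meals = list(product(mains, sides, desserts))
print(len(meals))  # 40

# The same idea lists the sample space for two coin flips.
coin_outcomes = ["".join(flips) for flips in product("HT", repeat=2)]
print(coin_outcomes)  # ['HH', 'HT', 'TH', 'TT']
```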
Fundamental Counting Principle
Example:
Two coins are flipped. How many different outcomes are there? List the sample
space.
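[Tree diagram: from Start, the 1st coin tossed lands Heads or Tails (2 ways to flip the coin); from each of those, the 2nd coin tossed lands Heads or Tails (2 ways to flip the coin).]
There are 2 · 2 = 4 different outcomes: {HH, HT, TH, TT}.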
“n factorial”
Example:
How many different surveys are required to cover all possible question
arrangements if there are 7 questions in a survey?
7! = 7 · 6 · 5 · 4 · 3 · 2 · 1 = 5040 surveys
Permutation of n Objects Taken r at a Time
The number of permutations of n elements taken r at a time is
$$_nP_r = \frac{n!}{(n - r)!}$$
where n is the number in the group and r is the number taken from the group.
Example:
You are required to read 5 books from a list of 8. In how many different orders can
you do so?
$$_nP_r = {}_8P_5 = \frac{8!}{(8 - 5)!} = \frac{8!}{3!} = \frac{8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{3 \cdot 2 \cdot 1} = 6720 \text{ ways}$$
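Factorials and permutations are available in Python's `math` module (`math.perm` requires Python 3.8 or later). A short sketch that checks the survey and book examples:

```python
import math

surveys = math.factorial(7)    # 7! arrangements of 7 questions
orders = math.perm(8, 5)       # 8 books taken 5 at a time, order matters
# The same value straight from the formula n! / (n - r)!
orders_by_formula = math.factorial(8) // math.factorial(8 - 5)

print(surveys, orders, orders_by_formula)  # 5040 6720 6720
```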
Distinguishable Permutations
The number of distinguishable permutations of n objects, where n1 are one type,
n2 are another type, and so on is
$$\frac{n!}{n_1!\, n_2!\, n_3! \cdots n_k!}, \quad \text{where } n_1 + n_2 + n_3 + \cdots + n_k = n.$$
Example:
Jessie wants to plant 10 plants along the sidewalk in her front yard. She has 3
rose bushes, 4 daffodils, and 3 lilies. In how many distinguishable ways can the
plants be arranged?
$$\frac{10!}{3!\,4!\,3!} = \frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4!}{3!\,4!\,3!} = 4{,}200$$
different ways to arrange the plants
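The formula translates directly into code. The function name below is hypothetical. It takes the size of each group of identical objects:

```python
from math import factorial

def distinguishable_permutations(*group_sizes):
    """n! / (n1! * n2! * ... * nk!), where n is the sum of the group sizes."""
    n = sum(group_sizes)
    denominator = 1
    for size in group_sizes:
        denominator *= factorial(size)
    return factorial(n) // denominator

# 3 rose bushes, 4 daffodils and 3 lilies.
arrangements = distinguishable_permutations(3, 4, 3)
print(arrangements)  # 4200
```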
Combination of n Objects Taken r at a Time
A combination is a selection of r objects from a group of n things when order does
not matter. The number of combinations of r objects selected from a group
of n objects is
$$_nC_r = \frac{n!}{(n - r)!\, r!}$$
where n is the number in the collection and r is the number taken from the collection.
Example:
You are required to read 5 books from a list of 8. In how many different ways can
you do so if the order doesn’t matter?
$$_8C_5 = \frac{8!}{3!\,5!} = \frac{8 \cdot 7 \cdot 6 \cdot 5!}{3!\,5!} = 56 \text{ combinations}$$
Application of Counting Principles
Example:
In a state lottery, you must correctly select 6 numbers (in any order)
out of 44 to win the grand prize.
a.) How many ways can 6 numbers be chosen from the 44
numbers?
b.) If you purchase one lottery ticket, what is the
probability of winning the top prize?
a.) $$_{44}C_6 = \frac{44!}{6!\,38!} = 7{,}059{,}052 \text{ combinations}$$
b.) There is only one winning ticket, therefore,
$$P(\text{win}) = \frac{1}{7059052} \approx 0.00000014$$
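Combination counts can be checked with `math.comb` (Python 3.8 or later). The sketch covers both the reading-list example and the lottery:

```python
import math

reading_choices = math.comb(8, 5)         # 5 books from 8, order ignored
lottery_combinations = math.comb(44, 6)   # 6 numbers from 44, order ignored
p_win = 1 / lottery_combinations          # one winning combination

print(reading_choices)       # 56
print(lottery_combinations)  # 7059052
print(f"{p_win:.8f}")        # 0.00000014
```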