EDA Midterms
MATH 403 Engineering Data Analysis
CABACES, DONNALYN C.
MARCAIDA, MARJORIE G.
SOTTO, RODOLFO JR. C.
MATH 403- ENGINEERING DATA ANALYSIS
Chapter 1
OBTAINING DATA
Introduction
Statistics may be defined as the science that deals with the collection,
organization, presentation, analysis, and interpretation of data in order to draw
judgments or conclusions that help in the decision-making process. The two parts of this
definition correspond to the two main divisions of Statistics. These are Descriptive
Statistics and Inferential Statistics. Descriptive Statistics, which is referred to in the first
part of the definition, deals with the procedures that organize, summarize and describe
quantitative data. It seeks merely to describe data. Inferential Statistics, implied in the
second part of the definition, deals with making a judgment or a conclusion about a
population based on the findings from a sample that is taken from the population.
At the end of this module, it is expected that the students will be able to:
Statistical Terms
Before proceeding to the discussion of the different methods of obtaining data, let
Population or Universe refers to the totality of objects, persons, places, things used in
Data are facts, figures and information collected on some characteristics of a population
Ungrouped (or raw) data are data which are not organized in any specific way. They are
Grouped Data are raw data organized into groups or categories with corresponding
have a number of different values. It differentiates a particular member from the rest of
in research. They differ in many respects, most notably in the role they are given in the
Collection of data is the first step in conducting a statistical inquiry. It simply refers
to data gathering, a systematic method of collecting and measuring data from different
experimentations, documents and records, tests or examinations and other forms of data
gathering instruments. The person who conducts the inquiry is an investigator, the one
the process of investigation are known as primary data.” These are collected for the
investigator’s use from the primary source. Secondary data, on the other hand, are
collected by some other organization for its own use, but the investigator also obtains them for
his own use. According to M.M. Blair, “Secondary data are those already in existence for
In the field of engineering, the three basic methods of collecting data are through
retrospective study would use the population or sample of the historical data which had
been archived over some period of time. It may involve a significant amount of data but
those data may contain relatively little useful information about the problem, some of the
relevant data may be missing, recording or transcription errors may be present, or
other important data may not have been gathered and archived. These result in statistical
analysis of historical data which identifies interesting phenomena but difficulty of obtaining
disturbed as little as possible, and the quantities of interests are recorded. In a designed
or process is done. The resulting system output data must be observed, and an inference
or decision about which variables are responsible for the observed changes in output
the resulting data is the only way to solve them. At times there is a good underlying
scientific theory to explain the phenomena of interest, yet tests or experiments are almost
always necessary to confirm the applicability and validity of the theory in
in which statistical thinking and statistical methods play an important role in planning,
an efficient way of collecting information and easy to administer wherein a wide variety of
information can be collected. The researcher can be focused and can stick to the
questions that interest him and are necessary in his statistical inquiry or study.
his ability to respond. Sometimes answers may lead to vague data. Surveys can be done
incomplete responses, higher response rates, and greater control over the environment
in which the survey is administered; also, the researcher can collect additional information
interviews are that they can be expensive and time-consuming and may require a large
staff of trained interviewers. In addition, the response can be biased by the appearance
administered in large numbers and does not require many interviewers and there is less
more likely to stop participating mid-way through the survey and respondents cannot ask
to clarify their answers. There are lower response rates than in personal interviews.
1. Determine the objectives of your survey: What questions do you want to answer?
2. Identify the target population sample: Whom will you interview? Who will be the
4. Decide what questions you will ask in what order, and how to phrase them.
A sample must be representative of the target population. The target population is the
entire group a researcher is interested in; the group about which the researcher wishes
to draw conclusions.
There are two ways of selecting a sample. These are the non-probability sampling
Non-Probability Sampling
method is convenient and economical but the inferences made based on the findings are
not so reliable. The most common types of non-probability sampling are the convenience
from the respondents which favors the researcher but can cause bias to the respondents.
the characteristic of interest made by the researcher. Randomization is absent in this type
of sampling.
There are two types of quota sampling: proportional and non-proportional. In
For instance, if you know the population has 40% women and 60% men, and that
you want a total sample size of 100, you will continue sampling until you get those
number of sampled units in each category is specified and not concerned with having
Probability Sampling
to be selected as a part of the sample. There are several probability techniques. Among
these are simple random sampling, stratified sampling and cluster sampling.
subjects (a sample) is selected for study from a larger group (a population). Each
individual is chosen entirely by chance and each member of the population has an equal
chance of being included in the sample. Every possible sample of a given size has the
same chance of selection; i.e. each member of the population is equally likely to be
Stratified Sampling
There may often be factors which divide up the population into sub-populations
(groups / strata) and the measurement of interest may vary among the different
sub-populations. This has to be accounted for when a sample from the population is selected
stratified sampling.
of a population. When a sample is to be taken from a population with several strata, the
proportion of each stratum in the sample should be the same as in the population.
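The proportional rule above can be sketched in Python. This is only an illustration under stated assumptions: the function names `proportional_allocation` and `stratified_sample` are hypothetical, and the population of 400 women and 600 men is invented to mirror the 40%/60% split used in the quota sampling example.

```python
import random

def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to each stratum's size."""
    total = sum(stratum_sizes.values())
    exact = {name: sample_size * size / total for name, size in stratum_sizes.items()}
    # Start from the whole-number part of each share.
    alloc = {name: int(share) for name, share in exact.items()}
    # Give any leftover units to the strata with the largest fractional parts.
    leftover = sample_size - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

def stratified_sample(strata, sample_size, seed=None):
    """Draw a simple random sample inside each stratum, sized proportionally."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    alloc = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(strata[name], alloc[name]) for name in strata}

# Hypothetical population: 40% women and 60% men.
strata = {
    "women": [f"W{i}" for i in range(400)],
    "men": [f"M{i}" for i in range(600)],
}
sample = stratified_sample(strata, 100, seed=1)
counts = {name: len(members) for name, members in sample.items()}
print(counts)
```

With a total sample size of 100, the sketch selects 40 women and 60 men, so each stratum keeps the same proportion in the sample as in the population.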
can be isolated (strata). Simple random sampling is most appropriate when the entire
population from which the sample is taken is homogeneous. Some reasons for using
Cluster Sampling
into groups, or clusters, and a random sample of these clusters is selected. All
The products and processes in the engineering and scientific disciplines are mostly
existing process through maximizing the yield and decreasing the variability or in
developing new products and processes. It is a technique needed to identify the "vital
few" factors in the most efficient manner and then direct the process to its best setting
to meet the ever-increasing demand for improved quality and increased productivity.
The methodology of DOE ensures that all factors and their interactions are
systematically investigated, resulting in reliable and complete information. There are five
stages to be carried out for the design of experiments. These are planning, screening,
1. Planning
upon the process of testing and data collection. At this stage, identification of the
available resources to achieve the objectives. Individuals from different disciplines related
to the product or process should compose a team who will conduct the investigation. They
are to identify possible factors to investigate and the most appropriate responses to
measure. A team approach promotes synergy that gives a richer set of factors to study
and thus a more complete experiment. Experiments which are carefully planned always
lead to increased understanding of the product or process. Well planned experiments are
2. Screening
Screening experiments are used to identify the important factors that affect the
process under investigation out of the large pool of potential factors. The screening process
eliminates unimportant factors so that attention is focused on the key factors. Screening
experiments are usually efficient designs which require few executions and focus on the
3. Optimization
After narrowing down the important factors affecting the process, the next step is to determine
the best settings of these factors to achieve the objectives of the investigation. The
objectives may be to either increase yield or decrease variability or to find settings that
achieve both at the same time depending on the product or process under investigation.
4. Robustness Testing
Once the optimal settings of the factors have been determined, it is important to make the
product or process insensitive to variations resulting from changes in factors that affect
the process but are beyond the control of the analyst. Such factors are referred to as
ensure that the product or process is made robust or insensitive to these factors.
5. Verification
This final stage involves validation of the optimum settings by conducting a few
follow-up experimental runs. This is to confirm that the process functions as expected and all
REFERENCES:
Montgomery, Douglas C., et al., Applied Statistics and Probability for Engineers, 7th ed., John
Wiley & Sons (Asia) Pte Ltd, 2018
Panopio, Felix M. (2004). Statistics with Probability. Batangas City, Philippines: Feliber
Publishing House
Walpole, Ronald E., et al., Probability and Statistics for Engineers and Scientists, 9th ed.,
Pearson Education Inc., 2016
https://mathspace.co/learn/world-of-maths/language-and-use-of-statistics/planning-a-statistical-investigation-i-investigation-18643/investigation-statistical-inquiry-916/
CHAPTER TEST
2. As one of the students of EDA class, you are tasked to conduct a survey to show
Architecture and Fine Arts would like to engage in during the first semester. Follow
factors that affect the distance travelled by the ball as it is thrown by the catapult.
Also, you are to establish the settings to reach 25, 50, 75 and 100 inches. The
response variable is the distance and the factors are the band height, start angle,
number of rubber bands used (1 or 2), arm length, and the stop angle. Explain
how you are going to conduct the experiment, taking note of the stages of planning
Chapter 2
PROBABILITY
Introduction
Probability is simply how likely an event is to happen. “The chance of rain today is
50%” is a statement that enumerates our thoughts on the possibility of rain. The likelihood
percentage from 0 to 100%. The higher the number, the more likely the event is to
happen. A probability of zero (0) indicates that the outcome is
impossible, while a probability of one (1) indicates that the outcome is certain
to occur.
This module intends to discuss the concept of probability for discrete sample
spaces, its application, and ways of solving the probabilities of different statistical data.
At the end of this module, it is expected that the students will be able to:
1. Understand and describe sample spaces and events for random experiments
Probability
For example, the probability of flipping a coin and it being heads is ½, because
there is 1 way of getting a head and the total number of possible outcomes is 2 (a head
The probability of something not happening is 1 minus the probability that it will
happen.
Sample space is the set of all possible outcomes or results of a random experiment.
The sample space is represented by the letter S. Each outcome in the sample space is called an
element of that set. An event is a subset of the sample space and is represented by
the letter E. This can be illustrated in a Venn diagram. In Figure 2.1, the sample space is
represented by the rectangle and the events by the circles inside the rectangle.
The events A and B (in a to c) and A, B and C (in d and e) are all subsets of the
sample space S.
Figure 2.1 Venn diagrams of sample space with events (adapted from Montgomery et
al., 2003)
For example, if a die is rolled, the sample space is {1, 2, 3, 4, 5, 6}. An event
can be {1, 3, 5}, which is the set of odd numbers. Similarly, when a coin is tossed
experiment and an event is a subset of the sample space. Let us try to understand this with
a few examples. What happens when we toss a coin thrice? If a coin is tossed three times,
the possible outcomes are HHH, HHT, HTH, HTT, THH, THT, TTH and TTT.
All these are the outcomes of the experiment of tossing a coin three times. Hence,
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Now, suppose the event is the set of outcomes in which there are only two heads.
The outcomes in which we have only two heads are HHT, HTH and THH; hence the event
is given by,
E = {HHT, HTH, THH}
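As a minimal sketch, the sample space for three coin tosses and the event "exactly two heads" can be enumerated in Python. The variable names `sample_space` and `event_two_heads` are illustrative choices, not part of the module.

```python
from itertools import product

# Every sequence of three tosses, each toss being H (head) or T (tail).
sample_space = ["".join(outcome) for outcome in product("HT", repeat=3)]

# The event: outcomes that contain exactly two heads.
event_two_heads = [outcome for outcome in sample_space if outcome.count("H") == 2]

print(sample_space)
print(event_two_heads)
```

The enumeration yields 8 outcomes in the sample space, of which HHT, HTH and THH make up the event.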
There can be more than one event. In this case, we can have an event as getting only
one tail or event of getting only one head. If we have more than one event we can
represent these events by E1, E2, E3 etc. We can have more than one event for a Sample
space but there will be one and only one Sample space for an Event. If we have Events
E1, E2, E3, …, En as all the possible subsets of the sample space, then we have,
S = E1 ∪ E2 ∪ E3 ∪ … ∪ En
We can understand this with the help of a simple example. Consider an experiment of
S = {1, 2, 3, 4, 5, 6}
even number as outcome for this experiment then we can represent E1 and E2 as the
following sets,
E1 = {1, 3, 5}
E2 = {2, 4, 6}
So we have
E1 ∪ E2 = {1, 2, 3, 4, 5, 6}
Or S = E1 ∪ E2
Null space – is a subset of the sample space that contains no elements and is denoted
Intersection of events
event containing all elements that are common to A and B. This is illustrated as the
For example,
Let X = {q, w, e, r, t} and Y = {a, s, d, f}; then X ∩ Y = ∅, since X and Y have no
elements in common.
Two events are mutually exclusive if they have no elements in common.
This is illustrated in Figure 2.1 (b), where we can see that A ∩ B = ∅.
Union of Events
The union of events A and B is the event containing all the elements that belong
For example,
Complement of an Event
The complement of an event A with respect to S is the set of all elements of S that
are not in A and is denoted by A’. The shaded region in Figure 2.1 (e) shows (A C)’.
For example,
Probability of an Event
Sample space and events play important roles in probability. Once we have
sample space and event, we can easily find the probability of that event. We have
$P(E) = \frac{n(E)}{n(S)}$
Where,
When probabilities are assigned to the outcomes in a sample space, each probability
must lie between 0 and 1 inclusive, and the sum of all probabilities assigned must be
equal to 1. Therefore,
Let us try to understand this with the help of an example. If a die is tossed, the
sample space is {1, 2, 3, 4, 5, 6}. This set has 6 elements.
Now, if the event is the set of odd numbers on the die, then the event is {1, 3, 5}.
This set has 3 elements. So, the probability of getting an odd number in a single toss is
$\text{Probability} = \frac{3}{6} = \frac{1}{2}$
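The ratio n(E)/n(S) can be checked with a short Python sketch, assuming equally likely outcomes. The helper name `classical_probability` is hypothetical; exact fractions are used so that the result reads as 1/2 rather than a rounded decimal.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Return n(E) / n(S) for equally likely outcomes."""
    sample_space = set(sample_space)
    # Only outcomes that belong to the sample space count toward the event.
    event = set(event) & sample_space
    return Fraction(len(event), len(sample_space))

S = {1, 2, 3, 4, 5, 6}   # outcomes of one toss of a die
odd = {1, 3, 5}          # event: an odd number appears

p_odd = classical_probability(odd, S)
p_not_odd = 1 - p_odd    # probability of the complement

print(p_odd, p_not_odd)
```

The same helper also shows the two boundary cases: an empty event gives probability 0 and the whole sample space gives probability 1.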
Multiplicative Rule
Suppose you have j sets of elements, n1 in the first set, n2 in the second set, ...
and nj in the jth set. Suppose you wish to form a sample of j elements by taking one
element from each of the j sets. The number of possible samples is then given by:
$n_1 \cdot n_2 \cdots n_j$
Permutation Rule
single set of n distinctly different elements, you wish to select k elements from the n
and arrange them within k positions. The number of different permutations of the n
$P_k^n = \frac{n!}{(n - k)!}$
Partitions rule
partition them into k sets, with the first set containing n1 elements, the second containing
n2 elements, ..., and the kth set containing nk elements. The number of different partitions
is
$\frac{n!}{n_1! \, n_2! \cdots n_k!}$
where
$n_1 + n_2 + \cdots + n_k = n$
The numerator gives the permutations of the n elements. The terms in the
denominator remove the duplicates due to the same assignments in the k sets
(multinomial coefficients).
Combinations Rule
$\binom{n}{k} = \frac{n!}{k! \, (n - k)!}$
Two events are mutually exclusive or disjoint if they cannot occur at the same
time.
The probability that Event A occurs, given that Event B has occurred, is called a
The complement of an event is the event not occurring. The probability that Event
The probability that Events A and B both occur is the probability of the intersection
The probability that Events A or B occur is the probability of the union of A and B.
If the occurrence of Event A changes the probability of Event B, then Events A and
B are dependent. On the other hand, if the occurrence of Event A does not change
Rule of Addition
Example 1. A student goes to the library. The probability that she checks out (a) a work
of fiction is 0.40, (b) a work of non-fiction is 0.30, and (c) both fiction and non-fiction is
0.20. What is the probability that the student checks out a work of fiction, non-fiction, or
both?
Solution:
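A possible worked check of this example in Python, assuming the general addition rule P(A ∪ B) = P(A) + P(B) − P(A ∩ B), where A is "checks out fiction" and B is "checks out non-fiction". The function name `prob_union` is hypothetical; exact fractions avoid floating-point rounding.

```python
from fractions import Fraction

def prob_union(p_a, p_b, p_a_and_b):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

p_fiction = Fraction("0.40")
p_nonfiction = Fraction("0.30")
p_both = Fraction("0.20")

p_either = prob_union(p_fiction, p_nonfiction, p_both)
print(float(p_either))
```

Under that rule, the probability of checking out fiction, non-fiction, or both is 0.40 + 0.30 − 0.20 = 0.50. The subtraction removes the overlap that would otherwise be counted twice.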
Rule of Multiplication
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴)𝑃(𝐵)
Dependent - Two outcomes are said to be dependent if knowing that one of the
outcomes has occurred affects the probability that the other occurs
that event B occurs after event A has already occurred. The probability is denoted
by 𝑃(𝐵|𝐴).
Rule 2: When two events are dependent, the probability of both occurring is:
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴)𝑃(𝐵|𝐴)
Where $P(B|A) = \frac{P(A \cap B)}{P(A)}$, provided that $P(A) > 0$
Example 1. A day’s production of 850 manufactured parts contains 50 parts that do not
meet customer requirements. Two parts are selected randomly without replacement from
the batch. What is the probability that the second part is defective given that the first part
is defective?
Solution:
P (B|A) =?
If the first part is defective, prior to selecting the second part, the batch contains
849 parts, of which 49 are defective. Therefore,
P (B|A) = 49/849
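The result can be cross-checked against the definition P(B|A) = P(A ∩ B) / P(A). The sketch below uses the numbers from the example (850 parts, 50 defective); the variable names are illustrative.

```python
from fractions import Fraction

total_parts = 850
defective = 50

# A: the first part selected is defective.
p_a = Fraction(defective, total_parts)

# A and B: both parts are defective (selection without replacement).
p_a_and_b = Fraction(defective, total_parts) * Fraction(defective - 1, total_parts - 1)

# Conditional probability from its definition.
p_b_given_a = p_a_and_b / p_a

print(p_b_given_a, float(p_b_given_a))
```

The definition gives the same 49/849 obtained by counting what remains in the batch after the first defective part is removed.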
Example 2. An urn contains 6 red marbles and 4 black marbles. Two marbles are drawn
without replacement from the urn. What is the probability that both of the marbles are
black?
Solution:
In the beginning, there are 10 marbles in the urn, 4 of which are black. Therefore,
P (A) = 4/10.
After the first selection, there are 9 marbles in the urn, 3 of which are black.
Therefore, P (B|A) = 3/9, and
$P(A \cap B) = \left(\frac{4}{10}\right)\left(\frac{3}{9}\right) = 0.133$
Example 3. Two cards are selected from a pack of cards. What is the probability that they
Solution:
We require P (A ∩ B). Notice that these events are dependent because the
probability that the second card is a queen depends on whether or not the first card is a
queen.
P (A ∩ B) = P (A) P (B|A)
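A sketch of the multiplication rule for dependent events follows. It assumes a standard 52-card pack with 4 queens, two draws without replacement, and that the question asks for both cards to be queens; these details are assumptions, as is the helper name `p_both_of_kind`. The same helper reproduces the urn result from Example 2.

```python
from fractions import Fraction

def p_both_of_kind(total, of_kind):
    """P(A and B) = P(A) * P(B|A) for two draws without replacement."""
    p_a = Fraction(of_kind, total)                  # first draw is of the kind
    p_b_given_a = Fraction(of_kind - 1, total - 1)  # second draw, given the first
    return p_a * p_b_given_a

p_two_queens = p_both_of_kind(52, 4)   # assumed: 52 cards, 4 queens
p_two_black = p_both_of_kind(10, 4)    # Example 2: 10 marbles, 4 black

print(p_two_queens, round(float(p_two_queens), 4))
print(p_two_black, round(float(p_two_black), 3))
```

Under these assumptions the two-queen probability is (4/52)(3/51) = 1/221, and the urn case gives (4/10)(3/9) = 2/15, which rounds to 0.133.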
Rule of Subtraction
The probability that event A will occur is equal to 1 minus the probability that event
A will not occur.
$P(A) = 1 - P(A')$
Example 1. The probability of Bill not graduating from college is 0.8. What is the probability
that Bill will graduate from college?
Solution:
$P(A) = 1 - 0.8 = 0.2$
REFERENCES:
Montgomery, D. C. et al. (2003). Applied Statistics and Probability for Engineers 3rd Edition. USA.
John Wiley & Sons, Inc.
Walpole, R. E. et al. (2016). Probability & Statistics for Engineers & Scientists 9th Edition. England.
Pearson Education Limited
https://math.tutorvista.com/statistics/sample-space-and-events.html
https://stattrek.com/probability/probability-rules.aspx
https://www.ck12.org/book/CK-12-Probability-and-Statistics-Advanced-Second-Edition/section/3.6/
CHAPTER TEST
1. Three events are shown on the Venn diagram in the following figure:
Reproduce the figure and shade the region that corresponds to each of the
following events.
a. A’ b. A B c. (A B) C d. (B C)’ e. (A B)’ C
2. Each of the possible five outcomes of a random experiment is equally likely. The
sample space is {a, b, c, d, e}. Let A denote the event {a, b}, and let B denote the
3. If A, B, and C are mutually exclusive events with P (A) = 0.2, P(B) = 0.3, and P(C)
b. P(A B C) d. P[(A B) C]
4. A lot of 100 semiconductor chips contains 20 that are defective. Two are selected
b. What is the probability that the second one selected is defective given that
d. How does the answer to part (b) change if chips selected were replaced
5. Suppose 2% of cotton fabric rolls and 3% of nylon fabric rolls contain flaws. Of the
rolls used by a manufacturer, 70% are cotton and 30% are nylon. What is the
probability that a randomly selected roll used by the manufacturer contains flaws?
Chapter 3
DISCRETE PROBABILITY DISTRIBUTIONS
Introduction
Many physical systems can be modelled by the same or similar random experiments
and random variables. The distribution of the random variables involved in each of
these common systems can be analyzed, and the result of that analysis can be used in
different applications and examples. In this chapter, the analysis of several random
experiments and discrete random variables that often appear in applications is discussed.
A discussion of the basic sample space of the random experiment is frequently omitted
At the end of this module, it is expected that the students will be able to:
presented.
specific applications.
6. Calculate probabilities, determine means and variances for each of the discrete
discrete random variable. A discrete random variable is a random variable that has
With a discrete probability distribution, each possible value of the discrete random
Random Variables
other mathematical variables, a random variable conceptually does not have a single,
fixed value (even if unknown); rather, it can take on a set of possible different values,
imprecise measurements). They may also conceptually represent either the results of an
“objectively” random process (such as rolling a die), or the “subjective” randomness that
either discrete (that is, taking any of a specified list of exact values) or as continuous
function describing the possible values of a random variable and their associated
Discrete random variables can take on either a finite or at most a countably infinite
set of discrete values (for example, the integers). Their probability distribution is given by
a probability mass function which directly maps each value of the random variable to a
probability. For example, the value of x1 takes on the probability p1, the value of x2 takes
on the probability p2, and so on. The probabilities pi must satisfy two requirements: every
probability pi is a number between 0 and 1, and the sum of all the probabilities is 1.
(p1+p2+⋯+pk=1)
This shows the probability mass function of a discrete probability distribution. The
probabilities of the singletons {1}, {3}, and {7} are respectively 0.2, 0.5, 0.3. A set not
Examples of discrete random variables include the values obtained from rolling a
of possible values. The probability distribution of a discrete random variable x lists the
values and their probabilities, where value x1 has probability p1, value x2 has
probability p2, and so on. Every probability pi is a number between 0 and 1, and the sum
The number of eggs that a hen lays in a given day (it can’t be 2.3)
graph. For example, suppose that x is a random variable that represents the number of
people waiting at the line at a fast-food restaurant and it happens to only take the values
2, 3, or 5 with probabilities 2/10, 3/10, and 5/10 respectively. This can be expressed
through the function f(x) = x/10, x = 2, 3, 5 or through the table below.
Notice that these two representations are equivalent, and that this can be represented
Probability Histogram: This histogram displays the probabilities of each of the three
values of the discrete random variable. The formula, table, and probability histogram satisfy the
1. 0 ≤ f(x) ≤ 1, i.e., the values of f(x) are probabilities, hence between 0 and 1.
2. ∑f(x) = 1, i.e., adding the probabilities of all disjoint cases, we obtain the probability
function (pmf). The probability mass function has the same purpose as the probability
histogram, and displays the specific probability for each value of the discrete random variable.
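The two conditions on a probability mass function can be checked in Python for the fast-food example f(x) = x/10, x = 2, 3, 5. This is a minimal sketch; the dictionary representation and the name `is_valid_pmf` are illustrative choices.

```python
from fractions import Fraction

# pmf of the example: f(x) = x / 10 for x = 2, 3, 5
pmf = {x: Fraction(x, 10) for x in (2, 3, 5)}

def is_valid_pmf(pmf):
    """Check that every f(x) lies in [0, 1] and that the values sum to 1."""
    each_in_range = all(0 <= p <= 1 for p in pmf.values())
    sums_to_one = sum(pmf.values()) == 1
    return each_in_range and sums_to_one

print({x: float(p) for x, p in pmf.items()})
print(is_valid_pmf(pmf))
```

A function that fails either condition, for example one whose values sum to more than 1, cannot be a probability mass function.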
This shows the graph of a probability mass function. All the values of this function
x f(x)
2 0.2
3 0.3
5 0.5
Discrete Probability Distribution: This table shows the values of the discrete random
are defective. If a school makes a random purchase of 2 of these computers, find the
Solution:
Let X be a random variable whose values x are the possible numbers of defective
computers purchased by the school. Then x can only take the numbers 0, 1, and 2.
Now,
x 0 1 2
You might recall that the cumulative distribution function is defined for discrete
Again, F(x) accumulates all of the probability less than or equal to x. The cumulative
that of the discrete case. All we need to do is replace the summation with an integral.
defined as:
$F(x) = \int_{-\infty}^{x} f(t)\,dt$, for $-\infty < x < \infty$
Example 1. Suppose that a day’s production of 850 manufactured parts contains 50 parts
that do not conform to customer requirements. Two parts are selected at random,
without replacement, from the batch. Let the random variable X equal the number of
Solution:
The question can be answered by first finding the probability mass function of X.
Therefore,
The cumulative distribution function for this example is graphed in the figure below. Note
that F(x) is defined for all x in −∞ < x < ∞ and not only for 0, 1, and 2.
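The probability mass function and cumulative distribution function for this example can be sketched in Python. The sketch assumes X counts the nonconforming parts among the two selected, so that X follows a hypergeometric distribution with 850 parts, 50 of them nonconforming; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

N, D, n = 850, 50, 2   # batch size, nonconforming parts, sample size

def pmf(x):
    """P(X = x): x nonconforming parts among the n selected without replacement."""
    return Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n))

def cdf(x):
    """F(x) = P(X <= x), defined for any real x, not only 0, 1 and 2."""
    return sum((pmf(k) for k in range(n + 1) if k <= x), Fraction(0))

for k in range(n + 1):
    print(k, round(float(pmf(k)), 3), round(float(cdf(k)), 3))

print(float(cdf(-1)), round(float(cdf(1.5)), 3), float(cdf(7)))
```

Because F(x) accumulates probability, it is 0 below x = 0, jumps at 0, 1 and 2, and stays at 1 for every x of 2 or more.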
The expected value of a random variable is the weighted average of all possible
probability distribution of a discrete random variable X lists the values and their
probabilities, such that xi has a probability of pi. The probabilities pi must satisfy two
requirements:
expectation, EV, mean, or first moment) of a random variable is the weighted average of
all possible values that this random variable can take on. The weights used in computing
The expected value may be intuitively understood by the law of large numbers: the
expected value, when it exists, is almost surely the limit of the sample mean as sample
size grows to infinity. More informally, it can be interpreted as the long-run average of the
results of many independent repetitions of an experiment (e.g. a die roll). The value may
not be expected in the ordinary sense—the “expected value” itself may be unlikely or even
impossible (such as having 2.5 children), as is also the case with the sample mean.
Suppose random variable X can take value x1 with probability p1, value x2 with
probability p2, and so on, up to value xi with probability pi. Then the expectation value of
a random variable X is defined as E[X] = x1 p1 + x2 p2 + ⋯ + xi pi, which can also be written
If all outcomes xi are equally likely (that is, p1 = p2 = ⋯ = pi), then the weighted average
turns into the simple average. This is intuitive: the expected value of a random variable is
the average of all values it can take; thus, the expected value is what one expects to
happen on average. If the outcomes xi are not equally probable, then the simple average
must be replaced with the weighted average, which takes into account the fact that some
outcomes are more likely than the others. The intuition, however, remains the same: the
For example, let X represent the outcome of a roll of a six-sided die. The possible values
for X are 1, 2, 3, 4, 5, and 6, all equally likely (each having the probability of 1/6). The
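expectation of X is:
$E[X] = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = 3.5$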
In this case, since all outcomes are equally likely, we could have simply averaged the
numbers together:
(1 + 2 + 3 + 4 + 5 + 6) /6 = 3.5.
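The weighted-average definition can be sketched in Python. The helper `expected_value` is a hypothetical name; it is applied to the die above and, for contrast, to the earlier fast-food example f(x) = x/10, where the outcomes are not equally likely.

```python
from fractions import Fraction

def expected_value(pmf):
    """E[X] = sum of x * p(x) over all values x of the random variable."""
    return sum(x * p for x, p in pmf.items())

# Fair six-sided die: every face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Queue-length example: f(x) = x / 10 for x = 2, 3, 5.
queue = {x: Fraction(x, 10) for x in (2, 3, 5)}

ev_die = expected_value(die)
ev_queue = expected_value(queue)

print(float(ev_die))
print(float(ev_queue), sum(queue) / len(queue))
```

For the die, the weighted average equals the simple average 3.5. For the queue example the weighted average is 3.8, which differs from the simple average of 2, 3 and 5 because the larger values carry more probability.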
Binomial Experiment
Each trial can result in just two possible outcomes. We call one of these outcomes
The trials are independent; that is, the outcome on one trial does not affect the
Consider the following statistical experiment. You flip a coin 2 times and count the
2. Each trial can result in just two possible outcomes - heads or tails.
4. The trials are independent; that is, getting heads on one trial does not affect
b (x; n, P): Binomial probability - the probability that an n-trial binomial experiment
trial is P.
Binomial Distribution
a binomial distribution.
Suppose we flip a coin two times and count the number of heads (successes). The
binomial random variable is the number of heads, which can take on values of 0, 1, or 2.
Number of heads    Probability
0                  0.25
1                  0.50
2                  0.25
results in exactly x successes. For example, in the above table, we see that the binomial
Given x, n, and P, we can compute the binomial probability based on the binomial
formula.
probability is:
$b(x; n, P) = {}_{n}C_{x} \, P^{x} (1 - P)^{n - x}$
or
$b(x; n, P) = \frac{n!}{x! \, (n - x)!} \, P^{x} (1 - P)^{n - x}$
Example 1. Suppose a die is tossed 5 times. What is the probability of getting exactly 2
fours?
Solution:
This is a binomial experiment in which the number of trials is equal to 5, the number
of successes is equal to 2, and the probability of success on a single trial is 1/6 or about
random variable falls within a specified range (e.g., is greater than or equal to a stated
45 or fewer heads in 100 tosses of a coin. This would be the sum of all these individual
binomial probabilities.
b (x = 0; 100, 0.5) + b (x = 1; 100, 0.5) + ... + b (x = 44; 100, 0.5) + b (x = 45; 100, 0.5)
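The binomial formula and the cumulative sum above can be sketched in Python. The function names `binomial_pmf` and `binomial_cdf` are illustrative; the numbers come from the die example (5 tosses, exactly 2 fours) and the coin example (45 or fewer heads in 100 tosses).

```python
from math import comb

def binomial_pmf(x, n, p):
    """b(x; n, P): probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_cdf(x, n, p):
    """Cumulative binomial probability: sum of b(k; n, P) for k = 0, 1, ..., x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# Exactly 2 fours in 5 tosses of a die (success probability 1/6).
p_two_fours = binomial_pmf(2, 5, 1 / 6)

# 45 or fewer heads in 100 tosses of a fair coin.
p_at_most_45 = binomial_cdf(45, 100, 0.5)

print(round(p_two_fours, 3))
print(round(p_at_most_45, 3))
```

The first value is about 0.161 and the second about 0.184. Summing 46 terms by hand is tedious, which is why cumulative probabilities are usually obtained from tables or software.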
coin?
Solution:
formula. The sum of all these probabilities is the answer we seek. Thus,
b (x ≤ 45; 100, 0.5) = b (x = 0; 100, 0.5) + b (x = 1; 100, 0.5) + . . . + b (x = 45; 100, 0.5)
students from the same school apply, what is the probability that at most 2 are
accepted?
Solution:
formula. The sum of all these probabilities is the answer we seek. Thus,
experiment.
failures.
The average number of successes (μ) that occurs in a specified region is known.
The probability that a success will occur is proportional to the size of the region.
The probability that a success will occur in an extremely small region is virtually
zero.
Note that the specified region could take many forms. For instance, it could be a length,
Notation
The following notation is helpful when we talk about the Poisson distribution.
logarithm system.)
P (x; μ): The Poisson probability that exactly x successes occur in a Poisson
Poisson Distribution
a Poisson distribution.
Given the mean number of successes (μ) that occur in a specified region, we can
number of successes within a given region is μ. Then, the Poisson probability is:
$P(x; \mu) = \frac{e^{-\mu} \mu^{x}}{x!}$
where x is the actual number of successes that result from the experiment, and e is
Example 1. The average number of homes sold by the Acme Realty company is 2
homes per day. What is the probability that exactly 3 homes will be sold tomorrow?
Solution:
x = 3; since we want to find the likelihood that 3 homes will be sold tomorrow.
P (3; 2) = 0.180
random variable is greater than some specified lower limit and less than some specified
upper limit.
Example. Suppose the average number of lions seen on a 1-day safari is 5. What is the
probability that tourists will see fewer than four lions on the next 1-day safari?
x = 0, 1, 2, or 3; since we want to find the likelihood that tourists will see fewer
than 4 lions; that is, we want the probability that they will see 0, 1, 2, or 3 lions.
To solve this problem, we need to find the probability that tourists will see 0, 1, 2, or 3
lions. Thus, we need to calculate the sum of four probabilities: P (0; 5) + P (1; 5) + P (2;
5) + P (3; 5).
\[ P(x \le 3;\, 5) = \frac{e^{-5}\,5^{0}}{0!} + \frac{e^{-5}\,5^{1}}{1!} + \frac{e^{-5}\,5^{2}}{2!} + \frac{e^{-5}\,5^{3}}{3!} \]
\[ = 0.006738 + 0.033690 + 0.084224 + 0.140374 \]
\[ P(x \le 3;\, 5) = 0.2650 \]
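Both Poisson examples can be checked with a few lines of Python. This is a minimal sketch under the formula P(x; μ) = e^(-μ) μ^x / x!; the function names `poisson_pmf` and `poisson_cdf` are illustrative and do not come from the module.

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """P(x; mu): probability of exactly x successes when the mean is mu."""
    return exp(-mu) * mu**x / factorial(x)


def poisson_cdf(x: int, mu: float) -> float:
    """Cumulative Poisson probability: sum of P(k; mu) for k = 0, 1, ..., x."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))


# Acme Realty: exactly 3 homes sold when the average is 2 per day
homes_exactly_3 = poisson_pmf(3, 2)

# Safari: fewer than 4 lions (0, 1, 2, or 3) when the average is 5
lions_fewer_than_4 = poisson_cdf(3, 5)

print(round(homes_exactly_3, 3), round(lions_fewer_than_4, 4))
```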
REFERENCES:
Montgomery, D. C. et al. (2003). Applied Statistics and Probability for Engineers 3rd Edition. USA.
John Wiley & Sons, Inc.
Walpole, R. E. et al. (2016). Probability & Statistics for Engineers & Scientists 9th Edition. England.
Pearson Education Limited
https://courses.lumenlearning.com/boundless-statistics/chapter/discrete-random-variables/
https://newonlinecourses.science.psu.edu/stat414/node/98/
https://stattrek.com/probability-distributions/binomial.aspx
https://stattrek.com/probability-distributions/poisson.aspx
CHAPTER TEST
1. The sample space of a random experiment is {a, b, c, d, e, f}, and each outcome
Outcome a b c d e f
x 0 0 1.5 1.5 2 3
2. Marketing estimates that a new instrument for the analysis of soil samples will be
0.6, and 0.1, respectively. The yearly revenue associated with a very successful,
million, respectively. Let the random variable X denote the yearly revenue of the
probabilities that the first, second, and third components meet specifications are
0.95, 0.98, and 0.99. Assume that the components are independent. Determine
the probability mass function of the number of components in the assembly that
meet specifications.
4. Marketing estimates that a new instrument for the analysis of soil samples will be
0.6, and 0.1, respectively. The yearly revenue associated with a very successful,
million, respectively. Let the random variable X denote the yearly revenue of the
5. Given:
Chapter 4
CONTINUOUS PROBABILITY
DISTRIBUTIONS
Introduction
to model the range of possible values of the random variable by an interval (finite or
infinite) of real numbers. Since the range is any value in the interval, the number of
possible values of the random variable X is uncountably infinite and would have a different
At the end of this module, it is expected that the students will be able to:
5. Use the table for cumulative distribution function of a standard normal distribution
to calculate probabilities
its values. Consequently, its probability distribution cannot be given in tabular form. At
first this may seem startling, but it becomes more plausible when we consider a particular
example. Let us discuss a random variable whose values are the heights of all people
over 21 years of age. Between any two values, say 163.5 and 164.5 centimeters, or even
163.99 and 164.01 centimeters, there are an infinite number of heights, one of which is
164 centimeters. The probability of selecting a person at random who is exactly 164
centimeters tall and not one of the infinitely large set of heights so close to 164
centimeters that you cannot humanly measure the difference is remote, and thus we
assign a probability of zero to the event. This is not the case, however, if we talk about
the probability of selecting a person who is at least 163 centimeters but not more than
165 centimeters tall. Now we are dealing with an interval rather than a point value of our
random variable.
continuous random variables such as P (a < X < b), P (W > c), and so forth. Note that
when X is continuous,
\[ P(a < X \le b) = P(a < X < b) + P(X = b) = P(a < X < b). \]
That is, it does not matter whether we include an endpoint of the interval or not.
This is not true, though, when X is discrete. Although the probability distribution of a
formula. Such a formula would necessarily be a function of the numerical values of the
continuous random variable X and as such will be represented by the functional notation
f(x). In dealing with continuous variables, f(x) is usually called the probability density
function, or simply the density function of X. Since X is defined over a continuous sample space, it is
possible for f(x) to have a finite number of discontinuities. However, most density
functions that have practical applications in the analysis of statistical data are continuous
and their graphs may take any of several forms, some of which are shown in Figure 4.1.
Because areas will be used to represent probabilities and probabilities are positive
numerical values, the density function must lie entirely above the x axis. A probability
density function is constructed so that the area under its curve bounded by the x axis is
equal to 1 when computed over the range of X for which f(x) is defined. Should this range
of X be a finite interval, it is always possible to extend the interval to include the entire set
of real numbers by defining f(x) to be zero at all points in the extended portions of the
interval. In Figure 4.2, the probability that X assumes a value between a and b is equal to
the shaded area under the density function between the ordinates at x = a and x = b, and
from integral calculus is given by
\[ P(a < X < b) = \int_{a}^{b} f(x)\,dx \]
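Example 1. For the density function
\[ f(x) = \begin{cases} \dfrac{x^{2}}{3}, & -1 < x < 2, \\ 0, & \text{elsewhere,} \end{cases} \]
find F(x), and use it to evaluate P (0 < X ≤ 1).
Solution:
For −1 < x < 2,
\[ F(x) = \int_{-1}^{x} \frac{t^{2}}{3}\,dt = \left.\frac{t^{3}}{9}\right|_{-1}^{x} = \frac{x^{3}+1}{9}. \]
Therefore,
\[ F(x) = \begin{cases} 0, & x < -1, \\ \dfrac{x^{3}+1}{9}, & -1 \le x < 2, \\ 1, & x \ge 2. \end{cases} \]
Now,
\[ P(0 < X \le 1) = F(1) - F(0) = \frac{2}{9} - \frac{1}{9} = \frac{1}{9} \]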
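The cumulative distribution function of this example can be coded directly and used to confirm the result. The sketch below is illustrative only; the name `cdf` is an assumption, and exact fractions are used so that the answer 1/9 is reproduced without rounding.

```python
from fractions import Fraction


def cdf(x) -> Fraction:
    """F(x) for the density f(x) = x^2 / 3 on -1 < x < 2 (zero elsewhere)."""
    x = Fraction(x)
    if x < -1:
        return Fraction(0)
    if x >= 2:
        return Fraction(1)
    return (x**3 + 1) / 9


# P(0 < X <= 1) = F(1) - F(0)
prob = cdf(1) - cdf(0)
print(prob)
```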
Let X be a continuous random variable with range [a, b] and probability density
\[ E(X) = \int_{a}^{b} x f(x)\,dx \]
Let’s see how this compares with the formula for a discrete random variable:
\[ E(X) = \sum_{i=1}^{n} x_i\, p(x_i) \]
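The discrete formula says to take a weighted sum of the values xi of X, where the weights
are the probabilities p (xi). Recall that f(x) is a probability density. Its units are prob/(unit
of X), so f(x) dx represents the probability that X falls in an infinitesimal range of width dx
around x. Thus we can interpret the formula for E(X) as a weighted integral of the values x of X,
where the weights are the probabilities f(x) dx.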
Solution:
\[ E(X) = \int_{0}^{1} x\,dx = \left.\frac{x^{2}}{2}\right|_{0}^{1} = \frac{1}{2} \]
Example 2. Let X have range [0, 2] and density f(x) = (3/8)x². Find E(X).
\[ E(X) = \int_{0}^{2} x f(x)\,dx = \int_{0}^{2} \frac{3}{8}x^{3}\,dx = \left.\frac{3x^{4}}{32}\right|_{0}^{2} = \frac{3}{2} \]
Does it make sense that this X has its mean in the right half of its range?
Yes. Since the probability density increases as x increases over the range, the average
µ is “pulled” to the right of the midpoint 1 because there is more mass to the right.
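The two expected values above can be approximated numerically, which is a useful check when the integral is less friendly. The sketch below uses a simple midpoint rule; the function name `expected_value` and the step count are illustrative assumptions, and the first density, f(x) = 1 on [0, 1], is the one implied by the integral in the first solution.

```python
def expected_value(density, a: float, b: float, steps: int = 100_000) -> float:
    """Approximate E(X), the integral of x * f(x) over [a, b], by the midpoint rule."""
    width = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * width  # midpoint of the i-th subinterval
        total += x * density(x) * width
    return total


# Density f(x) = 1 on [0, 1]
mean_first = expected_value(lambda x: 1.0, 0.0, 1.0)

# Example 2: density f(x) = (3/8) x^2 on [0, 2]
mean_second = expected_value(lambda x: 3 * x**2 / 8, 0.0, 2.0)

print(round(mean_first, 4), round(mean_second, 4))
```

The second mean lies to the right of the midpoint 1, in line with the explanation that more probability mass sits at larger x.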
Properties of E(X)
The properties of E(X) for continuous random variables are the same as for
discrete ones:
E (aX + b) = aE(X) + b
Expectation of Functions of X
This works exactly the same as the discrete case. If h(x) is a function then Y = h(X) is a
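\[ E(X^{2}) = \int_{0}^{\infty} x^{2}\,\lambda e^{-\lambda x}\,dx = \left[ -x^{2} e^{-\lambda x} - \frac{2x}{\lambda} e^{-\lambda x} - \frac{2}{\lambda^{2}} e^{-\lambda x} \right]_{0}^{\infty} = \frac{2}{\lambda^{2}} \]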
\[ f(x) = \frac{1}{b-a}, \quad a \le x \le b \]
The probability density function of a continuous uniform random variable is shown below
\[ \mu = E(X) = \frac{a+b}{2} \quad \text{and} \quad \sigma^{2} = V(X) = \frac{(b-a)^{2}}{12} \]
The Normal Distribution is the most important and most widely used continuous
analysis of data because the distributions of several important sample statistics tend
Empirical studies have indicated that the Normal distribution provides an adequate
where μ and σ are parameters. These turn out to be the mean and standard deviation,
The curve never actually reaches the horizontal axis but gets close to it beyond about 3
The graphs below illustrate the effect of changing the values of μ and σ on the shape of
the probability density function. Low variability (σ = 0.71) with respect to the mean gives
a pointed bell-shaped curve with little spread. Variability of σ = 1.41 produces a flatter
Example 1. The volume of water in commercially supplied fresh drinking water containers
is approximately Normally distributed with mean 70 litres and standard deviation 0.75
(i) in excess of 70.9 litres, (ii) at most 68.2 litres, (iii) less than 70.5 litres.
Solution:
Let X denote the volume of water in a container, in litres. Then X ~ N (70, 0.75²), i.e. μ = 70 and σ = 0.75.
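The three probabilities asked for in this example can be computed without a table by writing the standard normal cumulative distribution function in terms of the error function. This is a minimal sketch, not the module's own method; `normal_cdf` and the variable names are illustrative.

```python
from math import erf, sqrt


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), via the standardized value z."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


mu, sigma = 70.0, 0.75

p_over_70_9 = 1.0 - normal_cdf(70.9, mu, sigma)   # (i) in excess of 70.9 litres
p_at_most_68_2 = normal_cdf(68.2, mu, sigma)      # (ii) at most 68.2 litres
p_under_70_5 = normal_cdf(70.5, mu, sigma)        # (iii) less than 70.5 litres

print(round(p_over_70_9, 4), round(p_at_most_68_2, 4), round(p_under_70_5, 4))
```

Values read from a printed standard normal table may differ in the last digit because z is rounded to two decimals there.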
Binomial Approximation
\[ Z = \frac{X - np}{\sqrt{np(1-p)}} \]
The above equation is the formula for standardizing the random variable X. Probabilities
In some cases, working out a problem using the Normal distribution may be easier than
using a Binomial.
Poisson Approximation
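The Poisson distribution was developed as the limit of a binomial distribution as the number
of trials increased to infinity; therefore, the normal distribution can also be used to
approximate probabilities of a Poisson random variable. If X is a Poisson random variable with mean λ, then
\[ Z = \frac{X - \lambda}{\sqrt{\lambda}} \]
is approximately a standard normal random variable, and this approximation is good for
λ > 5.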
Continuity Correction
The binomial and Poisson distributions are discrete random variables, whereas the
normal distribution is continuous. We need to take this into account when we are using
correction.
diagram):
When working out probabilities, we want to include whole rectangles, which is what
Example 1. Suppose we toss a fair coin 20 times. What is the probability of getting
Solution:
X ~ Bin (20, ½)
Since p is close to ½ (it equals ½!), we can use the normal approximation to the binomial.
X ~ N (20 × ½, 20 × ½ × ½) so X ~ N (10, 5) .
In this diagram, the rectangles represent the binomial distribution and the curve is the
normal distribution:
We want P (9 ≤ X ≤ 11), which is the red shaded area. Notice that the first rectangle starts
at 8.5 and the last rectangle ends at 11.5. Using a continuity correction, therefore, our
probability becomes P (8.5 < X < 11.5) under the normal curve.
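To see how well the continuity correction works for the coin example, the exact binomial probability P (9 ≤ X ≤ 11) can be compared with the normal area between 8.5 and 11.5. The sketch below is illustrative; `normal_cdf` is a helper written for this comparison and is not part of the module.

```python
from math import comb, erf, sqrt


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


n, p = 20, 0.5

# Exact binomial probability of 9, 10, or 11 heads
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(9, 12))

# Normal approximation N(10, 5) with continuity correction: 8.5 < X < 11.5
mu = n * p
sigma = sqrt(n * p * (1 - p))
approx = normal_cdf(11.5, mu, sigma) - normal_cdf(8.5, mu, sigma)

print(round(exact, 4), round(approx, 4))
```

The two values agree to about two decimal places, which is the kind of accuracy the approximation is expected to give when p is close to ½.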
The exponential distribution obtains its name from the exponential function in the
probability density function. Plots of the exponential distribution for selected values of λ are
shown in Fig. 4.4. For any value of λ, the exponential distribution is quite skewed.
Figure 4.4 Probability density function of exponential random variables for selected
values of λ
variances involving exponential random variables. The following example illustrates unit
conversions.
Example 1. In a large corporate computer network, user log-ons to the system can be
modeled as a Poisson process with a mean of 25 log-ons per hour. What is the probability
Solution:
Let X denote the time in hours from the start of the interval until the first log-on.
Then, X has an exponential distribution with λ = 25 log-ons per hour. We are interested in the
probability that X exceeds 6 minutes. Because λ is given in log-ons per hour, we express
all time units in hours. That is, 6 minutes = 0.1 hour. The probability requested is shown as
the shaded area under the probability density function in Fig. 4.4. Therefore,
\[ P(X > 0.1) = \int_{0.1}^{\infty} 25 e^{-25x}\,dx = e^{-25(0.1)} = 0.082 \]
In the previous example, the probability that there are no log-ons in a 6-minute
interval is 0.082 regardless of the starting time of the interval. A Poisson process assumes
that events occur uniformly throughout the interval of observation; that is, there is no
clustering of events. If the log-ons are well modeled by a Poisson process, the probability
that the first log-on after noon occurs after 12:06 P.M. is the same as the probability that
the first log-on after 3:00 P.M. occurs after 3:06 P.M. And if someone logs on at 2:22
P.M., the probability the next log-on occurs after 2:28 P.M. is still 0.082.
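The claim that the starting time does not matter (the lack-of-memory property) can be checked numerically. The following sketch assumes the exponential model with λ = 25 log-ons per hour from the example; the function name `exponential_survival` and the 22-minute elapsed time are illustrative choices.

```python
from math import exp


def exponential_survival(t: float, rate: float) -> float:
    """P(X > t) for an exponential random variable with the given rate (lambda)."""
    return exp(-rate * t)


rate = 25.0          # log-ons per hour
wait = 6 / 60        # 6 minutes expressed in hours

# Probability of no log-on during a 6-minute interval
p_no_logon = exponential_survival(wait, rate)

# Conditional probability of waiting 6 more minutes, given 22 minutes already passed:
# P(X > s + t | X > s) = P(X > s + t) / P(X > s)
elapsed = 22 / 60
p_conditional = (
    exponential_survival(elapsed + wait, rate) / exponential_survival(elapsed, rate)
)

print(round(p_no_logon, 3), round(p_conditional, 3))
```

Both numbers are the same, which is exactly what the paragraph above describes: the time already waited carries no information about the remaining wait.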
Our starting point for observing the system does not matter. However, if there are
high-use periods during the day, such as right after 8:00 A.M., followed by a period of low
use, a Poisson process is not an appropriate model for log-ons and the exponential distribution is not
appropriate for computing probabilities. It might be reasonable to model each of the high-
and low-use periods by a separate Poisson process, employing a larger value for λ during
the high-use periods and a smaller value otherwise. Then, an exponential distribution with
the corresponding value of λ can be used to calculate log-on probabilities for the high- and
low-use periods.
REFERENCES:
Montgomery, D. C. et al. (2003). Applied Statistics and Probability for Engineers 3rd Edition. USA.
John Wiley & Sons, Inc.
Walpole, R. E. et al. (2016). Probability & Statistics for Engineers & Scientists 9th Edition.
England. Pearson Education Limited
Jeremy Orloff, and Jonathan Bloom. 18.05 Introduction to Probability and Statistics. Spring 2014.
Massachusetts Institute of Technology: MIT Open Courseware, https://ocw.mit.edu. License:
Creative Commons BY-NC-SA.
CHAPTER TEST
e. P (3 X)
2. The probability density function of the length of a metal rod is f(x) = 2 for 2.3 < x
b. If the specifications for this process are from 2.25 to 2.75 meters, what
length 0.5 meters. Over what value the density should be centered to
3. Suppose f(x) = 0.125x for 0 < x < 4. Find the mean and variance of X.
4. Suppose the time it takes a data collection operator to fill out an electronic form
d. What is the mean and variance of the time it takes the operator to fill out the
form?
e. What is the probability that it will take less than two minutes to fill out the
form?
f. Determine the cumulative distribution function of the time it takes to fill out
the form.
h. Approximate the probability that X is greater than 70 and less than 90.
Chapter 5
JOINT PROBABILITY DISTRIBUTIONS
Introduction
The study of random variables and their probability distributions in the preceding
useful to have more than one random variable defined in a random experiment. In general,
if X and Y are two random variables, the probability distribution that defines their
At the end of this module, it is expected that the students will be able to:
1. Understand and use joint probability mass functions and joint probability density
3. Calculate means and variances for linear functions of random variables and
variable. There will be situations, however, where we may find it desirable to record the
simultaneous outcomes of several random variables. For example, we might measure the
experiment, giving rise to a two dimensional sample space consisting of the outcomes (p,
copper, resulting in the outcomes (h, t). In a study to determine the likelihood of success
in college based on high school data, we might use a three dimensional sample space
and record for each individual his or her aptitude test score, high school rank, and grade-
If X and Y are two discrete random variables, the probability distribution for their
simultaneous occurrence can be represented by a function with values f(x, y) for any pair
of values (x, y) within the range of the random variables X and Y. It is customary to refer
f (x, y) = P (X = x, Y = y)
that is, the values (x, y) give the probability that outcomes x and y occur at the same time.
For example, if an 18 wheeler is to have its tires serviced and X represents the number
of miles these tires have been driven and Y represents the number of tires that need to
be replaced, then f(30000,5) is the probability that the tires are used over 30,000 miles
2. ∑𝑥 ∑𝑦 𝑓(𝑥, 𝑦) = 1
3. 𝑃(𝑋 = 𝑥, 𝑌 = 𝑦) = 𝑓(𝑥, 𝑦)
Just as the probability mass function of a single random variable X is assumed to be zero
at all values outside the range of X, so the joint probability mass function of X and Y is
Example 1. Two ballpoint pens are selected at random from a box that contains 3 blue
pens, 2 red pens, and 3 green pens. If X is the number of blue pens selected and Y is
Solution:
The possible pairs of values (x, y) are (0, 0), (0, 1), (1, 0), (1, 1), (0, 2), and (2, 0).
Now, f (0, 1), for example, represents the probability that a red and a green pens are
selected. The total number of equally likely ways of selecting any 2 pens from the 8 is
C(8, 2) = 28. The number of ways of selecting 1 red from 2 red pens and 1 green from 3
Similar calculations will yield the probabilities for the other cases, which are presented in
Table 1. Note that the probabilities sum to 1. It will become clear that the joint probability
for x = 0, 1, 2; y = 0, 1, 2; and 0 ≤ x + y ≤ 2.
\[ = \frac{3}{28} + \frac{3}{14} + \frac{9}{28} = \frac{9}{14} \]
Table 1. Joint probability distribution f(x, y)

f(x, y)          x = 0    x = 1    x = 2    Row Totals
y = 0            3/28     9/28     3/28     15/28
y = 1            3/14     3/14     0        3/7
y = 2            1/28     0        0        1/28
Column Totals    5/14     15/28    3/28     1
Example 2. Suppose we toss a pair of fair, four-sided dice, in which one of the dice is
a) What is the probability that X takes on a particular value x, and Y takes on a particular
value y?
Solution:
Just as we have to in the case with one discrete random variable, in order to find
the “joint probability distribution” of X and Y, we first need to define the support of X and
S1 = {1, 2, 3, 4}
S2 = {1, 2, 3, 4}
Now, if we let (x, y) denote one of the possible outcomes of one toss of the
pair of dice, then certainly (1, 1) is a possible outcome, as is (1, 2), (1, 3) and (1, 4).
If we continue to enumerate all of the possible outcomes, we soon see that the joint
Now, because the dice are fair, we should expect each of 16 possible outcomes to be
equally likely. Therefore, using the classical approach to assigning probability, the
1
probability that X equals any particular x value, and Y equals any particular y value, is 16.
1
𝑃(𝑋 = 𝑥, 𝑌 = 𝑦) =
16
Because we have identified the probability for each (x, y), we have found what we call the
joint probability mass function. Perhaps, it is not too surprising that the joint probability
mass function, which is typically denoted as f(x, y), can be defined as a formula (as we
have above), or as a table. Here’s what our joint probability mass function would look like in
tabular form:
                             Black (Y)
f(x, y)           y = 1    y = 2    y = 3    y = 4    fX(x)
Red (X)  x = 1    1/16     1/16     1/16     1/16     4/16
         x = 2    1/16     1/16     1/16     1/16     4/16
         x = 3    1/16     1/16     1/16     1/16     4/16
         x = 4    1/16     1/16     1/16     1/16     4/16
fY(y)             4/16     4/16     4/16     4/16     1
When X and Y are continuous random variables, the joint density function f(x, y) is a
surface lying above the xy plane, and 𝑃[(𝑋, 𝑌) ∈ 𝐴], where A is any region in the xy
plane, is equal to the volume of the right cylinder bounded by the base A and the
surface.
Continuous case. The case where both variables are continuous is obtained easily by
analogy with discrete case on replacing sums by integrals. Thus, the joint probability
function for the random variables X and Y (or, as it is more commonly called, the joint
density function of X and Y). The function f(x, y) is a joint density function of the
2. \( \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1 \)
Example 1. A privately owned business operates both a drive-in facility and a walk-in
facility. On a randomly selected day, let X and Y, respectively, be the proportions of the
time that the drive-in and the walk-in facilities are in use, and suppose that the joint density
\[ f(x, y) = \begin{cases} \dfrac{2}{5}(2x + 3y), & 0 \le x \le 1,\ 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
(b) Find P[(X, Y) ∈ A], where A = {(x, y) | 0 < x < 1/2, 1/4 < y < 1/2}.
Solution:
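\[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = \int_{0}^{1}\int_{0}^{1} \frac{2}{5}(2x + 3y)\,dx\,dy \]
\[ = \int_{0}^{1} \left.\left(\frac{2x^{2}}{5} + \frac{6xy}{5}\right)\right|_{x=0}^{x=1} dy \]
\[ = \int_{0}^{1} \left(\frac{2}{5} + \frac{6y}{5}\right) dy = \left.\left(\frac{2y}{5} + \frac{3y^{2}}{5}\right)\right|_{0}^{1} = \frac{2}{5} + \frac{3}{5} = 1 \]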
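\[ P[(X, Y) \in A] = P\left(0 < X < \frac{1}{2},\ \frac{1}{4} < Y < \frac{1}{2}\right) \]
\[ = \int_{1/4}^{1/2}\int_{0}^{1/2} \frac{2}{5}(2x + 3y)\,dx\,dy \]
\[ = \int_{1/4}^{1/2} \left.\left(\frac{2x^{2}}{5} + \frac{6xy}{5}\right)\right|_{x=0}^{x=1/2} dy = \int_{1/4}^{1/2} \left(\frac{1}{10} + \frac{3y}{5}\right) dy \]
\[ = \left.\left(\frac{y}{10} + \frac{3y^{2}}{10}\right)\right|_{1/4}^{1/2} \]
\[ = \frac{1}{10}\left[\left(\frac{1}{2} + \frac{3}{4}\right) - \left(\frac{1}{4} + \frac{3}{16}\right)\right] = \frac{13}{160} \]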
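Both double integrals in this example can be verified numerically. The sketch below uses a midpoint rule over a rectangular grid; the names `joint_density` and `integrate_rectangle` and the grid size are illustrative assumptions rather than anything prescribed by the module.

```python
def joint_density(x: float, y: float) -> float:
    """f(x, y) = (2/5)(2x + 3y) on the unit square, zero elsewhere."""
    if 0 <= x <= 1 and 0 <= y <= 1:
        return 0.4 * (2 * x + 3 * y)
    return 0.0


def integrate_rectangle(f, x_lo, x_hi, y_lo, y_hi, steps: int = 200) -> float:
    """Midpoint-rule approximation of the double integral of f over a rectangle."""
    dx = (x_hi - x_lo) / steps
    dy = (y_hi - y_lo) / steps
    total = 0.0
    for i in range(steps):
        x = x_lo + (i + 0.5) * dx
        for j in range(steps):
            y = y_lo + (j + 0.5) * dy
            total += f(x, y)
    return total * dx * dy


# (a) The density integrates to 1 over the unit square
total_probability = integrate_rectangle(joint_density, 0.0, 1.0, 0.0, 1.0)

# (b) P(0 < X < 1/2, 1/4 < Y < 1/2)
region_probability = integrate_rectangle(joint_density, 0.0, 0.5, 0.25, 0.5)

print(round(total_probability, 6), round(region_probability, 6))
```

Because the density is linear in x and y, the midpoint rule reproduces 1 and 13/160 = 0.08125 up to floating-point rounding.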
Practice Problem:
1. From a sack of fruit containing 3 oranges, 2 apples, and 3 bananas, a random sample
of 4 pieces of fruit is selected. If X is the number of oranges and Y is the number of apples
2. Determine the values of c so that the following functions represent joint probability
(c) P(X>Y );
(d) P(X + Y = 4)
of X and Y and the probability distribution of each variable individually. The individual
distribution. The marginal probability distribution of X can be determined from the joint
probability distribution of X and other random variables. For example, consider discrete
random variables X and Y. To determine P(X = x), we sum P(X = x, Y =y) over all points
in the range of (X, Y) for which X = x. Subscripts on the probability mass functions
marginal probability distributions. In the continuous case, an integral replaces the sum.
\[ g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \quad \text{and} \quad h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx \]
The term marginal is used here because, in the discrete case, the values of g(x) and h(y)
are just the marginal totals of the respective columns and rows when the values of f(x, y)
Example 1. Show that the column and row totals of Table 1. give the marginal distribution
Solution:
\[ g(0) = f(0,0) + f(0,1) + f(0,2) = \frac{3}{28} + \frac{3}{14} + \frac{1}{28} = \frac{5}{14}, \]
\[ g(1) = f(1,0) + f(1,1) + f(1,2) = \frac{9}{28} + \frac{3}{14} + 0 = \frac{15}{28}, \]
\[ g(2) = f(2,0) + f(2,1) + f(2,2) = \frac{3}{28} + 0 + 0 = \frac{3}{28} \]
which are just the column totals of Table 1. In a similar manner we could show that the
values of h(y) are given by the row totals. In tabular form, these marginal distributions
x       0        1        2
g(x)    5/14     15/28    3/28

y       0        1        2
h(y)    15/28    3/7      1/28
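The marginal totals can also be produced mechanically from the joint table. The following sketch stores Table 1 as a dictionary keyed by (x, y) and sums over the other variable; the dictionary layout and the names `joint`, `g`, and `h` are illustrative choices.

```python
from fractions import Fraction as F

# joint[(x, y)] = f(x, y), copied from Table 1
joint = {
    (0, 0): F(3, 28), (1, 0): F(9, 28), (2, 0): F(3, 28),
    (0, 1): F(3, 14), (1, 1): F(3, 14), (2, 1): F(0),
    (0, 2): F(1, 28), (1, 2): F(0),     (2, 2): F(0),
}

# Marginal of X: sum f(x, y) over y for each fixed x (column totals)
g = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in range(3)}

# Marginal of Y: sum f(x, y) over x for each fixed y (row totals)
h = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in range(3)}

print(g)
print(h)
```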
Example 2. Find g(x) and h(y) for the joint density function of Example 3.
Solution:
By definition, we have
\[ g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{0}^{1} \frac{2}{5}(2x + 3y)\,dy = \left.\left(\frac{4xy}{5} + \frac{6y^{2}}{10}\right)\right|_{y=0}^{y=1} = \frac{4x + 3}{5} \]
Similarly, we have
\[ h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_{0}^{1} \frac{2}{5}(2x + 3y)\,dx = \frac{2(1 + 3y)}{5} \]
Practice Problem
1. A fast-food restaurant operates both a drive through facility and a walk-in facility. On a
randomly selected day, let X and Y, respectively, be the proportions of the time that the
drive-through and walk-in facilities are in use, and suppose that the joint density function
(c) Find the probability that the drive-through facility is busy less than one-half of the
time.
f(x, y) / g(x) in order to be able to effectively compute conditional probabilities. Let X and
\[ f(y \mid x) = \frac{f(x, y)}{g(x)}, \quad \text{provided } g(x) > 0 \]
\[ f(x \mid y) = \frac{f(x, y)}{h(y)}, \quad \text{provided } h(y) > 0 \]
If we wish to find the probability that the discrete random variable X falls between
where the summation extends over all values of X between a and b. When X and Y
\[ P(a < X < b \mid Y = y) = \int_{a}^{b} f(x \mid y)\,dx \]
Solution:
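\[ h(1) = \sum_{x=0}^{2} f(x, 1) = \frac{3}{14} + \frac{3}{14} + 0 = \frac{3}{7} \]
Now
\[ f(x \mid 1) = \frac{f(x, 1)}{h(1)} = \left(\frac{7}{3}\right) f(x, 1), \quad x = 0, 1, 2 \]
Therefore,
\[ f(0 \mid 1) = \left(\frac{7}{3}\right) f(0, 1) = \left(\frac{7}{3}\right)\left(\frac{3}{14}\right) = \frac{1}{2}, \]
\[ f(1 \mid 1) = \left(\frac{7}{3}\right) f(1, 1) = \left(\frac{7}{3}\right)\left(\frac{3}{14}\right) = \frac{1}{2}, \]
\[ f(2 \mid 1) = \left(\frac{7}{3}\right) f(2, 1) = \left(\frac{7}{3}\right)(0) = 0 \]

x          0      1      2
f(x|1)     1/2    1/2    0

Finally,
\[ P(X = 0 \mid Y = 1) = f(0 \mid 1) = \frac{1}{2} \]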
Therefore, if it is known that 1 of the 2 pen refills selected is red, we have a probability
equal to 1/2 that the other refill is not blue.
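The conditional distribution f(x | 1) is simply the y = 1 row of the joint table divided by its row total. A minimal sketch, with the row values copied from the worked solution and the names `row_y1`, `h1`, and `conditional` chosen for illustration:

```python
from fractions import Fraction as F

# f(x, 1) for x = 0, 1, 2 (the y = 1 row of the joint table)
row_y1 = {0: F(3, 14), 1: F(3, 14), 2: F(0)}

# Marginal h(1): total probability of the row
h1 = sum(row_y1.values())

# Conditional distribution f(x | 1) = f(x, 1) / h(1)
conditional = {x: p / h1 for x, p in row_y1.items()}

print(h1, conditional)
```

A conditional distribution is itself a probability distribution, so its values add up to 1.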
Statistical Independence. If f (x|y) does not depend on y, then f (x|y) = g(x) and
f(x, y) = g(x) h(y). The proof follows by substituting the equation below into the marginal
distribution of X.
𝑓(𝑥, 𝑦) = 𝑓(𝑥|𝑦)ℎ(𝑦)
That is
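\[ g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{-\infty}^{\infty} f(x \mid y)\,h(y)\,dy \]
\[ g(x) = f(x \mid y) \int_{-\infty}^{\infty} h(y)\,dy \]
Now
\[ \int_{-\infty}^{\infty} h(y)\,dy = 1 \]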
It should make sense to the reader that if f (x|y) does not depend on y, then of
course the outcome of the random variable Y has no impact on the outcome of the random
variable X. In other words, we say that X and Y are independent random variables. We
Let X and Y be two random variables, discrete or continuous, with joint probability
distribution f(x, y) and marginal distributions g(x) and h(y), respectively. The random
𝑓(𝑥, 𝑦) = 𝑔(𝑥)ℎ(𝑦)
thorough investigation, since it is possible to have the product of the marginal distributions
equal to the joint probability distribution for some but not all combinations of (x, y). If you
can find any point (x, y) for which f(x, y) is defined such that f(x, y) ≠ g(x) h(y), the discrete
Example 1. Show that the random variables of Example 1 are not statistically independent.
Solution:
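Let us consider the point (0, 1). From Table 1, we find the three probabilities f(0, 1), g(0), and h(1) to be
\[ f(0, 1) = \frac{3}{14} \]
\[ g(0) = \sum_{y=0}^{2} f(0, y) = \frac{3}{28} + \frac{3}{14} + \frac{1}{28} = \frac{5}{14} \]
\[ h(1) = \sum_{x=0}^{2} f(x, 1) = \frac{3}{14} + \frac{3}{14} + 0 = \frac{3}{7} \]
Clearly,
\[ f(0, 1) \ne g(0)\,h(1) \]
since g(0) h(1) = (5/14)(3/7) = 15/98, whereas f(0, 1) = 3/14 = 21/98. Therefore, X and Y are not statistically independent.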
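The independence test f(x, y) = g(x) h(y) has to hold at every point, so it is natural to check all cells at once. The sketch below does this for the pen table (Table 1) and, for contrast, for the four-sided dice table in which every cell is 1/16; the helper name `is_independent` is an illustrative assumption.

```python
from fractions import Fraction as F


def is_independent(joint: dict) -> bool:
    """Return True if f(x, y) == g(x) * h(y) for every (x, y) in the joint table."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    g = {x: sum(joint[(x, y)] for y in ys) for x in xs}  # marginal of X
    h = {y: sum(joint[(x, y)] for x in xs) for y in ys}  # marginal of Y
    return all(joint[(x, y)] == g[x] * h[y] for x in xs for y in ys)


# Table 1 (pens): joint[(x, y)] = f(x, y)
pens = {
    (0, 0): F(3, 28), (1, 0): F(9, 28), (2, 0): F(3, 28),
    (0, 1): F(3, 14), (1, 1): F(3, 14), (2, 1): F(0),
    (0, 2): F(1, 28), (1, 2): F(0),     (2, 2): F(0),
}

# Pair of fair four-sided dice: every outcome has probability 1/16
dice = {(x, y): F(1, 16) for x in range(1, 5) for y in range(1, 5)}

print(is_independent(pens), is_independent(dice))
```

A single failing cell is enough to rule out independence, which is why the worked solution only needs the point (0, 1).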
All the preceding definitions concerning two random variables can be generalized to the
case of n random variables. Let 𝑓(𝑥1 , 𝑥2 , … . , 𝑥𝑛 ) be the joint probability function of the
\[ g(x_1) = \sum_{x_2} \cdots \sum_{x_n} f(x_1, x_2, \ldots, x_n) \]
\[ g(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\,dx_2\,dx_3 \cdots dx_n \]
for the continuous case. We can now obtain joint marginal distributions such as 𝑔(𝑥1 , 𝑥2 ) ,
shown below:
We could consider numerous conditional distributions. For example, the joint conditional
\[ f(x_1, x_2, x_3 \mid x_4, x_5, \ldots, x_n) = \frac{f(x_1, x_2, \ldots, x_n)}{g(x_4, x_5, \ldots, x_n)} \]
𝑥4 , 𝑥5 , … . , 𝑥𝑛 .
It leads to the following definition for the mutual statistical independence of the
variables 𝑋1 , 𝑋2 , … . , 𝑋𝑛 .
distribution 𝑓(𝑥1 , 𝑥2 , … . , 𝑥𝑛 ) and marginal distribution 𝑓1 (𝑥1 ), 𝑓2 (𝑥2 )... 𝑓𝑛 (𝑥𝑛 ), respectively.
and only if
For example, suppose that the shelf life, in years, of a certain perishable food product
function is given by
\[ f(x) = \begin{cases} e^{-x}, & x > 0, \\ 0, & \text{elsewhere.} \end{cases} \]
Let 𝑋1,𝑋2 and 𝑋3 represent the shelf lives for three of these containers selected
Since the containers were selected independently, we can assume that the random
variables 𝑋1, 𝑋2, and 𝑋3 are statistically independent, having the joint probability density
\[ f(x_1, x_2, x_3) = f(x_1) f(x_2) f(x_3) = e^{-x_1} e^{-x_2} e^{-x_3} = e^{-x_1 - x_2 - x_3} \]
\[ P(X_1 < 2,\ 1 < X_2 < 3,\ X_3 > 2) = \int_{2}^{\infty}\int_{1}^{3}\int_{0}^{2} e^{-x_1 - x_2 - x_3}\,dx_1\,dx_2\,dx_3 \]
\[ = (1 - e^{-2})(e^{-1} - e^{-3})\,e^{-2} = 0.0372 \]
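Because the three shelf lives are independent, the triple integral factors into a product of three one-dimensional probabilities, each of the form e^(-lo) − e^(-hi). The sketch below evaluates that product; the function name `interval_probability` is illustrative, and the limits follow the integral shown above (0 to 2, 1 to 3, and 2 to infinity).

```python
from math import exp, inf


def interval_probability(lo: float, hi: float) -> float:
    """P(lo < X < hi) for the density f(x) = e^(-x), x > 0; hi may be infinite."""
    return exp(-lo) - exp(-hi)


# Independence lets the joint probability factor into three one-variable pieces
prob = (
    interval_probability(0.0, 2.0)      # X1 < 2
    * interval_probability(1.0, 3.0)    # 1 < X2 < 3
    * interval_probability(2.0, inf)    # X3 > 2
)

print(round(prob, 4))
```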
Practice Problem:
1. The amount of kerosene, in thousands of liters, in a tank at the beginning of any day is
a random amount Y from which a random amount X is sold during that day. Suppose that
the tank is not resupplied during the day so that x ≤ y, and assume that the joint density
\[ f(x, y) = \begin{cases} 2, & 0 < x \le y < 1, \\ 0, & \text{elsewhere.} \end{cases} \]
2. Let the random variable X denote the time until a computer server connects to your
machine (in milliseconds), and let Y denote the time until the server authorizes you as a
valid user (in milliseconds). Each of these random variables measures the wait from a
common starting time and X < Y. Assume that the joint probability density function for X
and Y is in the equation below, determine the conditional probability density function for
Y given that X = x.
More than two random variables can be defined in a random experiment. Results for
multiple random variables are straightforward extensions of those for two random
variables.
A joint probability density function for the continuous random variables X1, X2 ,
If the joint probability density function of continuous random variables X1, X2 , ....Xp,
where the integral is over all points in the range of X1, X2 , ....Xp, for which Xi= xi
variables by an extension of the ideas used for two random variables.
For example, the joint conditional probability distribution of X1, X2 ,and X3 given
(X4= x4 , X5= x5 ) is
\[ f_{X_1 X_2 X_3 \mid x_4 x_5}(x_1, x_2, x_3) = \frac{f_{X_1 X_2 X_3 X_4 X_5}(x_1, x_2, x_3, x_4, x_5)}{f_{X_4 X_5}(x_4, x_5)} \quad \text{for } f_{X_4 X_5}(x_4, x_5) > 0 \]
variables. Results for linear functions are important. For example, if the random variables
X1 and X2 denote the length and width, respectively, of a manufactured part, Y = 2X1 + 2X2
is a random variable that represents the perimeter of the part. In this section, we
develop results for random variables that are linear functions of random variables.
Given random variables X1, X2 , ....Xp and constants C0, C1, C2, ....Cp, the equation below is
Y = C0 + C1 X1 + C2 X2 + ⋯ + Cp Xp
then in general,
Example 1. A semiconductor product consists of three layers. Suppose that the variances
in thickness of the first, second, and third layers are 25, 40, and 30 square nanometers,
respectively, and the layer thicknesses are independent. What is the variance of the
Solution:
Let X1, X2, X3, and X be random variables that denote the thicknesses of the respective
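X = X1 + X2 + X3
Because the layer thicknesses are independent, the variance of X is
V(X) = V(X1) + V(X2) + V(X3) = 25 + 40 + 30 = 95 nm²
Consequently, the standard deviation of thickness of the final product is 95^(1/2) = 9.75 nm,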
and this shows how the variation in each layer is propagated to the final product.
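For independent random variables, the variance of a linear combination is the sum of the squared coefficients times the individual variances. The sketch below applies that rule to the three-layer example; the function name `linear_combination_variance` is an illustrative choice, and the independence assumption is the one stated in the example.

```python
from math import sqrt


def linear_combination_variance(coefficients, variances) -> float:
    """V(c1*X1 + ... + cp*Xp) for independent X1, ..., Xp: sum of ci^2 * V(Xi)."""
    return sum(c**2 * v for c, v in zip(coefficients, variances))


# X = X1 + X2 + X3 with layer variances 25, 40, and 30 square nanometers
variance_total = linear_combination_variance([1, 1, 1], [25, 40, 30])
std_total = sqrt(variance_total)

print(variance_total, round(std_total, 2))
```

The constant term C0 of a linear function does not appear because adding a constant shifts the mean but leaves the variance unchanged.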
Practice Problem:
2. Given the following probability function of the random variable X, find E(X), VAR(X),
x 5 10 15
Find the expected value of the number of good components in this sample.
Suppose that X is a discrete random variable with probability distribution fX(x). Let
transformation, we mean that each value x is related to one and only one value of y = h(x)
and that each value of y is related to one and only one value of x, say, x = u(y) where u(y)
Now the random variable Y takes on the value y when X takes on the value u(y).
Suppose that X is a discrete random variable with probability distribution fX (x). Let
the equation y = h(x) can be solved uniquely for x in terms of y. Let this solution be
fY (y) = fX [u(y)]
Solution:
Because X ≥ 0, the transformation is one to one; that is, y = x² and x = √y. Therefore,
X, so that the equation y = h(x) can be uniquely solved for x in terms of y. Let the
fY (y) = fX [u(y)] |J |
where J = u′(y) is called the Jacobian of the transformation and the absolute
value of J is used.
REFERENCES:
Walpole, Ronald E., et al., Probability and Statistics for Engineers and Scientists, 9th ed.,
Pearson Education Inc., 2016
Montgomery, Douglas C., et al., Applied Statistics and Probability for Engineers, 7th ed., John
Wiley & Sons (Asia) Pte Ltd, 2018
Murray, Spiegel R., et al., Probability and Statistics, 4th ed., McGraw Hill Companies Inc., 2013
https://online.stat.psu.edu/stat414/lesson/17/17.1
http://bestmaths.net/online/index.php/year-levels/year-12/year-12-topic-list/functions-random-variables/
CHAPTER TEST
toffees, and cordials. Suppose that the weight of each box is 1 kilogram, but the
individual weights of the creams, toffees, and cordials vary from box to box. For a
randomly selected box, let X and Y represent the weights of the creams and the
toffees, respectively, and suppose that the joint density function of these variables
is
\[ f(x, y) = \begin{cases} 24xy, & 0 \le x \le 1,\ 0 \le y \le 1,\ x + y \le 1, \\ 0, & \text{elsewhere.} \end{cases} \]
(a) Find the probability that in a given box the cordials account for more than 1/2
of the weight.
(b) Find the marginal density for the weight of the creams.
(c) Find the probability that the weight of the toffees in a box is less than 1/8 of a
2. Given the joint density function below, find P (1 <Y < 3 | X = 1).
\[ f(x, y) = \begin{cases} \dfrac{6 - x - y}{8}, & 0 < x < 2,\ 2 < y < 4, \\ 0, & \text{elsewhere.} \end{cases} \]
experiment is conducted in which 5 items are drawn randomly from the process.
Let the random variable X be the number of defectives in this sample of 5. What is
4. Consider the random variables X and Y that represent the number of vehicles
that arrive at two separate street corners during a certain 2-minute period.
These street corners are fairly close together so it is important that traffic engineers
deal with them jointly if necessary. The joint distribution of X and Y is known to be
\[ f(x, y) = \frac{9}{16} \cdot \frac{1}{4^{(x+y)}} \quad \text{for } x = 0, 1, 2, \ldots \text{ and } y = 0, 1, 2, \ldots \]
(a) Are the two random variables X and Y independent? Explain why or why not.
(b) What is the probability that during the time period in question less than 4
5. Three cards are drawn without replacement from the 12 face cards (jacks,
queens, and kings) of an ordinary deck of 52 playing cards. Let X be the number
Chapter 6
Sampling Distributions and Point Estimation of Parameters
Introduction
Statistical methods are used to make decisions and draw conclusions about
techniques utilize the information in a sample for drawing conclusions. This chapter
One major area of statistical inference is parameter estimation. In
practice, the engineer will use sample data to compute a number that is in some sense a
reasonable value (a good guess) of the true population mean. This number is called a
point estimate. In this chapter, we will see that procedures are available for developing
At the end of this module, it is expected that the students will be able to:
2. Calculate and explain the important role of the normal distribution as a sampling
3. Solve and explain important properties of point estimators, including bias, variance,
population parameter. We know that before the data are collected, the observations are
considered to be random variables, say, X1, X2, …, Xn. Therefore, any function of the
For example, the sample mean X̄ and the sample variance S² are statistics and
random variables. A simple way to visualize this is as follows. Suppose we take a sample
of n = 10 observations from a population and compute the sample average, getting the
observations from the same population and the resulting sample average is 10.4. The
sample average depends on the observations in the sample, which differ from sample to
sample because they are random variables. Consequently, the sample average (or any
distribution is very important and is discussed and illustrated later in the chapter.
represent the parameter of interest. We use the Greek symbol θ (theta) to represent the
parameter. The symbol θ can represent the mean μ, the variance σ2, or any parameter of
interest to us. The objective of point estimation is to select a single number based on
sample data that is the most plausible value for θ. The numerical value of a sample
statistic is used as the point estimate. In general, if X is a random variable with probability
random sample of size n from X, the statistic Θ̂ = h(X1, X2, …, Xn) is called a point estimator
the sample has been selected, Θ̂ takes on a particular numerical value θ̂ called the point
estimate of θ.
Point estimation is the process of using the data available to estimate the unknown value
of a parameter, when some representative statistical model has been proposed for the
unknown mean μ. Sample mean is a point estimator of the unknown population mean μ.
That is, μ̂ = X̄. After the sample has been selected, the numerical value x̄ is the point
estimate of μ. Thus, if x1 = 25, x2 = 30, x3 = 29, and x4 = 31, the point estimate of μ is
$$\bar{x} = \frac{25 + 30 + 29 + 31}{4} = 28.75$$
Similarly, if the population variance σ2 is also unknown, a point estimator for σ2 is the
sample variance S2, and the numerical value s2 = 6.9 calculated from the sample data is
called the point estimate of σ2.
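The two point estimates quoted above can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names are illustrative and not part of the module.

```python
from statistics import mean, variance

# The four observations used in the text
observations = [25, 30, 29, 31]

x_bar = mean(observations)          # point estimate of the population mean
s_squared = variance(observations)  # sample variance (divides by n - 1)

print(x_bar)                # 28.75
print(round(s_squared, 1))  # 6.9
```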
Practice Problem:
estimate the mean and variance of X, we observe a random sample X1, X2,⋯⋯, X7. We
166.8, 171.4, 169.1, 178.5, 168.0, 157.9, 170.1
Find the values of the sample mean, the sample variance, and the sample standard
predictions. For example, we might claim, based on the opinions of several people
interviewed on the street, that in a forthcoming election 60% of the eligible voters in the
city of Detroit favor a certain candidate. In this case, we are dealing with a random sample
of opinions from a very large finite population. As a second illustration we might state that
the average cost to build a residence in Charleston, South Carolina, is between $330,000
and $335,000, based on the estimates of 3 contractors selected at random from the 30
now building in this city. The population being sampled here is again finite but very small.
millilitres per drink. A company official who computes the mean of 40 drinks obtains x̄ =
236 millilitres and, on the basis of this value, decides that the machine is still dispensing
drinks with an average content of μ = 240 millilitres. The 40 drinks represent a sample
from the infinite population of possible drinks that will be dispensed by this machine.
Random Sample
distributed. These random variables are known as a random sample. The random
variables X1, X2, … , Xn are a random sample of size n if (a) the Xi ’s are independent
random variables and (b) every Xi has the same probability distribution.
Statistic
if X1, X2, … , Xn is a random sample of size n, the sample mean X̄, the sample variance
S2, and the sample standard deviation S are statistics. Because a statistic is a random
Sampling distribution
sampling distribution of a statistic depends on the distribution of the population, the size
of the samples, and the method of choosing the samples. The probability distribution of
that a random sample of size n is taken from a normal population with mean μ and
variance σ2. Now each observation in this sample, say, X1, X2, … , Xn, is a normally and
independently distributed random variable with mean μ and variance σ2. Then because
linear functions of independent, normally distributed random variables are also normally
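distributed as discussed in the previous chapters, we conclude that the sample mean
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
has a normal distribution with mean
$$\mu_{\bar{X}} = \frac{\mu + \mu + \cdots + \mu}{n} = \mu$$
and variance
$$\sigma^2_{\bar{X}} = \frac{\sigma^2 + \sigma^2 + \cdots + \sigma^2}{n^2} = \frac{\sigma^2}{n}$$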
the sampling distribution of the sample mean will still be approximately normal with mean
μ and variance σ2/n if the sample size n is large. This is one of the most useful theorems in
finite or infinite) with mean μ and finite variance σ2 and if X̄ is the sample mean,
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
Figure 1. Illustration of the Central Limit Theorem (distribution of X̄ for n = 1,
Figure 1 illustrates how the theorem works. It shows how the distribution of X̄
becomes closer to normal as n grows larger, beginning with the clearly nonsymmetric
remains μ for any sample size and the variance of X̄ gets smaller as n increases.
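A small simulation can make this behaviour concrete. The sketch below assumes an exponential population with mean 1 and variance 1 (a choice made here for illustration, not taken from the figure) and shows that the mean of X̄ stays near μ while its variance shrinks toward σ2/n.

```python
import random
from statistics import fmean, pvariance

def sample_means(n, reps=5000, seed=1):
    """Draw `reps` samples of size n from an exponential(1) population
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

results = {}
for n in (1, 5, 30):
    means = sample_means(n)
    # (average of the sample means, variance of the sample means)
    results[n] = (fmean(means), pvariance(means))
    print(n, round(results[n][0], 3), round(results[n][1], 4), "theory:", round(1 / n, 4))
```

The simulated variance for each n should sit close to 1/n, which is σ2/n for this assumed population.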
Example 1. An electrical firm manufactures light bulbs that have a length of life that is
approximately normally distributed, with mean equal to 800 hours and a standard
deviation of 40 hours. Find the probability that a random sample of 16 bulbs will have an
Solution:
$\sigma_{\bar{X}}$ = 40/√16 = 10. The desired probability is given by the area of the shaded region
$$Z = \frac{775 - 800}{10} = -2.5$$
and therefore
$$P(\bar{X} < 775) = P(Z < -2.5) = 0.0062$$
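The table lookup can be reproduced numerically. The sketch below assumes the event of interest is an average life below 775 hours, which is the value implied by z = −2.5 with a standard error of 10.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 40, 16
std_error = sigma / sqrt(n)                  # 40 / 4 = 10

# Sampling distribution of the mean for a normal population
x_bar_dist = NormalDist(mu=mu, sigma=std_error)
prob = x_bar_dist.cdf(775)                   # P(X-bar < 775), threshold assumed

print(round(prob, 4))  # 0.0062
```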
Practice Problem:
ohms and a standard deviation of 10 ohms. The distribution of resistance is normal. Find
the probability that a random sample of n = 25 resistors will have an average resistance
If we have two independent populations with means μ1 and μ2 and variances σ1² and σ2²,
and if X̄1 and X̄2 are the sample means of two independent random samples of
sizes n1 and n2 from these populations, then the sampling distribution of the statistic
below is approximately standard normal if the conditions of the central limit theorem
apply. If the two populations are normal, the sampling distribution of Z is exactly standard
normal.
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
Example 1. Two independent experiments are run in which two different types of paint
are compared. Eighteen specimens are painted using type A, and the drying time, in
hours, is recorded for each. The same is done with type B. The population standard
Assuming that the mean drying time is equal for the two types of paint, find
P(X̄A − X̄B > 1.0), where X̄A and X̄B are average drying times for samples of size
nA = nB = 18.
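Solution:
$$\mu_{\bar{X}_A - \bar{X}_B} = \mu_A - \mu_B = 0$$
and variance
$$\sigma^2_{\bar{X}_A - \bar{X}_B} = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B} = \frac{1}{18} + \frac{1}{18} = \frac{1}{9}$$
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
$$z = \frac{1 - (\mu_A - \mu_B)}{\sqrt{1/9}} = \frac{1 - 0}{\sqrt{1/9}} = 3$$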
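The remaining step, turning z = 3 into a probability, can be sketched as follows. The population standard deviations of 1.0 are an assumption read off the 1/18 terms in the variance above.

```python
from math import sqrt
from statistics import NormalDist

sigma_a = sigma_b = 1.0   # assumed population standard deviations
n_a = n_b = 18

# Standard error of the difference of the two sample means
std_error = sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)

z = (1.0 - 0.0) / std_error          # observed difference 1.0, mean difference 0
p = 1 - NormalDist().cdf(z)          # P(Z > z)

print(round(z, 3), round(p, 4))      # 3.0 0.0013
```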
Practice Problem:
1. The television picture tubes of manufacturer A have a mean lifetime of 6.5 years and
a standard deviation of 0.9 year, while those of manufacturer B have a mean lifetime of
6.0 years and a standard deviation of 0.8 year. What is the probability that a random
sample of 36 tubes from manufacturer A will have a mean lifetime that is at least 1 year
more than the mean lifetime of a sample of 49 tubes from manufacturer B? Given the
following information.
For example, the value of the statistic X̄, computed from a sample of size n, is a point
estimate of the population parameter μ. Similarly, p̂ = x/n is a point estimate of the true
An estimator should be “close” in some sense to the true value of the unknown
is equal to θ. This is equivalent to saying that the mean of the probability distribution of
Bias of an Estimator
$$E(\hat{\Theta}) = \theta$$
$$E(\hat{\Theta}) - \theta$$
Example 1. Let X1, X2, X3, ......, Xn be a random sample. Show that the sample mean
Solution:
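$$B(\hat{\Theta}) = E(\hat{\Theta}) - \theta = E(\bar{X}) - \theta = E X_i - \theta = 0$$
above example Θ̂1 = X1.
$$B(\hat{\Theta}_1) = E(\hat{\Theta}_1) - \theta = E X_1 - \theta = 0$$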
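Both results can be illustrated by simulation. The sketch below assumes a normal population with mean 10 and standard deviation 2 (values chosen only for illustration) and compares the sample mean with the single observation X1 as estimators of the mean.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(7)
mu, sigma, n, reps = 10.0, 2.0, 5, 20000   # assumed population and sample size

xbar_values, x1_values = [], []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar_values.append(fmean(sample))   # estimator 1: the sample mean
    x1_values.append(sample[0])         # estimator 2: the first observation only

# Both averages should be close to mu (zero bias) ...
print(round(fmean(xbar_values), 2), round(fmean(x1_values), 2))
# ... but the sample mean is far less variable.
print(round(pvariance(xbar_values), 2), round(pvariance(x1_values), 2))
```

Both estimators are unbiased, yet they differ in variance, which is why bias alone is not enough to choose between estimators.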
Practice Problem:
1. Suppose that X is a random variable with mean μ and variance σ 2. Let X1, X2, … , Xn
be a random sample of size n from the population represented by X. Show that the sample
mean and sample variance S2 are unbiased estimators of μ and σ2, respectively.
Suppose that Θ̂1 and Θ̂2 are unbiased estimators of θ. This indicates that the
distribution of each estimator is centered at the true value of θ. However, the variance
of these distributions may be different. Figure 4 illustrates the situation. Because Θ̂1 has
a smaller variance than Θ̂2, the estimator Θ̂1 is more likely to produce an estimate close
to the true value of θ. A logical principle of estimation when selecting among several
If we consider all unbiased estimators of θ, the one with the smallest variance is
If X1, X2, … , Xn is a random sample of size n from a normal distribution with mean μ
When we do not know whether an MVUE exists, we could still use a minimum
have a random sample of n observations X1, X2, … , Xn, and we wish to compare two
possible estimators for μ: the sample mean and a single observation from the sample,
say, Xi. Note that both X̄ and Xi are unbiased estimators of μ; for the sample mean, we
have V(X̄) = σ2/n from previous chapters, and the variance of any observation is V(Xi)
= σ2. Because V(X̄) < V(Xi) for sample sizes n ≥ 2, we would conclude that the sample
desirable to give some idea of the precision of estimation. The measure of precision
usually employed is the standard error of the estimator that has been used.
the standard error involves unknown parameters that can be estimated, substitution
Suppose that we are sampling from a normal distribution with mean μ and variance
σ2. Now the distribution of X̄ is normal with mean μ and variance σ2/n, so the standard
error of X̄ is
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
If we did not know σ but substituted the sample standard deviation S into the
$$SE(\bar{X}) = \hat{\sigma}_{\bar{X}} = \frac{S}{\sqrt{n}}$$
Table 1 presents standard errors for some sample statistics, each with its standard
error formula. Sampling distributions for these statistics, or at least their means and
standard deviations (standard errors), can often be found. Some of these, together with
Example 1. An article in the Journal of Heat Transfer (Trans. ASME, Sec. C, 96, 1974,
p. 59) described a new method of measuring the thermal conductivity of Armco iron. Using
a temperature of 100°F and a power input of 550 watts, the following 10 measurements
A point estimate of the mean thermal conductivity at 100 °F and 550 watts is the sample
mean or
The standard error of the sample mean is $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, and because σ is unknown, we
may replace it by the sample standard deviation s = 0.284 to obtain the estimated
standard error of X̄ as
$$SE(\bar{X}) = \hat{\sigma}_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{0.284}{\sqrt{10}} = 0.0898$$
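The estimated standard error is a one-line computation. A minimal check, using only the values s = 0.284 and n = 10 stated above:

```python
from math import sqrt

s, n = 0.284, 10            # sample standard deviation and sample size from the example
std_error = s / sqrt(n)     # estimated standard error of the sample mean

print(round(std_error, 4))  # 0.0898
```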
squared error of the estimator can be important. The mean squared error of an estimator
Figure 5. A biased estimator Θ̂1 that has smaller variance than the unbiased estimator Θ̂2
$$\operatorname{MSE}(\hat{\Theta}) = E(\hat{\Theta} - \theta)^2$$
$$\operatorname{MSE}(\hat{\Theta}) = E[\hat{\Theta} - E(\hat{\Theta})]^2 + [\theta - E(\hat{\Theta})]^2 = V(\hat{\Theta}) + (\text{bias})^2$$
That is, the mean squared error of Θ̂ is equal to the variance of the estimator
plus the squared bias. If Θ̂ is an unbiased estimator of θ, the mean squared error
estimators. Let Θ̂1 and Θ̂2 be two estimators of the parameter θ, and let MSE(Θ̂1)
and MSE(Θ̂2) be the mean squared errors of Θ̂1 and Θ̂2. Then the relative
efficiency of Θ̂2 to Θ̂1 is defined as
$$\frac{\operatorname{MSE}(\hat{\Theta}_1)}{\operatorname{MSE}(\hat{\Theta}_2)}$$
If this relative efficiency is less than 1, we would conclude that Θ̂1 is a more efficient
estimator of θ than Θ̂2.
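The comparison can be sketched with hypothetical numbers. The variances and bias below are invented for illustration: a slightly biased estimator with small variance against an unbiased estimator with larger variance, as in Figure 5.

```python
def mse(variance, bias):
    """Mean squared error = variance + (bias)^2."""
    return variance + bias ** 2

def relative_efficiency(mse_1, mse_2):
    """Relative efficiency of estimator 2 to estimator 1: MSE(1) / MSE(2)."""
    return mse_1 / mse_2

# Hypothetical values, not taken from the text
mse_1 = mse(variance=1.0, bias=0.5)   # biased, small variance   -> 1.25
mse_2 = mse(variance=2.0, bias=0.0)   # unbiased, larger variance -> 2.0

ratio = relative_efficiency(mse_1, mse_2)
print(mse_1, mse_2, ratio)            # 1.25 2.0 0.625
```

Because the ratio is below 1, the biased estimator would be judged more efficient in this made-up case.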
REFERENCES:
Walpole, Ronald E., et al., Probability and Statistics for Engineers and Scientists, 9th ed.,
Montgomery, Douglas C., et al., Applied Statistics and Probability for Engineers, 7th ed.,
Murray, Spiegel R., et al., Probability and Statistics, 4th ed., McGraw Hill Companies Inc.,
2013 https://www.probabilitycourse.com/chapter8/8_2_5_solved_probs.php
CHAPTER TEST
1. A population consists of the four numbers 3, 7, 11, 15. Consider all possible
samples of size two that can be drawn with replacement from this population.
Find
(a) The population mean, (b) the population standard deviation, (c) the mean of
the sampling distribution of means, (d) the standard deviation of the sampling
distribution of means. Verify (c) and (d) directly from (a) and (b) by use of suitable
formulas.
(a) 3 or more points, (b) 6 or more points, (c) between 2 and 5 points?
3. A normal population has a variance of 15. If samples of size 5 are drawn from
this population, what percentage can be expected to have variances (a) less
than 10, (b) more than 20, (c) between 5 and 10?
10.2, and 9.4 lb, respectively. Determine unbiased estimates of (a) the population
Find the distribution of the sample mean of a random sample of size n=40?
1
𝑓(𝑥, 𝑦) = { 2 (2𝑥 + 3𝑦), 4 ≤ 𝑥 ≤ 6
0,
Chapter 7
STATISTICAL INTERVALS
Introduction
represent an uncertainty that exists in the data because we work with samples that are
obtained from a larger population or process. Statistical intervals are staples of the quality
and validation practitioner’s statistical tool box. Statistical intervals can manifest as plus-
or-minus limits on test data, represent a margin of error in a scientific poll, or indicate the
level of confidence associated with a predicted value. This chapter discusses the three
most common intervals, namely the confidence interval, the prediction interval, and the
tolerance interval. In this
At the end of this module, it is expected that the students will be able to:
A way to avoid this is to report the estimate in terms of a range of plausible values
usually 90%, 95%, or 99%, which is a measure of the reliability of the procedure. An
about the precision of estimation is conveyed by the length of the interval. A short interval
implies precise estimation. We cannot be certain that the interval contains the true,
unknown population parameter—we use only a sample from the full population to
compute the point estimate and the interval. However, the confidence interval is
constructed so that we have high confidence that it does contain the unknown population
parameter. Confidence intervals are widely used in engineering and the sciences.
The basic ideas of a confidence interval (CI) are most easily understood by initially
considering a simple situation. Suppose that we have a normal population with unknown
mean μ and known variance σ2. This is a somewhat unrealistic scenario because typically
both the mean and variance are unknown. However, in subsequent sections, we present
where 𝑍𝛼/2 is the upper 100α/2 percentage point of the standard normal distribution.
For small samples selected from non-normal populations, we cannot expect our
degree of confidence to be accurate. However, for samples of size n ≥ 30, with the shape
of the distributions not too skewed, sampling theory guarantees good results.
Example 1. ASTM Standard E23 defines standard test methods for notched bar impact
testing of metallic materials. The Charpy V-notch (CVN) technique measures impact
energy and is often used to determine whether or not a material experiences a ductile-to-
brittle transition with decreasing temperature. Ten measurements of impact energy (J) on
specimens of A238 steel cut at 60°C are as follows: 64.1, 64.7, 64.5, 64.6, 64.5, 64.3,
64.6, 64.8, 64.2, and 64.3. Assume that impact energy is normally distributed with σ = 1
J. We want to find a 95% CI for μ, the mean impact energy. The required quantities are
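Solution:
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$64.46 - 1.96\frac{1}{\sqrt{10}} \le \mu \le 64.46 + 1.96\frac{1}{\sqrt{10}}$$
$$63.84 \le \mu \le 65.08$$
Based on the sample data, a range of highly plausible values for mean impact
energy for A238 steel at 60°C is 63.84 J ≤ μ ≤ 65.08 J.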
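The interval can be recomputed directly from the ten measurements. The helper below is a sketch (the function name is illustrative) that uses the standard library's normal quantile in place of the table value 1.96.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_confidence_interval(data, sigma, confidence=0.95):
    """Two-sided CI for the mean of a normal population with known sigma."""
    n = len(data)
    x_bar = fmean(data)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

impact_energy = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]
lower, upper = z_confidence_interval(impact_energy, sigma=1.0)

print(round(lower, 2), round(upper, 2))   # 63.84 65.08
```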
Practice Problem:
in 36 different locations in a river is found to be 2.6 grams per milliliter. Find the 95% and
99% confidence intervals for the mean zinc concentration in the river. Assume that the
population standard deviation is 0.3 gram per milliliter. Ans. 95% CI: 2.50 < μ < 2.70; 99% CI: 2.47 < μ < 2.73.
This means that in using x̄ to estimate μ, the error E = |x̄ − μ| is less than or equal
to $z_{\alpha/2}\,\sigma/\sqrt{n}$ with confidence 100(1 − α)%. This is shown graphically in Figure 1.
In situations where the sample size can be controlled, we can choose n so that we are
100(1 − α)% confident that the error in estimating μ is less than a specified bound on
the error E. The appropriate sample size is found by choosing n such that $z_{\alpha/2}\,\sigma/\sqrt{n} = E$:
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
Example 1. Consider the CVN test described in Example 1 and suppose that we want to
determine how many specimens must be tested to ensure that the 95% CI on μ for
A238 steel cut at 60°C has a length of at most 1.0 J. Because the bound on error in
Solution:
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
$$n = \left(\frac{(1.96)(1)}{0.5}\right)^2 = 15.37$$
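Since n must be a whole number of specimens, the computed value is rounded up. A minimal sketch of the calculation, with an illustrative function name:

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, error_bound, confidence=0.95):
    """Smallest n for which z_{alpha/2} * sigma / sqrt(n) <= error_bound."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / error_bound) ** 2)

# CI length of at most 1.0 J means an error bound E = 0.5 J
n_required = sample_size_for_mean(sigma=1.0, error_bound=0.5)
print(n_required)   # 16
```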
Known
The confidence interval in Equation 8.5 gives both a lower confidence bound and
an upper confidence bound for μ. Thus, it provides a two-sided CI. It is also possible
to obtain one-sided confidence bounds for μ by setting either the lower bound l = −∞
$$\mu \le \bar{x} + z_{\alpha}\frac{\sigma}{\sqrt{n}}$$
$$\bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}} \le \mu$$
Example 1. The same data for impact testing from Example 1 are used to construct a
lower, one-sided 95% confidence interval for the mean impact energy. Recall that x =
Solution:
$$\bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}} \le \mu$$
$$64.46 - 1.64\frac{1}{\sqrt{10}} \le \mu$$
$$63.94 \le \mu$$
The lower limit for the two sided interval in Example1 was 63.84. Because 𝑍𝛼 <
𝑍𝛼/2, the lower limit of a one-sided interval is always greater than the lower limit of a two-
sided interval of equal confidence. The one-sided interval does not bound μ from above
so that it still achieves 95% confidence with a slightly larger lower limit. If our interest is
only in the lower limit for μ, then the one-sided interval is preferred because it provides
equal confidence with a greater limit. Similarly, a one-sided upper limit is always less than
Practice Problem:
that the variance in reaction times to these types of stimuli is 4 sec2 and that the
distribution of reaction times is approximately normal. The average time for the subjects
is 6.2 seconds. Give an upper 95% bound for the mean reaction time. Ans: 6.858
seconds.
$$\frac{\bar{X} - \mu}{S/\sqrt{n}}$$
$$\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}$$
100(1 − α) %.
Example 1. An article in the 1993 volume of the Transactions of the American Fisheries
largemouth bass.
A sample of fish was selected from 53 Florida lakes, and mercury concentration in
Solution:
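The required quantities are n = 53, x̄ = 0.5250, s = 0.3486, and z0.025 = 1.96. The
approximate 95% CI on μ is
$$\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}$$
$$\bar{x} - z_{0.025}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{0.025}\frac{s}{\sqrt{n}}$$
$$0.5250 - 1.96\frac{0.3486}{\sqrt{53}} \le \mu \le 0.5250 + 1.96\frac{0.3486}{\sqrt{53}}$$
$$0.4311 \le \mu \le 0.6189$$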
This interval is fairly wide because there is substantial variability in the mercury
interval.
If x̄ and s are the mean and standard deviation of a random sample from a normal
by
$$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
where $t_{\alpha/2,\,n-1}$ is the upper 100α/2 percentage point of the t distribution with n − 1 degrees
of freedom.
Adhesion Tests on Plasma Sprayed Thermal Barrier Coatings” (1989, Vol. 11(4), pp.
275–282)] describes the results of tensile adhesion tests on 22 U-700 alloy specimens.
Solution:
The sample mean is x̄ = 13.71, and the sample standard deviation is s = 3.55.
Figures 8.6 and 8.7 show a box plot and a normal probability plot of the tensile adhesion
test data, respectively. These displays provide good support for the assumption that the
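(CI) is
$$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
$$13.71 - 2.080\frac{3.55}{\sqrt{22}} \le \mu \le 13.71 + 2.080\frac{3.55}{\sqrt{22}}$$
$$12.14 \le \mu \le 15.28$$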
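The arithmetic can be verified as follows. Python's standard library has no t quantile function, so this sketch takes the critical value 2.080 used in the solution as an input; the function name is illustrative.

```python
from math import sqrt

def t_confidence_interval(x_bar, s, n, t_crit):
    """Two-sided CI for the mean with unknown variance.
    t_crit is the table value t_{alpha/2, n-1}, supplied by the caller."""
    half_width = t_crit * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Values from the tensile adhesion example; 2.080 is the t value used in the solution
lower, upper = t_confidence_interval(x_bar=13.71, s=3.55, n=22, t_crit=2.080)
print(round(lower, 2), round(upper, 2))   # 12.14 15.28
```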
The CI is fairly wide because there is a lot of variability in the tensile adhesion test
t - distribution
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
Distribution
are needed. When the population is modelled by a normal distribution, the tests and
intervals described in this section are applicable. The following result provides the basis
X2 Distribution
Let X1, X2,…, Xn be a random sample from a normal distribution with mean μ
$$X^2 = \frac{(n-1)S^2}{\sigma^2}$$
normal distribution with unknown variance σ2, then a 100(1 − α) % confidence interval
on σ2 is
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}$$
where $\chi^2_{\alpha/2,\,n-1}$ and $\chi^2_{1-\alpha/2,\,n-1}$ are the upper and lower 100α/2 percentage points of the
interval for σ has lower and upper limits that are the square roots of the
respectively
$$\frac{(n-1)s^2}{\chi^2_{\alpha,\,n-1}} \le \sigma^2 \quad \text{and} \quad \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha,\,n-1}}$$
Example 1. An automatic filling machine is used to fill bottles with liquid detergent. A
(fluid ounce). If the variance of fill volume is too large, an unacceptable proportion of
bottles will be under- or overfilled. We will assume that the fill volume is approximately
Solution:
$$\sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha,\,n-1}}$$
$$\sigma^2 \le \frac{(20-1)(0.0153)}{\chi^2_{0.95,\,19}} = 0.0287 \text{ (fluid ounce)}^2$$
This last expression may be converted into a confidence interval on the standard
σ ≤ 0.17
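The upper bound can be reproduced numerically. The chi-square percentage point 10.117 for 19 degrees of freedom is a standard table value that is not printed in the text above, so it is entered here as an assumption.

```python
from math import sqrt

n, s_squared = 20, 0.0153
chi2_lower_point = 10.117   # table value chi^2_{0.95, 19}; assumed, not stated in the text

var_upper = (n - 1) * s_squared / chi2_lower_point   # upper bound on sigma^2
sd_upper = sqrt(var_upper)                           # upper bound on sigma

print(round(var_upper, 4), round(sd_upper, 2))       # 0.0287 0.17
```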
Therefore, at the 95% level of confidence, the data indicate that the process
standard deviation could be as large as 0.17 fluid ounce. The process engineer or
manager now needs to determine whether a standard deviation this large could lead to
If we have two populations with means μ1 and μ2 and variances σ21 and σ22 ,
respectively, a point estimator of the difference between μ1 and μ2 is given by the statistic
random samples, one from each population, of sizes n1 and n2, and compute x̄1 − x̄2, the
difference of the sample means. Clearly, we must consider the sampling distribution
of X̄1 − X̄2.
$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
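$$Z = \frac{X - np}{\sqrt{np(1-p)}} = \frac{\hat{P} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$$
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where $z_{\alpha/2}$ is the upper 100α/2 percentage point of the standard normal distribution.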
100(1 − α)% confident that the error is less than some specified value E. If we set
$E = z_{\alpha/2}\sqrt{p(1-p)/n}$ and solve for n, the appropriate sample size is
$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 p(1-p)$$
respectively.
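$$\hat{p} - z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \quad \text{and} \quad p \le \hat{p} + z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$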
of a variable. This is a different problem than estimating the mean of that variable, so a
confidence interval is not appropriate. In this section, we show how to obtain a 100(1 − α)
A prediction interval provides bounds on one (or more) future observations from
the population. For example, a prediction interval could be used to bound a single, new
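$$\bar{x} - z_{\alpha/2}\,\sigma\sqrt{1 + \frac{1}{n}} < x_0 \le \bar{x} + z_{\alpha/2}\,\sigma\sqrt{1 + \frac{1}{n}}$$
$$\bar{x} - t_{\alpha/2}\,s\sqrt{1 + \frac{1}{n}} < x_0 \le \bar{x} + t_{\alpha/2}\,s\sqrt{1 + \frac{1}{n}}$$
where $t_{\alpha/2}$ is the t-value with v = n − 1 degrees of freedom, leaving an area of α/2 to the
right.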
Example 1. A meat inspector has randomly selected 30 packs of 95% lean beef. The
sample resulted in a mean of 96.2% with a sample standard deviation of 0.8%. Find a
99% prediction interval for the leanness of a new pack. Assume normality.
Solution:
$$\bar{x} - t_{\alpha/2}\,s\sqrt{1 + \frac{1}{n}} < x_0 \le \bar{x} + t_{\alpha/2}\,s\sqrt{1 + \frac{1}{n}}$$
$$96.2 - (2.756)(0.8)\sqrt{1 + \frac{1}{30}} < x_0 \le 96.2 + (2.756)(0.8)\sqrt{1 + \frac{1}{30}}$$
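Evaluating the limits gives the numerical prediction interval. The sketch below takes the t value 2.756 from the solution as given; the function name is illustrative.

```python
from math import sqrt

def prediction_interval(x_bar, s, n, t_crit):
    """Prediction interval for one future observation from a normal population.
    t_crit is the table value t_{alpha/2} with n - 1 degrees of freedom."""
    half_width = t_crit * s * sqrt(1 + 1 / n)
    return x_bar - half_width, x_bar + half_width

lower, upper = prediction_interval(x_bar=96.2, s=0.8, n=30, t_crit=2.756)
print(round(lower, 2), round(upper, 2))   # 93.96 98.44
```

The factor √(1 + 1/n) is always larger than the 1/√n used in a confidence interval for the mean, which is why the prediction limits are wider.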
Notice that the prediction interval is considerably longer than the CI. This is because the
observation.
Practice Problem:
Due to the decrease in interest rates, the First Citizens Bank received a lot of
amount of $257,300. Assume a population standard deviation of $25,000. For the next
customer who fills out a mortgage application, find a 95% prediction interval for the loan
amount.
might like to calculate limits that bound 95% of the viscosity values.
x̄ − ks, x̄ + ks, or x̄ ± ks
where k is a tolerance interval factor found in Table I. Values are given for
γ = 90%, 95%, and 99%, and for 90%, 95%, and 99% confidence.
bounds can also be computed. The tolerance factors for these bounds are also given in
Table I.
Example 1. Consider Example 7. With the information given, find a tolerance interval that
gives two-sided 95% bounds on 90% of the distribution of packages of 95% lean
Recall from Example 7 that n = 30, the sample mean is 96.2%, and the sample
Solution:
x̄ ± ks
we find that the lower and upper bounds are 94.5 and 97.9. We are 95% confident that
the above range covers the central 90% of the distribution of 95% lean beef packages.
REFERENCES:
Walpole, Ronald E., et al., Probability and Statistics for Engineers and Scientists,9th ed., Pearson
Education Inc., 2016
Montgomery, Douglas C., et al., Applied Statistics and Probability for Engineers, 7th ed., John
Wiley & Sons (Asia) Pte Ltd, 2018
Murray, Spiegel R., et al., Probability and Statistics, 4th ed., McGraw Hill Companies Inc., 2013
CHAPTER TEST
1. A random sample of size n1 = 25, taken from a normal population with a standard
2. An electrical firm manufactures light bulbs that have a length of life that is
sample of 30 bulbs has an average life of 780 hours, find a 96% confidence interval
pieces is taken, and the diameters are found to be 1.01, 0.97, 1.03, 1.04, 0.99,
0.98, 0.99, 1.01, and 1.03 centimeters. Find a 99% confidence interval for the
distribution.
these pieces is taken and the diameters are found to be 1.01, 0.97, 1.03, 1.04,
0.99, 0.98, 0.99, 1.01, and 1.03 centimeters. Use these data to calculate three
interval types and draw interpretations that illustrate the distinction between them
normal distribution. The sample mean and standard deviation for the given data
(c) Find the 99% tolerance limits that will contain 95% of the metal pieces produced
by this machine.
Chapter 8
TEST ON HYPOTHESIS FOR A SINGLE
SAMPLE
Introduction
sample data using a point estimate or confidence interval was discussed. In many
situations there are two competing claims about the value of a parameter, and whichever
claim is correct must be determined. This can be done by statistical inference. Inferential
statistics is the branch of statistics that deals with estimating population
values, called parameters, and with making statements about computed statistics that are
acceptable to some degree of confidence. Statistical inference is the method concerned with making
determining how accurate the generalizations are. This chapter focuses on the basic
sample of data.
At the end of this module, it is expected that the students will be able to:
t-test.
population. The goal of this process is to make judgment about the difference between
the sample statistics and a hypothesized population parameter. In this process, the
researcher must define the population under study, state the hypothesis to be
investigated, give the significance level, select a sample, collect data, perform the
required test and reach a conclusion. The z test and t test are statistical tests for
hypothesis testing on means while chi-square test is used for testing the standard
deviation.
existence of relationship between the variables under study. This statement is tested for
a statement of the expectation derived from the theory under the study.
Reject Ho    Do not reject Ho
A type I error occurs if one rejects the null hypothesis when it is true. Its probability is
referred to as the significance level and is denoted by the Greek symbol alpha (α). The
common values of α are 1%, 5% and 10%. A type II error occurs if one does not reject
the null hypothesis when it is false. Its probability is denoted by the Greek symbol beta (β).
The level of significance is the maximum probability of committing a type I error. That
is, P(type I error) = α. Generally, statisticians agree on using three arbitrary significance
levels: 0.10, 0.05 and 0.01. That is, if the null hypothesis is rejected, the probability
of a type I error will be 10%, 5% or 1% and the probability of a correct decision will be 90%,
95% or 99%, depending on which level of significance is used. The probability of a correct
decision is the confidence level, which represents the chance of accepting the null
In order to state the hypothesis correctly, the researcher must translate correctly
the claim into mathematical symbols. There are three possible sets of statistical
hypotheses.
In hypotheses testing of a discrete test statistic, the critical region may be arbitrarily
chosen. If α is too large, it can be reduced by making an adjustment in the critical value.
It may be necessary to increase the sample size to offset the decrease that occurs
automatically in the power of the test. In statistical analysis, it had become customary to
choose a significance level of 0.10, 0.05 or 0.01 and the critical region is selected
accordingly in which the rejection or non-rejection of the null hypothesis H0 would depend
on. For example, if the test is two tailed and 𝛼 is set at the 0.05 level of significance and
the test statistic involves, say, the standard normal distribution, then a z-value is observed
from the data and the critical region is z > 1.96 or z < −1.96 where the value 1.96 is found
as z0.025 in the table of Areas Under the Normal Curve. A value of z in the critical region
prompts the statement “The value of the test statistic is significant,” which we can then
translate into the user’s language. For example, if the hypothesis is given by H0: μ = 12,
H1: μ ≠ 12, one might say, “The mean differs significantly from the value 12.”
The philosophy that the maximum risk of making a type I error should be controlled
is the root of the pre-selection of a significance level. However, this approach does not
account for values of test statistics that are “close” to the critical region. Suppose, for
example, in the illustration with H0: μ = 12 versus H1: μ ≠ 12, a value of z = 1.84 is
observed; strictly speaking, with α = 0.05, the value is not significant. But the risk of
committing a type I error if one rejects H0 in this case could hardly be considered severe.
In fact, in a two-tailed scenario, one can quantify this risk as P = 2P(Z > 1.84 when μ =
information to the user although the evidence against H0 is not as strong as that which
would result from rejection at an α = 0.05 level. As a result, the P-value approach has
of a probability, to a mere “reject” or “do not reject” conclusion. The P-value also gives
important information when the z-value falls well into the ordinary critical region. For
example, if z is 2.75, it is informative for the user to observe that P = 2(0.0030) = 0.0060,
and thus the z-value is significant at a level considerably less than 0.05. It is important to
know that under the condition of H0, a value of z = 2.75 is an extremely rare event. That
is, a value at least that large in magnitude would only occur 60 times in 10,000
experiments.
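The two-tailed P-values discussed above can be computed without a table. A minimal sketch using the standard normal distribution from the Python standard library (the function name is illustrative):

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """P-value for a two-tailed z test: twice the upper-tail area beyond |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p_close = two_tailed_p_value(1.84)    # just outside the 0.05 critical region
p_strong = two_tailed_p_value(2.75)   # well inside the critical region

print(round(p_close, 4), round(p_strong, 4))   # 0.0658 0.006
```

A P-value of about 0.066 shows how near z = 1.84 is to significance at α = 0.05, information that a bare "do not reject" conclusion would hide.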
A P-value is the lowest level of significance at which the observed value of the test
statistic is significant. It is the smallest level of α that would lead to rejection of Ho with
The following are the steps in hypothesis testing using the fixed probability of Type I
Error approach.
2. Determine the level of significance and the direction of test. The direction of test
will be based on whether the alternative hypothesis is stated as left or right tailed
4. Write the decision rule expressing on how to accept or reject the null hypothesis.
5. Compute the test statistic and compare with the critical value. The test statistic
6. State the decision based on the resulting computed value when compared to the
critical value.
If you will be testing the hypothesis using significance testing, or the P-value approach,
4. Compute the P-value based on the computed value of the test statistic.
5. State the decision based on the resulting P-value and knowledge of the scientific
system.
Following the steps in hypothesis testing for only single mean, the hypothesized
Ho: µ = µo
H1: µ ≠ µo
H1: µ > µo
H1: µ < µo
The decision rule is stated as follows: reject the null hypothesis if the absolute value of
the test statistic exceeds the critical value. Otherwise, do not reject the null hypothesis.
To draw inferences on a mean in the one-population case, assuming that the observations
are normally distributed and the variance is known, the Z-test is used. It can also be used when
the sample size is equal to or greater than 30 (n ≥ 30). The Z-statistic, zc, is the test statistic
used to decide whether to reject the null hypothesis in favor of the alternative
$$z_c = \frac{\bar{X} - \mu_o}{\sigma/\sqrt{n}}$$
where $\bar{X}$ is the computed mean of the gathered data, $\mu_o$ is the hypothesized mean, $\sigma$ is
the population standard deviation, which is known or given, and n is the sample size. The
critical value is obtained using the z-tabular value. For a two-tailed test, the value at 1 - α/2,
written symbolically as $z_{\alpha/2}$, is considered. Otherwise, for a one-tailed test, the value at 1 - α,
written as $z_{\alpha}$, is used.
Figure 1. The Normal Distribution or Z-Distribution for Testing the Hypothesis Ho: µ = µo
with critical values for (a) H1: µ ≠ µo, (b) µ > µo, (c) µ < µo
Professor X shows that the average grade in the midterm examination is 85%. Professor
X claims that the average grade of the students in the midterm is at least 80% with a
standard deviation of 16%. Is there evidence to say that the claim is correct at the 5% level
of significance?
Solution:
1. H0 : µ = 80%
H1 : µ > 80%
3. Test statistic: $z_c = \frac{\bar{X} - \mu_o}{\sigma/\sqrt{n}}$
$$z_c = \frac{\bar{X} - \mu_o}{\sigma/\sqrt{n}} = \frac{85 - 80}{16/\sqrt{100}} = 3.125$$
6. Reject H0 since 3.125 is greater than 1.645
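As a rough check of the computation above, the following Python sketch recomputes the test statistic and the right-tailed P-value using only the standard library. The function name `z_test_one_mean` is made up for this illustration, the sample size n = 100 is read off the computation, and 1.645 is the tabulated critical value quoted in the decision step.

```python
import math

def z_test_one_mean(xbar, mu0, sigma, n):
    """Return (z, right-tailed P-value) for H0: mu = mu0 vs H1: mu > mu0."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # Upper-tail area of the standard normal: P(Z > z) = 0.5 * erfc(z / sqrt(2))
    p_right = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_right

# Values taken from the example: mean 85, hypothesized mean 80, sigma 16, n 100
z_c, p_value = z_test_one_mean(xbar=85, mu0=80, sigma=16, n=100)
z_crit = 1.645                 # tabulated z value for alpha = 0.05, one-tailed
reject_h0 = z_c > z_crit

print(f"z = {z_c:.3f}, P-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The P-value route and the critical-value route should lead to the same decision, since both compare the same statistic against the same α.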
Using the P-value approach, the P-value corresponding to z = 3.125 is 0.0009 using the
table for Areas Under the Normal Curve. This results to an evidence stronger than the
Example 2. A manufacturer of solar lamp claims that the mean useful life of their new
product is 8 months with a standard deviation of 0.5 month. To test this claim, a random
sample of 50 solar lamps was tested and found to have a mean life of 7.8 months. Test
the hypothesis that µ = 8 months against the alternative hypothesis that µ ≠ 8 months
Solution:
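1. H0 : µ = 8 months
H1 : µ ≠ 8 months
3. Test statistic: $z_c = \frac{\bar{X} - \mu_o}{\sigma/\sqrt{n}}$
4. Critical region: z < -2.575 and z > 2.575. Reject H0 if zc < -2.575 or zc > 2.575
$$z_c = \frac{\bar{X} - \mu_o}{\sigma/\sqrt{n}} = \frac{7.8 - 8}{0.5/\sqrt{50}} = -2.83$$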
7. Therefore, the mean useful life of the new product is not equal to 8 months. In fact
Using the P-value approach and considering that this is a two-tailed test, the P-value is
twice the area to the left of z = -2.83. Using the table for Areas Under the Normal Curve,
distributed but the variance is unknown and the sample size is less than 30, t-test is used.
The test statistic used is the t-statistic, tc, which is computed as follows:
$$t_c = \frac{\bar{X} - \mu_o}{s/\sqrt{n}}$$
where $\bar{X}$ is the computed mean of the gathered data, $\mu_o$ is the hypothesized mean, s is
the sample standard deviation and n is the sample size. The critical value is obtained
using the t-tabular value. For a two-sided test, the critical value is obtained at α/2 and at
degrees of freedom (d.f.) equal to (n - 1), written as $t_{\alpha/2}(n-1)$. Otherwise, for a one-sided test,
Figure 2. T-Distribution for Testing the Hypothesis Ho: µ = µo with critical values for
(a) H1: µ ≠ µo, (b) µ > µo, (c) µ < µo
incoming freshmen. Those who got scores equal or higher than the set passing are
accepted in the College. The average score of the incoming freshmen was 80% before
exam was suspended for two years and it is thought that the quality of the first year
students had diminished. However, with the vision, mission, goals and objectives of the
University and the College towards quality education, the Dean wants to determine if the
diminished so a small random sample of 15 freshmen students and administers the same
entrance exam. The average score is found to be 83% with a standard deviation of 5%.
Solution:
1. H0 : µ = 80%
H1 : µ ≠ 80%
3. Test statistic: $t_c = \frac{\bar{X} - \mu_o}{s/\sqrt{n}}$
4. Critical region: t = ±2.977. Reject H0 if tc is less than -2.977 or greater than 2.977
This is obtained from the table for Critical Values of the t-distribution using α/2 = 0.005
$$t_c = \frac{\bar{X} - \mu_o}{s/\sqrt{n}} = \frac{83 - 80}{5/\sqrt{15}} = 2.32$$
6. Do not reject H0 since 2.32 is less than 2.977 but greater than -2.977
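The t computation above can be reproduced with a short Python sketch. The helper name `t_statistic` is hypothetical, and the critical value 2.977 is simply the tabulated value quoted in the solution rather than something the code derives.

```python
import math

def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic: (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))

# Values from the entrance-exam example: mean 83, hypothesized 80, s = 5, n = 15
t_c = t_statistic(xbar=83, mu0=80, s=5, n=15)
t_crit = 2.977                      # tabulated t for alpha/2 = 0.005, d.f. = 14
reject_h0 = abs(t_c) > t_crit       # two-tailed decision rule

print(f"t = {t_c:.2f}, reject H0: {reject_h0}")
```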
significance.
The P-value corresponding to 2.32 is 0.036 or 3.6%. Since this is a two-tailed test, then
The chi-square distribution will be used to test a claim about a single variance or
standard deviation. The formula for the Chi-square test for a single variance is given by:
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
where n is the sample size, $s^2$ is the sample variance and $\sigma^2$ is the population variance
with the degrees of freedom equal to (n -1). There are three assumptions for the Chi-
square test: the sample must be randomly selected from the population, the population
must be normally distributed for the variable under study, and the observations must be
Figure 3. Chi-Squared Distribution for Testing the Hypothesis Ho: σ² = σo² with critical values for
(a) H1: σ² ≠ σo², (b) σ² > σo², (c) σ² < σo²
Example1. A company claims that the variance of the sugar content of its ice cream is
measured. The variance of the sample is found to be 36. At 10% level of significance, is
Solution:
1. H0 : σ² = 25 mg/oz
H1 : σ² ≠ 25 mg/oz
3. Test statistic: $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
4. Critical region: $\chi^2 < 10.117$ and $\chi^2 > 30.144$. Reject H0 if $\chi^2$ is less than 10.117
or greater than 30.144. This is obtained from the table for Critical Values of the
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{(19)(36)}{25} = 27.36$$
7. Therefore, there is not enough evidence to reject the company's claim that the variance of the sugar content is equal to 25 mg/oz
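A minimal Python sketch of the chi-square computation in this example follows. The function name is illustrative only, n = 20 is inferred from the factor (19) in the computation, and the two critical values are the tabulated ones quoted in the solution.

```python
def chi_square_statistic(n, sample_var, hypothesized_var):
    """Chi-square statistic for a single variance: (n - 1) s^2 / sigma0^2."""
    return (n - 1) * sample_var / hypothesized_var

# n = 20 is an inference from (n - 1) = 19; s^2 = 36 and sigma0^2 = 25 are from the example
chi2 = chi_square_statistic(n=20, sample_var=36, hypothesized_var=25)
lower_crit, upper_crit = 10.117, 30.144   # tabulated values quoted in the solution
reject_h0 = chi2 < lower_crit or chi2 > upper_crit

print(f"chi-square = {chi2:.2f}, reject H0: {reject_h0}")
```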
binomial experiment equals some specified value. That is, the null hypothesis Ho that p =
po, where p is the parameter of the binomial distribution is tested. The alternative
𝑝 < 𝑝𝑜 , 𝑝 > 𝑝𝑜 or 𝑝 ≠ 𝑝𝑜
1. H0: p = po
value
Example1. A home developer claims that solar panels are installed in 65% of all homes
being constructed today in a certain subdivision. Would you agree with this claim if a
random survey of new homes in this subdivision shows that 8 out of 15 had solar panels
Solution:
1. H0 : p = 0.65
H1 : p ≠ 0.65
4. Computations: x = 8 and npo = (15) (0.65) = 9.75. Using the table for Binomial
$$P = 2\sum_{x=0}^{8} b(x; 15, 0.65) = 0.5213$$
5. Do not reject H0 and conclude that there is not enough evidence to doubt the claim
For large n, approximation is required. When the hypothesized value po is very close to 0
or 1, the Poisson distribution with parameter µ = npo may be used. However, the normal-
curve approximation, with parameters µ = npo and σ² = npoqo, is usually preferred for large
n and is very accurate as long as po is not extremely close to 0 or 1. Using the normal
$$z = \frac{x - np_o}{\sqrt{np_oq_o}}$$
which is a value of the standard normal variable Z. Hence, for a two-tailed test at the α-
level of significance, the critical region is $z < -z_{\alpha/2}$ and $z > z_{\alpha/2}$. For the one-sided alternative
p < po, the critical region is $z < -z_{\alpha}$ and for the alternative p > po, the critical region is
$z > z_{\alpha}$.
The company is said to demonstrate capability to the customers if the process produces
defective items not exceeding 5%. To determine this, a random sample of 200
microcontrollers was tested and it was found that there are four defective items. Will you
agree that the company demonstrates process capability at the 0.05 level of significance? Use
P-value approach.
Solution:
1. H0 : p = 0.05
$$z = \frac{x - np_o}{\sqrt{np_oq_o}} = \frac{4 - 200(0.05)}{\sqrt{200(0.05)(0.95)}} = -1.95$$
4. The P-value from the Table for Areas Under the Normal Curve, P(z < -1.95) =
0.0256.
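The normal approximation used here can be sketched in Python with the standard library alone. The function name `z_test_proportion` is an assumption of this illustration. Because the code keeps the unrounded z value, its left-tail area differs slightly in the fourth decimal from the table lookup at z = -1.95.

```python
import math

def z_test_proportion(x, n, p0):
    """Normal-approximation z statistic and left-tail area P(Z < z) for a proportion."""
    q0 = 1 - p0
    z = (x - n * p0) / math.sqrt(n * p0 * q0)
    # Standard normal CDF at z: Phi(z) = 0.5 * erfc(-z / sqrt(2))
    p_left = 0.5 * math.erfc(-z / math.sqrt(2))
    return z, p_left

# Values from the example: 4 defectives out of 200, hypothesized p0 = 0.05
z, p_left = z_test_proportion(x=4, n=200, p0=0.05)
alpha = 0.05
print(f"z = {z:.2f}, P(Z < z) = {p_left:.4f}, below alpha: {p_left < alpha}")
```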
REFERENCES:
Garcia, George A. Fundamental Concepts and Methods in Statistics, Manila: University of Sto.
Tomas Publishing House, 2004
Montgomery, Douglas C., et al., Applied Statistics and Probability for Engineers, 7th ed., John
Wiley & Sons (Asia) Pte Ltd, 2018
Walpole, Ronald E., et al., Probability and Statistics for Engineers and Scientists, 9th ed.,
Pearson Education Inc., 2016
CHAPTER TEST
1. A company producing lubricating oil claims that the average content of the
containers is 20 liters. Test this claim if a random sample of ten containers are 20.4,
19.4, 20.2, 20.6, 20.2, 19.6, 19.8, 20.8, 20.6 and 19.6 liters. Assume normal
2. It is claimed that personal vehicle is driven 25,000 kilometers per year. Would
you agree with this claim if a random sample of 100 vehicle owners were asked to
keep the records of their travel and showed that an average of 28,500 kilometers
3. A marketing expert for mobile operating system believes that 40% of the users
prefer android. If 9 out of 20 choose android over IOS, what can you conclude about
normally distributed with a variance of 0.06 liter. Test this hypothesis against the
alternative that the variance is not equal to 0.06 liter. Use 0.01 level of significance.
Chapter 9
STATISTICAL INFERENCE OF TWO SAMPLES
Introduction
proportion for single sample. In this chapter, statistical inference of two samples
At the end of this module, it is expected that the students will be able to:
1. Test hypotheses on the difference in means of two normal distributions using either a
Z-test or a t-test.
Known
engineering students enrolled in Statistics. To know this, take a sample from each class,
specify the level of significance and test the hypothesis on the differences of the means
and assume that the performance of the two sections is being compared. The null
There is a significant difference in the performance of the two classes. Writing this in
The performance of the first class is better (or poorer) than the second class. This is a
one-tailed test, either right or left-tailed. The inequality statement is H1: µ1 > µ2 or H1: µ1
< µ2.
For large samples (n ≥ 30) and when there are two independent random samples
of size n1 and n2, respectively, which are drawn from two populations with means µ1 and
µ2 and variances σ1² and σ2² and the random variable is normally distributed, the Z-
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
Ho: µ1 - µ2 = do
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - d_o}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
If the population variances are not known, the sample standard deviations (s1 and s2)
yielding corn. Based on past studies, the difference in yield is significant. To know if
there is really significant difference, the Director of the Bureau decided to conduct an
experiment. Forty hectares of the first variety and thirty hectares of the second variety
are planted and grown under the same laboratory conditions. After harvesting, the yields are
250 sacks for the 1st variety with a standard deviation of 20 sacks and 240 for the 2nd
variety with a standard deviation of 15 sacks. Is there a significant difference in the yield
Solution:
1. Ho: µ1 = µ2
H1: µ1 ≠ µ2.
3. Test statistic, since the sample variances are known: $Z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
4. Critical region: Z = ±2.575. Reject Ho if Z is less than -2.575 or greater than 2.575.
5. Computing for Z:
$$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{250 - 240}{\sqrt{\dfrac{(20)^2}{40} + \dfrac{(15)^2}{30}}} = 2.39$$
corn.
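The two-sample Z computation for the corn example can be checked with a few lines of Python. The function name `two_sample_z` is hypothetical, and the critical value 2.575 is the tabulated value quoted in the solution.

```python
import math

def two_sample_z(x1, x2, s1, s2, n1, n2, d0=0.0):
    """Z statistic for the difference of two means using the sample standard deviations."""
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return ((x1 - x2) - d0) / standard_error

# Values from the example: 250 vs 240 sacks, s = 20 and 15, n = 40 and 30
z = two_sample_z(x1=250, x2=240, s1=20, s2=15, n1=40, n2=30)
z_crit = 2.575                      # two-tailed critical value quoted in the solution
reject_h0 = abs(z) > z_crit

print(f"Z = {z:.2f}, reject H0: {reject_h0}")
```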
Unknown
For small samples (n < 30): If the variances are unknown and they are assumed to be equal,
the test statistic for the pooled t-test (often called the two-sample t-test) is used. It is given
by:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - d_o}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
$$s_p^2 = \frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}$$
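When the variances of the two normal populations are unknown and are not equal, the
$$t' = \frac{(\bar{X}_1 - \bar{X}_2) - d_o}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
$$v = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\left[\left(\dfrac{s_1^2}{n_1}\right)^2 / (n_1 - 1)\right] + \left[\left(\dfrac{s_2^2}{n_2}\right)^2 / (n_2 - 1)\right]}$$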
materials. Twelve pieces of material A were tested by exposing each piece to a Brinell
Hardness Tester. Ten pieces of material B were also tested in the same machine. In each
case the hardness was determined and recorded. The samples of material A gave an
of significance that the hardness of material A exceeds that of material B by more than 2
BHN?
Solution:
Let µ1 and µ2 be the population means of the hardness of Material A and Material B,
respectively. The population variances are unknown and let us first assume that they are
8. H0 : µ1 - µ2 = 2
H1 : µ1 - µ2 > 2
10. Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_o}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
11. Critical region: t > 1.725. Reject H0 if tc is greater than 1.725. (This is obtained from
the Table for the Critical Values of the t-Distribution at degrees of freedom of =
$$t_c = \frac{(\bar{x}_1 - \bar{x}_2) - d_o}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
$$s_p^2 = \frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}$$
$$s_p = \sqrt{\frac{(11)(16) + (9)(25)}{12 + 10 - 2}} = 4.478$$
$$t_c = \frac{(85 - 81) - 2}{4.478\sqrt{\dfrac{1}{12} + \dfrac{1}{10}}} = 1.04$$
14. Therefore, we are unable to agree that the hardness of material A exceeds that of
Using the P-value approach, the P-value corresponding to t > 1.04 is 0.16 which is higher
than 0.05.
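The pooled t-test arithmetic can be reproduced with the sketch below. The function name `pooled_t` is illustrative; the sample means (85, 81) and sample variances (16, 25) are read off the worked computation above, and 1.725 is the tabulated critical value quoted in the solution.

```python
import math

def pooled_t(x1, x2, var1, var2, n1, n2, d0=0.0):
    """Return (pooled standard deviation, t statistic) for the equal-variance t-test."""
    sp = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    t = ((x1 - x2) - d0) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return sp, t

# Material A: mean 85, variance 16, n = 12; Material B: mean 81, variance 25, n = 10
sp, t_c = pooled_t(x1=85, x2=81, var1=16, var2=25, n1=12, n2=10, d0=2)
t_crit = 1.725                      # one-tailed critical value quoted in the solution
reject_h0 = t_c > t_crit

print(f"sp = {sp:.3f}, t = {t_c:.2f}, reject H0: {reject_h0}")
```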
Now, consider the problem of testing the equality of the variances σ1² = σ2² of two
populations. The null hypothesis Ho that σ1² = σ2² is tested against one of the usual alternatives
σ1² < σ2², σ1² > σ2² or σ1² ≠ σ2². For independent random samples of size n1 and n2,
respectively, from the two populations, the f-value for testing σ1² = σ2² is the ratio
$$f = \frac{s_1^2}{s_2^2}$$
where $s_1^2$ and $s_2^2$ are the variances computed from the two samples. If the two
populations are approximately normally distributed and the null hypothesis is true, the
Therefore, the critical regions corresponding to the one-sided alternatives σ1² < σ2² and σ1²
> σ2² are, respectively, $f < f_{1-\alpha}(v_1, v_2)$ and $f > f_{\alpha}(v_1, v_2)$. For the two-sided alternative σ1²
≠ σ2², the critical region is $f < f_{1-\alpha/2}(v_1, v_2)$ and $f > f_{\alpha/2}(v_1, v_2)$.
Example1. In testing for the difference in the hardness of the two materials in the previous
example, the variances of the two unknown population are assumed to be equal. Is this
Solution:
Let σ1² and σ2² be the population variances for the hardness of material A and B,
respectively.
1. H0 : σ1² = σ2²
H1 : σ1² ≠ σ2²
3. Test statistic: $f = \dfrac{s_1^2}{s_2^2}$
4. Critical region: f < 0.34 or f > 3.11. Reject H0 if fc is less than 0.34 or greater than 3.11.
(This is obtained from the Table for the Critical Values of the F-Distribution). Take
$$f_{1-\alpha}(\nu_1, \nu_2) = \frac{1}{f_{\alpha}(\nu_2, \nu_1)}$$
$$f_{0.95}(11, 9) = \frac{1}{f_{0.05}(9, 11)} = \frac{1}{2.90} = 0.34$$
$$f = \frac{s_1^2}{s_2^2} = \frac{16}{25} = 0.64$$
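A small Python sketch of the variance-ratio test is given below. It only reuses the tabulated values quoted above (2.90 and 3.11) rather than computing F quantiles, and the variable names are chosen for this illustration.

```python
# Sample variances from the hardness example (material A and material B)
var_a, var_b = 16.0, 25.0
f_c = var_a / var_b

# Lower critical value via the reciprocal relation f_{1-a}(v1, v2) = 1 / f_a(v2, v1)
f_05_9_11 = 2.90                   # tabulated value quoted in the solution
lower_crit = 1 / f_05_9_11
upper_crit = 3.11                  # tabulated value quoted in the solution

reject_h0 = f_c < lower_crit or f_c > upper_crit
print(f"f = {f_c:.2f}, lower = {lower_crit:.2f}, upper = {upper_crit:.2f}, reject H0: {reject_h0}")
```

Since f falls between the two critical values, the equal-variance assumption used in the pooled t-test is not contradicted by these samples.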
In testing the null hypothesis that the two proportions, or binomial parameters, are
equal, the hypothesis p1 = p2 against the alternatives p1 < p2, p1 > p2 or p1 ≠ p2 is tested.
samples of size n1 and n2 are selected at random from two binomial populations and the
In the construction of confidence intervals for p1 and p2, for n1 and n2 sufficiently large,
$$\mu_{\hat{P}_1 - \hat{P}_2} = p_1 - p_2$$
and variance
$$\sigma^2_{\hat{P}_1 - \hat{P}_2} = \frac{p_1q_1}{n_1} + \frac{p_2q_2}{n_2}$$
Therefore, acceptance and critical regions can be established by using the standard
normal variable
$$Z = \frac{(\hat{P}_1 - \hat{P}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1q_1}{n_1} + \dfrac{p_2q_2}{n_2}}}$$
$$Z = \frac{\hat{P}_1 - \hat{P}_2}{\sqrt{pq\left[\dfrac{1}{n_1} + \dfrac{1}{n_2}\right]}}$$
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
where $x_1$ and $x_2$ are the number of successes in each of the two samples. Substituting $\hat{p}$
for p and $\hat{q} = 1 - \hat{p}$, the z-value for testing $p_1 = p_2$ is determined from the formula
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left[\dfrac{1}{n_1} + \dfrac{1}{n_2}\right]}}$$
Hence, for the alternative $p_1 \neq p_2$ at the α-level of significance, the critical region is $z < -z_{\alpha/2}$
and $z > z_{\alpha/2}$. For the one-sided alternative $p_1 < p_2$, the critical region is $z < -z_{\alpha}$ and for the
the residents of a city and the surrounding barangays. Many residents in the barangays
feel that the proposal will pass because of the large proportion of city voters who favor
proportion of city voters and barangay voters favoring the proposal. If 180 of 300 city
voters favor the proposal and 280 of 500 barangay residents favor it, would you agree
that the proportion of city voters favoring the proposal is higher than the proportion of
Solution:
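1. H0 : $p_1 = p_2$
H1: $p_1 > p_2$
3. Test statistic: $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left[\dfrac{1}{n_1} + \dfrac{1}{n_2}\right]}}$
$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{180}{300} = 0.60$$
$$\hat{p}_2 = \frac{x_2}{n_2} = \frac{280}{500} = 0.56$$
$$\hat{p} = \frac{180 + 280}{300 + 500} = \frac{460}{800} = 0.575$$
$$\hat{q} = 1 - 0.575 = 0.425$$
$$z = \frac{0.60 - 0.56}{\sqrt{(0.575)(0.425)\left[\dfrac{1}{300} + \dfrac{1}{500}\right]}} = 1.108$$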
7. Therefore, we do not agree that the proportion of the city voters in favor of the
construction of the cell site tower is higher than the proportion of the barangay
voters.
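The pooled two-proportion computation can be sketched in Python as follows. The function name `two_proportion_z` is made up for this illustration, and the right-tailed P-value is an extra the code reports for reference; it is not part of the worked solution above.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2, with the right-tailed area P(Z > z)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    q_pooled = 1 - p_pooled
    z = (p1_hat - p2_hat) / math.sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))
    p_right = 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail normal area
    return z, p_pooled, p_right

# Values from the example: 180 of 300 city voters, 280 of 500 barangay residents
z, p_pooled, p_right = two_proportion_z(x1=180, n1=300, x2=280, n2=500)
print(f"pooled p = {p_pooled:.3f}, z = {z:.3f}, P(Z > z) = {p_right:.3f}")
```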
REFERENCES:
Garcia, George A. Fundamental Concepts and Methods in Statistics, Manila: University of Sto.
Tomas Publishing House, 2004
Montgomery, Douglas C., et al., Applied Statistics and Probability for Engineers, 7th ed., John
Wiley & Sons (Asia) Pte Ltd, 2018
Walpole, Ronald E., et al., Probability and Statistics for Engineers and Scientists, 9th ed.,
Pearson Education Inc., 2016
CHAPTER TEST
1. Professor X teaches the course Engineering Data Analysis (EDA) using the
conventional method in one of his classes. He then began to teach the course
using computers and statistical software in the second class. Professor X gives the
same examinations to these two classes. It was observed that the students who
are taught using computers and statistical software tend to get higher scores but
significance. From the final exam results, he takes a random sample for 15
students from the first class and 10 from the second class. He gets the following
results: for the class using conventional method: mean of 84 and standard
deviation of 8, while for the second class using computer and statistical software:
mean of 92 and standard deviation of 5. As a student of EDA will you agree with
Professor X?
2. A wire and cable company claims that the average tensile strength of cable A
exceeds that of cable B by at least 15 kilograms. To test his claim, 100 pieces of
each type of cable are tested under similar conditions. Cable A has an average
tensile strength of 87.6 kilograms with a standard deviation of 6.82 kilograms, while
cable B has an average tensile strength of 78.8 kilograms with a standard deviation
of 5.66 kilograms. Test the manufacturer’s claim using a 0.01 level of significance.
customers who are normally the young ladies (teenagers) and women to its new
product. There are 300 teenagers and 250 women who are randomly selected.
Among the teenagers, 100 affirm that they will buy the product while 65 women
said that they will buy the new product. With this information, is there a significant
difference in the level of acceptance of the new product between the teenagers
the same problem about the hardness of two materials, conduct hypothesis testing
but this time assuming that the variances are not equal.
Chapter 10
SIMPLE LINEAR REGRESSION AND
CORRELATION
Introduction
describe the nature of the relationship, regression is to be used. There are two types of
relationships: simple, when there are two variables under study, and multiple, when there
are many variables under study. Simple relationships can be further classified as positive
the same time. A negative relationship exists when one variable increases while the other
discussed in this chapter. It includes empirical models using linear regression, its
estimation using the least-squares approach, hypothesis testing using the t-test and analysis of
variance (ANOVA), prediction of future observations using the model, determination of the
adequacy of the model using residual analysis and the coefficient of determination, and the
correlation model.
At the end of this module, it is expected that the students will be able to:
Approach.
5. Determine the adequacy of the regression model using residual analysis and
the coefficient of determination.
Many problems in engineering and the sciences involve analysis of the relationship
between variables. The pressure and temperature of a gas in a container, the velocity
and the area of the channel, and the displacement and velocity are related to each other.
particle at time t = 0 and v be the velocity, then the displacement at any time t is
variables is not deterministic. For example, the fuel usage of a car (y) and its weight x, or
the electrical energy consumption of a house (y) and the size of the house x, in square
feet, y and x are related but the relationships are not deterministic. This means that the
value of y (fuel usage, energy consumption) cannot be predicted perfectly from the
corresponding value of x. It is possible for different cars to have different fuel usage even
if they have the same weight, and it is possible for different houses to consume different
Regression analysis is the collection of statistical tools that are used to model and
most widely used statistical tools because these types of problems occur so frequently in
In this chapter, only one independent variable x will be considered and the
relationship with the response y is assumed to be linear. This may seem to be a simple
scenario, but many practical problems fall into this assumption. For example, in a
chemical process, suppose that the yield of the product is related to the operating
can be used. It can also be used for process optimization or for process control purposes.
To illustrate, consider the data presented in the table below. This shows the purity
A scatter plot diagram of the data in the table above is presented in the next figure.
A scatter plot is a graph of the ordered pair (x, y) of numbers consisting of the independent
variable x, and the dependent variable y. The independent variable, the variable that can
plotted on the vertical axis, is the variable that cannot be controlled or manipulated. The
purpose of this graph is to determine the nature of the relationship between the variables
Figure 4. Scatter plot diagram showing the purity of oxygen and percentage of
hydrocarbon in a distillation unit
It can be seen from the scatter plot diagram that there is no simple curve that will
pass exactly through all the given data points. But there is a strong indication that these
data points lie scattered randomly around a straight line. Therefore, it is probably
reasonable to assume that a straight-line relationship exists between the mean of the purity
of oxygen (y) and the percentage of hydrocarbon present (x). That is, $E(y|x) = \mu_{y|x} = a + bx$,
where a and b are the intercept and slope, respectively. They are called regression
coefficients. Although the mean of y is a linear function of x, the actual observed value y
does not fall exactly on a straight line. In order to generalize this to a probabilistic linear
model, it is necessary to assume that the expected value of y is a linear function of x but
for a fixed value of x, the actual value of y is determined by the mean value function of
the linear model with the addition of a random error term. That is: 𝑦 = 𝑎 + 𝑏𝑥 + 𝜀, where 𝜀
is the random error term. The equation has only one independent variable or regressor
and the model is called the simple linear regression model. There are times that a model
like this arises from a theoretical relationship. At other times, there is no theoretical knowledge of the
relationship between x and y and the choice of the model will be based on inspection of
a scatter plot diagram, such as the example above. The regression model is then thought
of as an empirical model. Figure 2 shows the scatter plot diagram with an added trend line
with an equation of $y = 75 + 15x$. This is obtained by using an Excel sheet, plotting the data
points in a scatter diagram and adding a trend line that displays the equation. The slope and y-
intercept are rounded off to integers. With this model we can determine the value of y for
A simple linear regression has only one dependent or response variable (y) and one
independent, regressor or predictor variable (x). Suppose that the value of y at each value
of x is a random variable and that the true relationship between them is a straight line. As
mentioned above, the expected value of y for each value of x is 𝐸(𝑦|𝑥) = 𝜇𝑦|𝑥 = 𝑎 + 𝑏𝑥
where a and b are the intercept and slope, respectively, called regression coefficients.
𝒚 = 𝒂 + 𝒃𝒙 + 𝜺
where 𝜀 is the random error term with mean zero and unknown variance. It is also
assumed that the random errors corresponding to different observations are uncorrelated
random variables. Suppose that we have n pairs of observations (x1, y1), (x2, y2), ... , (xn,
yn) as in the data presented in Table 1 and in the scatter plot in Figure 1. These data are
to be used for the estimated regression line. The estimates of a and b should result in a
line that is the “best fit” to the given data. Karl Gauss (1777–1855), a German scientist,
proposed estimating the parameters a and b to minimize the sum of the squares of the
This criterion for estimating the regression coefficients is called the method of least
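The least squares estimates of the intercept and slope of the linear regression model
are:
$$\hat{b} = \frac{SS_{xy}}{SS_{xx}}$$
$$\hat{a} = \bar{y} - \hat{b}\bar{x}$$
where:
$$SS_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$
$$SS_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}, \qquad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
$$\hat{y} = \hat{a} + \hat{b}x$$
$$y_i = \hat{a} + \hat{b}x_i + e_i, \qquad i = 1, 2, \ldots, n$$
where $e_i$ is called the residual and computed as $e_i = y_i - \hat{y}_i$. This describes the error in
the fit of the model to the $i$th observation $y_i$. In section 10.6 the residuals will be used to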
Example1.
Quantity     Value
n            20
∑xi          23.92
x̄            1.1960
∑yi          1,843.21
ȳ            92.1605
∑xi²         29.2892
∑yi²         170,044.5321
∑xiyi        2,214.6566
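Solution:
$$SS_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 29.2892 - \frac{(23.92)^2}{20} = 0.68088$$
$$SS_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 2,214.6566 - \frac{(23.92)(1,843.21)}{20} = 10.17744$$
$$\hat{b} = \frac{SS_{xy}}{SS_{xx}} = \frac{10.17744}{0.68088} = 14.94748$$
$$\hat{a} = \bar{y} - \hat{b}\bar{x} = 92.1605 - (14.94748)(1.196) = 74.28331$$
Therefore:
$$\hat{y} = \hat{a} + \hat{b}x$$
$$\hat{y} = 74.283 + 14.947x$$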
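The least squares estimates above can be reproduced from the summary statistics with a short Python sketch. The function name `least_squares_from_sums` is hypothetical; the inputs are the sums listed in the example.

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (a_hat, b_hat, SS_xx, SS_xy) computed from summary statistics."""
    ss_xx = sum_x2 - sum_x**2 / n
    ss_xy = sum_xy - sum_x * sum_y / n
    b_hat = ss_xy / ss_xx
    a_hat = sum_y / n - b_hat * (sum_x / n)
    return a_hat, b_hat, ss_xx, ss_xy

# Summary statistics of the oxygen purity data given in the example
a_hat, b_hat, ss_xx, ss_xy = least_squares_from_sums(
    n=20, sum_x=23.92, sum_y=1843.21, sum_x2=29.2892, sum_xy=2214.6566
)
print(f"SSxx = {ss_xx:.5f}, SSxy = {ss_xy:.5f}")
print(f"fitted line: y = {a_hat:.3f} + {b_hat:.3f} x")
```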
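The residuals $e_i = y_i - \hat{y}_i$ are used to obtain an estimate of the variance of the error term,
$\sigma^2$. The sum of the squares of the residuals, called the error sum of squares, is
$$SS_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
The expected value of the error sum of squares is $E(SS_E) = (n - 2)\sigma^2$. Therefore the
$$\hat{\sigma}^2 = \frac{SS_E}{n - 2}$$
$$SS_E = 21.2498, \qquad \hat{\sigma}^2 = \frac{21.2498}{18} = 1.1805$$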
between variables and the strength of the relationship. Statisticians use a measure called
correlation coefficient. This correlation coefficient measures how closely the points in a
scatter diagram are spread around a line. The symbol for the sample correlation
coefficient is r. The symbol for the population coefficient is the Greek letter rho (ρ).
$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}$$
where
$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$$
$$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$$
$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$$
$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 0.68088$$
$$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 173.3769$$
$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 10.17744$$
$$r = \frac{10.17744}{\sqrt{(0.68088)(173.3769)}} = 0.9367$$
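As a cross-check of the correlation coefficient, the sketch below recomputes r and r² from the same summary statistics used for the regression line. The function name `correlation_from_sums` is an assumption of this illustration.

```python
import math

def correlation_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Sample correlation coefficient r = SS_xy / sqrt(SS_xx * SS_yy)."""
    ss_xx = sum_x2 - sum_x**2 / n
    ss_yy = sum_y2 - sum_y**2 / n
    ss_xy = sum_xy - sum_x * sum_y / n
    return ss_xy / math.sqrt(ss_xx * ss_yy), ss_yy

# Summary statistics of the oxygen purity data
r, ss_yy = correlation_from_sums(
    n=20, sum_x=23.92, sum_y=1843.21,
    sum_x2=29.2892, sum_y2=170044.5321, sum_xy=2214.6566,
)
print(f"SSyy = {ss_yy:.4f}, r = {r:.4f}, r^2 = {r**2:.4f}")
```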
A correlation coefficient of 0.9367 indicates a strong positive linear relationship between the
two variables. Taking $r^2 = 0.8774$, this means that approximately 88% of the variation in
regression model. To test hypotheses about the slope and intercept of the regression
model, the error component in the model, ε, is assumed to be normally and independently
distributed with mean zero and variance σ², abbreviated NID(0, σ²).
Suppose that we wish to test the hypothesis that the slope equals a constant, say,
H0: 𝑏 = 𝑏0 , H1: 𝑏 ≠ 𝑏0
Because the errors $\varepsilon_i$ are NID(0, σ²), it follows directly that the observations $y_i$ are NID
consequently, $\hat{b}$ is $N(b, \sigma^2/S_{xx})$, using the bias and variance properties of the slope. In
$\hat{b}$ is independent of $\hat{\sigma}^2$. As a result of those properties, the statistic for the slope
$$t_0 = \frac{\hat{b} - b_0}{\sqrt{\hat{\sigma}^2 / S_{xx}}}$$
where $\sqrt{\hat{\sigma}^2 / S_{xx}}$ is the standard error of the slope, $se(\hat{b})$. The same procedure can be
used to test the hypotheses about the y-intercept. The hypotheses are:
H0: $a = a_0$, H1: $a \neq a_0$
$$t_0 = \frac{\hat{a} - a_0}{\sqrt{\hat{\sigma}^2\left[\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right]}}$$
where $\sqrt{\hat{\sigma}^2\left[\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right]}$ is the standard error of the intercept, $se(\hat{a})$
$$t_0 = \frac{\hat{a} - a_0}{se(\hat{a})}$$
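To make the slope test concrete, the sketch below applies the statistic to the oxygen purity model with the hypothesized slope set to zero (a test of significance of regression). The choice $b_0 = 0$ and the critical value 2.101 (the tabulated t value for 18 degrees of freedom that this chapter also uses for its 95% intervals) are assumptions of this illustration, not a worked example from the text.

```python
import math

# Estimates from the oxygen purity example
b_hat = 14.947          # fitted slope
sigma2_hat = 1.1805     # estimated error variance, SSE / (n - 2)
ss_xx = 0.68088         # corrected sum of squares of x
b0 = 0.0                # hypothesized slope (assumed for this illustration)

se_b = math.sqrt(sigma2_hat / ss_xx)      # standard error of the slope
t0 = (b_hat - b0) / se_b

t_crit = 2.101          # tabulated t for alpha/2 = 0.025, d.f. = 18
reject_h0 = abs(t0) > t_crit
print(f"se(b) = {se_b:.4f}, t0 = {t0:.2f}, reject H0: {reject_h0}")
```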
In both cases, the null hypothesis is to be rejected if the computed value of the test
These relate to the significance of regression. Failure to reject the null hypothesis H0:
dependent and the independent variables or that the true relationship between the two
variables is not linear. If the null hypothesis H0: $b = 0$ is rejected, it could mean that the
estimated regression line. It is a procedure where the total variation in the dependent
variable is subdivided into meaningful components that are then observed and treated
systematically. Suppose that we have 𝑛 experimental data points in the usual form (𝑥𝑖 , 𝑦𝑖 )
and that the regression line is estimated. In the previous section, in the estimation of𝜎 2 ,
An alternative and more informative formulation is partitioning of the total corrected sum
where SSR is the regression sum of squares which reflects the amount of variation in
the 𝑦 − values explained by the straight line model and SSE is the error sum of squares
says that the model is $\mu_{y|x} = \alpha$. This means that the variation in $Y$ results from random
chances or fluctuations that are independent of $x$. In order to test this hypothesis, the f-
$$f = \frac{SSR/1}{SSE/(n - 2)} = \frac{SSR}{s^2}$$
showing the summary on how to compute for the f-statistic is presented below
One of the reasons for building a linear regression is to predict response values at
one or more values of the independent variable. This section focuses on the errors
estimate the mean 𝜇𝑌|𝑥0 at 𝑥 = 𝑥0 . It can also be used to predict a single value when 𝑥 =
𝑥0 . The error of prediction is expected to be higher when predicting a single value than
when a mean value is predicted. It will then affect the width of intervals for the values
being predicted. To construct a confidence interval for $\mu_{Y|x_0}$, the point estimator $\hat{Y}_0 = A + Bx_0$
is used to estimate $\mu_{Y|x_0} = \alpha + \beta x_0$. It can be shown that the sampling distribution of $\hat{Y}_0$ is
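and variance
$$\sigma^2_{\hat{Y}_0} = \sigma^2_{A + Bx_0} = \sigma^2_{\bar{Y} + B(x_0 - \bar{x})} = \sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]$$
Thus, the (1 − α) 100% confidence interval on the mean response $\mu_{Y|x_0}$ can now be
$$T = \frac{\hat{Y}_0 - \mu_{Y|x_0}}{S\sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}}}$$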
Example 1. Using the above example about the level of purity of oxygen, construct a
95% confidence interval about the mean response. In particular, predict the mean
Solution:
The fitted model is $\hat{\mu}_{Y|x_0} = 74.283 + 14.947x_0$ and the 95% confidence interval is
$$\hat{\mu}_{Y|x_0} \pm 2.101\sqrt{1.18\left[\frac{1}{20} + \frac{(x_0 - 1.1960)^2}{0.68088}\right]}$$
$$89.23 \pm 2.101\sqrt{1.18\left[\frac{1}{20} + \frac{(1.00 - 1.1960)^2}{0.68088}\right]}$$
$$89.23 \pm 0.75$$
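The interval above can be reproduced with the following Python sketch. The function name `mean_response_ci` is illustrative, and the inputs (fitted coefficients, 1.18 for the error variance, 2.101 for the t value) are the rounded figures used in the solution.

```python
import math

def mean_response_ci(x0, a_hat, b_hat, s2, n, x_bar, ss_xx, t_crit):
    """Return (point estimate, half-width) of the CI on the mean response at x0."""
    y0_hat = a_hat + b_hat * x0
    half_width = t_crit * math.sqrt(s2 * (1 / n + (x0 - x_bar) ** 2 / ss_xx))
    return y0_hat, half_width

y0_hat, half_width = mean_response_ci(
    x0=1.00, a_hat=74.283, b_hat=14.947, s2=1.18,
    n=20, x_bar=1.1960, ss_xx=0.68088, t_crit=2.101,
)
print(f"{y0_hat:.2f} ± {half_width:.2f}")
print(f"interval: ({y0_hat - half_width:.2f}, {y0_hat + half_width:.2f})")
```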
Now consider the prediction interval for a single response. A (1 − α) 100% prediction
$$\hat{y}_0 - t_{\alpha/2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} < y_0 < \hat{y}_0 + t_{\alpha/2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$
Example 2. Using the above example about the level of purity of oxygen, find a 95%
prediction interval on the next observation of the level of purity of oxygen at 𝑥0 = 1.00%.
Solution:
$$89.23 - 2.101\sqrt{1.18\left[1 + \frac{1}{20} + \frac{(1.00 - 1.1960)^2}{0.68088}\right]} \leq Y_0 \leq 89.23 + 2.101\sqrt{1.18\left[1 + \frac{1}{20} + \frac{(1.00 - 1.1960)^2}{0.68088}\right]}$$
Simplifying
$$86.83 \leq Y_0 \leq 91.63$$
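A matching sketch for the prediction interval is shown below. The only change from the interval on the mean response is the extra 1 inside the bracket, which accounts for the variability of a single new observation; the function name `prediction_interval` is hypothetical.

```python
import math

def prediction_interval(x0, y0_hat, s2, n, x_bar, ss_xx, t_crit):
    """Return (lower, upper) prediction limits for one new observation at x0."""
    half_width = t_crit * math.sqrt(s2 * (1 + 1 / n + (x0 - x_bar) ** 2 / ss_xx))
    return y0_hat - half_width, y0_hat + half_width

# Rounded figures taken from the worked solution
lower, upper = prediction_interval(
    x0=1.00, y0_hat=89.23, s2=1.18, n=20, x_bar=1.1960, ss_xx=0.68088, t_crit=2.101
)
print(f"{lower:.2f} <= Y0 <= {upper:.2f}")
```

The prediction interval is noticeably wider than the interval on the mean response at the same $x_0$, which matches the remark in the text that predicting a single value carries more error than predicting a mean.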
parameters of the model will require the assumption that the errors are uncorrelated
random variables with mean zero and constant variance. Tests of hypotheses and interval
estimation require that the errors be normally distributed. In addition, we assume that the
order of the model is correct. If it is a simple linear regression model, the phenomenon
consider the validity of these assumptions. Analyses to examine the adequacy of the
model should be conducted. These can be done through residual analysis and the coefficient
of determination.
actual observation and $\hat{y}_i$ is the corresponding fitted value from the regression model.
Residual analysis is frequently helpful to check the assumption that the errors are
Figure 7. Patterns for residual plots. (a) Satisfactory (b) Funnel (c) Double bow (d) Nonlinear
Example1. Determine the residuals of the previous problem and plot the graph.
Solution:
Hydrocarbon level,   Oxygen purity,   Predicted value,   Residual,
x                    y                ŷ                  e = y − ŷ
0.99                 90.01            89.081              0.929
1.15                 91.43            91.472             -0.042
1.46                 96.73            96.106              0.624
0.87                 87.59            87.287              0.303
1.55                 99.42            97.451              1.969
1.19                 93.54            92.070              1.470
0.98                 90.56            88.931              1.629
1.11                 89.85            90.874             -1.024
1.26                 93.25            93.116              0.134
1.43                 94.98            95.657             -0.677
1.02                 89.05            89.529             -0.479
1.29                 93.74            93.565              0.175
1.36                 94.45            94.611             -0.161
1.23                 91.77            92.668             -0.898
1.40                 93.65            95.209             -1.559
1.15                 92.52            91.472              1.048
1.01                 89.54            89.379              0.161
1.20                 90.39            92.219             -1.829
1.32                 93.41            94.013             -0.603
0.95                 87.33            88.483             -1.153
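The residual table can be regenerated from the raw data with the sketch below, which refits the line by least squares and then computes each residual. Variable names are chosen for this illustration. Because the code keeps unrounded coefficients, individual residuals may differ from the table in the third decimal.

```python
# Hydrocarbon level (x) and oxygen purity (y) from the table above
x = [0.99, 1.15, 1.46, 0.87, 1.55, 1.19, 0.98, 1.11, 1.26, 1.43,
     1.02, 1.29, 1.36, 1.23, 1.40, 1.15, 1.01, 1.20, 1.32, 0.95]
y = [90.01, 91.43, 96.73, 87.59, 99.42, 93.54, 90.56, 89.85, 93.25, 94.98,
     89.05, 93.74, 94.45, 91.77, 93.65, 92.52, 89.54, 90.39, 93.41, 87.33]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least squares estimates of slope and intercept
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b_hat = ss_xy / ss_xx
a_hat = y_bar - b_hat * x_bar

# Fitted values, residuals, error sum of squares and variance estimate
fitted = [a_hat + b_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
sse = sum(e ** 2 for e in residuals)
sigma2_hat = sse / (n - 2)

for xi, yi, fi, ei in zip(x, y, fitted, residuals):
    print(f"{xi:5.2f} {yi:7.2f} {fi:8.3f} {ei:7.3f}")
print(f"SSE = {sse:.4f}, sigma^2 estimate = {sigma2_hat:.4f}")
```

Plotting `residuals` against `fitted` or against `x` would give the residual plots discussed in this section; a plotting library is left out here to keep the sketch dependency-free.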
Figure 8. Normal probability plot of residuals (left). Plot of residuals versus predicted
values (center). Plot of residuals versus hydrocarbon level (right).
regression model. It is the square of the correlation coefficient between jointly distributed
random variables 𝑋 and 𝑌 and has a value 0 ≤ 𝑅 2 ≤ 1 from the analysis of variance
accounted for by the regression model. For the oxygen purity regression model we
have $R^2 = \dfrac{SSR}{SST} = \dfrac{152.13}{173.38} = 0.877$; that is, the model accounts for 87.7% of the variability in
the data. It is always possible to make 𝑅 2 unity by adding enough terms to the model and
For example, a “perfect” fit can be obtained with a polynomial of degree n − 1. Generally,
𝑅 2 will increase if a variable is added to the model, but this does not necessarily imply
that the new model is superior to the old one. Unless the error sum of squares in the new
model is reduced by an amount equal to the original error mean square, the new model
will have a larger error mean square than the old one because of the loss of 1 error degree
of freedom. Thus, the new model will actually be worse than the old one. The dispersion
of the variable x impacted the magnitude of 𝑅². The larger the dispersion, the larger the
There are some misconceptions about 𝑅². It does not measure the magnitude of the
slope of the regression line. A large value of 𝑅² does not imply a steep slope. Also, 𝑅²
does not measure the appropriateness of the model because it can be artificially inflated
by adding higher-order polynomial terms to the model. Even if y and x are related in a
nonlinear fashion, 𝑅² will often be large. Lastly, even though 𝑅² is large, this does not
necessarily imply that the regression model will provide accurate predictions of future
observations.
10.7. Correlation
between two variables by means of a single number called a correlation coefficient. This
correlation coefficient measures how closely the points in a scatter diagram are spread
around a line. The symbol for the sample correlation coefficient is 𝑟. The symbol for the
population coefficient is the Greek letter, rho (𝜌). The value of 𝜌 is 0 when β1 = 0, which
results when there essentially is no linear regression; that is, the regression line is
have ρ² ≤ 1 and hence −1 ≤ 𝜌 ≤ 1. Values of ρ = ±1 only occur when σ² = 0, in which case
we have a perfect linear relationship between the two variables. Thus, a value of ρ equal
to +1 implies a perfect linear relationship with a positive slope, while a value of ρ equal to
−1 results from a perfect linear relationship with a negative slope. It might be said, then,
that sample estimates of 𝜌 close to unity in magnitude imply good correlation, or linear
association, between X and Y, whereas values near zero indicate little or no correlation.
The measure 𝜌 of linear association between two variables X and Y is estimated by the
sample correlation coefficient 𝑟, where
$r = b_1\sqrt{\dfrac{S_{xx}}{S_{yy}}} = \dfrac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$
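As a check on the two forms of r and on its link to the coefficient of determination, the sketch below computes both from the oxygen purity data used earlier in the chapter. It assumes the usual identity SSE = Syy − b1·Sxy; the variable names are illustrative.

```python
from math import sqrt

# Oxygen purity data from the residual example (x = hydrocarbon level, y = purity).
x = [0.99, 1.15, 1.46, 0.87, 1.55, 1.19, 0.98, 1.11, 1.26, 1.43,
     1.02, 1.29, 1.36, 1.23, 1.40, 1.15, 1.01, 1.20, 1.32, 0.95]
y = [90.01, 91.43, 96.73, 87.59, 99.42, 93.54, 90.56, 89.85, 93.25, 94.98,
     89.05, 93.74, 94.45, 91.77, 93.65, 92.52, 89.54, 90.39, 93.41, 87.33]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)            # total sum of squares, SST
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = s_xy / s_xx                                     # least-squares slope
sse = s_yy - b1 * s_xy                               # error sum of squares
r_squared = 1 - sse / s_yy                           # same as SSR / SST

r = s_xy / sqrt(s_xx * s_yy)                         # sample correlation coefficient
r_from_slope = b1 * sqrt(s_xx / s_yy)                # equivalent form using b1

print(round(r_squared, 3), round(r, 4), round(r_from_slope, 4))
```

For simple linear regression the square of r equals R², so a large |r| and a large R² carry the same information about the strength of the linear association.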
REFERENCES:
Montgomery, Douglas C., et al., Applied Statistics and Probability for Engineers, 7th ed., John
Wiley & Sons (Asia) Pte Ltd, 2018
Walpole, Ronald E., et al., Probability and Statistics for Engineers and Scientists, 9th ed.,
Pearson Education Inc., 20
CHAPTER TEST
An article in the Journal of Environmental Engineering (1989, Vol. 115(3)) reported the
results of a study on the occurrence of sodium and chloride in surface streams in central
Rhode Island. The following data are chloride concentration y (in milligrams per liter) and
x y x y x y
2. Fit the simple linear regression model using the method of least squares. Find an
estimate of σ2.
3. Estimate the mean chloride concentration for a watershed that has 1% roadway
area.
4. Find the fitted value corresponding to x = 0.47 and the associated residual.
5. Test the hypothesis H0: β1 = 0 versus H1: β1 ≠ 0 using the analysis of variance
6. Find a 99% confidence interval on the mean chloride concentration when roadway area
x = 1.0%.
1.0%.
9. Prepare a normal probability plot of the residuals. Does the normality assumption
appear to be satisfied?
Ungrouped (or raw) data are data which are not organized in any
specific way. They are simply the collection of data as they are
gathered.
Grouped Data are raw data organized into groups or categories with
corresponding frequencies. Organized in this manner, the data is
referred to as frequency distribution.
Parameter is the descriptive measure of a characteristic of a
population.
Statistic is a measure of a characteristic of a sample.
Constant is a characteristic or property of a population or sample
which is common to all members of the group.
Variable is a measure or characteristic or property of a population
or sample that may have a number of different values.
Example
We are going to figure out if all females over 30 years old can
lose weight on a vegan diet.
Example
In our Midterm Exam, 25 out of 50 students were able to receive a
passing grade. The average score of the class is 75 out of 100.
STATISTICS
Example
20% of those living in our subdivision prefer to drive Toyota cars.
Methods of Data Collection
Data Gathering
Primary data: collected for the investigator’s use from the primary source.
Secondary data: collected by some other organization for their own use, but the investigator
also gets it for his use.
3 Basic Methods of Collecting Data
Retrospective Study
❑ would use the population or sample of the historical data which had been
archived over some period of time
❑ may involve a significant amount of data but those data may contain
relatively little useful information about the problem, some of the relevant
data may be missing, recording errors or transcription may be present, or
those other important data may not have been gathered and archived
Observational Study
❑ process or population is observed and disturbed as little as possible, and
the quantities of interests are recorded
Designed Experiment
❑ deliberate or purposeful changes are made in the controllable variables of the system
or process
❑ needed to establish cause-and-effect relationships
❑ almost always necessary to be conducted to confirm the applicability and
validity of theories
❑ very important in engineering design and development and in the
improvement of manufacturing processes
Purposive Sampling
2 The selection of respondents is predetermined according to the characteristic of
interest made by the researcher.
Quota Sampling
3 1) Proportional
the major characteristics of the population by sampling a proportional
amount of each is represented
2) Non-proportional
a minimum number of sampled units in each category is specified and
not concerned with having numbers that match the proportions in the
population.
2
❑ There are several probability techniques. Among these are simple random
sampling, stratified sampling and cluster sampling.
Simple Random Sampling - SRS
1 Basic sampling technique where a group of subjects is selected for study from a
larger group. Each member of the population has an equal chance of being
included in the sample.
Stratified Sampling
2 Sampling technique where subjects are obtained by taking samples from each
stratum or sub-group of a population.
Cluster Sampling
3 Sampling technique where the entire population is divided into groups, or
clusters, and a random sample of these clusters is selected.
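The three probability techniques above can be sketched in a few lines. The population of 60 students, the year-level strata, the clusters of 10, and the sample sizes are all hypothetical values chosen only for illustration; the sketch also assumes that every member of a chosen cluster is included.

```python
import random

rng = random.Random(403)  # fixed seed so the sketch is repeatable

# Hypothetical population: 60 students spread evenly over three year levels.
population = [(f"S{i:02d}", ("1st", "2nd", "3rd")[i % 3]) for i in range(60)]

# Simple random sampling: every member has an equal chance of being selected.
srs = rng.sample(population, 12)

# Stratified sampling: draw a random sample from each stratum (year level).
strata = {}
for member in population:
    strata.setdefault(member[1], []).append(member)
stratified = [m for group in strata.values() for m in rng.sample(group, 4)]

# Cluster sampling: divide the population into clusters, then randomly
# select whole clusters and keep every member of the chosen clusters.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [m for cluster in chosen_clusters for m in cluster]

print(len(srs), len(stratified), len(cluster_sample))
```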
PROBABILITY
Intended Learning Outcomes
At the end of this module, it is expected that the students will be able to:
Definition of Terms
Sample Space
▪ set of all possible outcomes or results of a random experiment
▪ represented by the letter S
Event
▪ subset of the sample space
▪ represented by the letter E
▪ e.g., E1 = {1, 3, 5}, E2 = {2, 4, 6}
Example
Draw a tree diagram for tossing a coin 3 times and then
answer the following questions.
What is the probability that tossing a coin 3 times will
result in one head and two tails?
SAMPLE SPACE
HHH, HHT, HTH, THH, TTH, THT, HTT, TTT
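The tree diagram can be checked by listing the outcomes programmatically. This is a minimal sketch that assumes a fair coin, so all eight outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# Every sequence of three tosses, each toss being H or T.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=3)]

# Event: exactly one head (and therefore two tails).
event = [outcome for outcome in sample_space if outcome.count("H") == 1]

probability = Fraction(len(event), len(sample_space))
print(sample_space)
print(event, probability)
```

The event contains HTT, THT and TTH, so the probability is 3/8.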
Example
S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}
We can have more than one event for a sample space but there will be
one and only one sample space for an event.
If we have events E1, E2, E3, …, En as all the possible subsets of the sample
space, then we have,
Example
Rolling a Die
S = {1, 2, 3, 4, 5, 6}
E1 = {1, 3, 5}
E2 = {2, 4, 6}
E1 ∪ E2 = S
Null Space
Example
[Venn diagram of sets A and B]
X = {q, w, e, r, t}    Y = {a, s, d, f}
X ∩ Y = ∅
Mutually Exclusive Events
Union of Events
▪ A ∪ B = {x | x ∈ A or x ∈ B}
Example
A ∪ B = {a, b, c, d, e, f, i, o, u}
Example
X = {1, 2, 3, 4} Y = {3, 4, 5, 6}
X ∪ Y = {1, 2, 3, 4, 5, 6}
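Union and intersection map directly onto Python's built-in set type. The sketch below reuses the sets from the examples above; the names `letters_x` and `letters_y` are illustrative stand-ins for the earlier letter sets X and Y.

```python
# Sets from the union example.
X = {1, 2, 3, 4}
Y = {3, 4, 5, 6}

union = X | Y          # elements in X or Y
intersection = X & Y   # elements in both X and Y

# Sets from the null-space example: no element in common.
letters_x = set("qwert")
letters_y = set("asdf")
are_disjoint = letters_x.isdisjoint(letters_y)

print(sorted(union), sorted(intersection), are_disjoint)
```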
Complement of an Event
▪ The complement of an event A with respect to S is the set of all
elements of S that are not in A.
▪ It is denoted as A’.
Ex.
Consider the experiment of rolling a single die.
S={1,2,3,4,5,6}
A’ = {cow, snake}
Probability of an Event
Where,
n(S) represents number of elements in a sample space of an experiment;
S = {1, 2, 3, 4, 5, 6}
E = {1, 3, 5}
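For the die example, the probability of E can be computed as the ratio of counts. This sketch assumes equally likely outcomes and the classical form P(E) = n(E)/n(S), where n(E) is taken to be the number of elements in the event; the function name is illustrative.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a die
E = {1, 3, 5}            # event: an odd number is rolled

def probability(event, sample_space):
    """Classical probability n(E) / n(S) for equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

p_E = probability(E, S)
print(p_E)
```

Here n(E) = 3 and n(S) = 6, so P(E) = 1/2.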
Rules of Probability
Before discussing the rules of probability, we state the following
definitions:
• Two events are mutually exclusive or disjoint if they cannot
occur at the same time.
P(A or B) P(A or B)
A student goes to the library. The probability that she checks out (a) a work
of fiction is 0.40, (b) a work of non-fiction is 0.30, and (c) both fiction and
non-fiction is 0.20. What is the probability that the student checks out a
work of fiction, non-fiction, or both?
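The library problem is an instance of the addition rule for events that are not mutually exclusive, P(A or B) = P(A) + P(B) − P(A and B). A minimal sketch, using exact fractions to avoid floating-point rounding:

```python
from fractions import Fraction

p_fiction = Fraction("0.40")
p_nonfiction = Fraction("0.30")
p_both = Fraction("0.20")

# Subtract the overlap once so it is not counted twice.
p_either = p_fiction + p_nonfiction - p_both
print(float(p_either))
```

The student checks out fiction, non-fiction, or both with probability 0.50.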
Rule of Multiplication
Example
An urn contains 6 red marbles and 4 black marbles. Two marbles are drawn
without replacement from the urn. What is the probability that both of the
marbles are black?
Two cards are selected from a pack of cards. What is the probability that
they are both queen?
Let A = event that the first card is a queen
Let B = event that the 2nd card is a queen
P(A ∩ B) = P(A)P(B|A)
P(A) = 4/52 = 1/13
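Both examples apply P(A ∩ B) = P(A)P(B|A). The sketch below assumes a standard 52-card deck with 4 queens and that the two cards, like the marbles, are drawn without replacement.

```python
from fractions import Fraction

# Urn: 6 red and 4 black marbles, two drawn without replacement.
p_first_black = Fraction(4, 10)
p_second_black_given_first = Fraction(3, 9)
p_two_black = p_first_black * p_second_black_given_first

# Cards: 4 queens in 52 cards, two drawn without replacement (assumed).
p_first_queen = Fraction(4, 52)
p_second_queen_given_first = Fraction(3, 51)
p_two_queens = p_first_queen * p_second_queen_given_first

print(p_two_black, p_two_queens)
```

Under these assumptions the probability of two black marbles is 2/15 and the probability of two queens is 1/221.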
The probability that event A will occur is equal to 1 minus the probability
that event A will not occur.
P(A) = 1 − P(A′)
Example
The probability of Bill not graduating from college is 0.8. What is the probability
that Bill will graduate from college?
P(A) = 1 − P(A′)
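Worked out for this example, with A the event that Bill graduates and A′ the event that he does not:

```latex
\[
P(A) = 1 - P(A') = 1 - 0.8 = 0.2
\]
```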
Counting Rules Useful in Probability
Multiplicative Rule
Example
Suppose, for instance, that a customer wishes to buy a new cell phone
and can choose from n1 = 5 brands, n2 = 5 sets of capability, and n3 = 4
colors. How many possible ways can a customer order one of these
phones?
Cellphone brand
1. Apple
2. Samsung
3. Huawei
4. Oppo
5. Vivo
Capabilities
1. A long-lasting battery
2. Warp-speed processing
3. Crystal-clear display
4. A great camera
5. Plenty of storage space
Colors
1. White
2. Black
3. Rose gold
4. Silver
Example
A company puts a code on each different product they sell. The code is
made up of 3 numbers and 2 letters. How many different codes are
possible?
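Both counting examples follow the multiplicative rule: multiply the number of choices at each step. In the sketch below, the product-code count assumes digits 0 to 9 and the 26 letters of the English alphabet, with repetition allowed and a fixed layout of three digits followed by two letters; the problem statement does not spell these out.

```python
from itertools import product

brands = ["Apple", "Samsung", "Huawei", "Oppo", "Vivo"]
capabilities = ["long-lasting battery", "warp-speed processing",
                "crystal-clear display", "great camera", "plenty of storage"]
colors = ["white", "black", "rose gold", "silver"]

# Enumerate every (brand, capability, color) order and count them.
orders = list(product(brands, capabilities, colors))
n_orders = len(orders)          # n1 * n2 * n3

# Product codes: 3 digits then 2 letters, repetition allowed (assumed).
n_codes = 10 ** 3 * 26 ** 2

print(n_orders, n_codes)
```

This gives 5 × 5 × 4 = 100 possible phone orders and, under the stated assumptions, 676,000 codes.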
Partitions rule
Where,
Example
Consider the set {a, e, i, o, u}. The possible partitions into two cells
in which the first cell contains 4 elements and the second cell 1
element are
Example
You have 12 system analysts and you want to assign three to job 1, four to
job 2, and five to job 3. In how many different ways can you make this
assignment?
Example
How many different letter arrangements can be made from the letters in the
word STATISTICS?
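The three partition examples can be checked with one helper. This sketch assumes the standard partition count n!/(n1! n2! … nr!), which also counts arrangements of a word with repeated letters; the function name `partitions` is illustrative.

```python
from collections import Counter
from math import factorial

def partitions(n, *cell_sizes):
    """Number of ways to split n objects into cells of the given sizes."""
    if sum(cell_sizes) != n:
        raise ValueError("cell sizes must add up to n")
    ways = factorial(n)
    for size in cell_sizes:
        ways //= factorial(size)
    return ways

vowel_partitions = partitions(5, 4, 1)           # {a, e, i, o, u} into cells of 4 and 1
analyst_assignments = partitions(12, 3, 4, 5)    # 12 analysts to jobs of 3, 4 and 5

# STATISTICS: group identical letters, then treat each group as a cell.
letter_counts = Counter("STATISTICS")
statistics_arrangements = partitions(10, *letter_counts.values())

print(vowel_partitions, analyst_assignments, statistics_arrangements)
```

The results are 5 partitions of the vowels, 27,720 assignments of the analysts, and 50,400 arrangements of the letters in STATISTICS.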
Permutation Rule
ab, ac, ad, ba, bc, bd, ca, cb, cd, da, db, and dc
Example
In one year, three awards (research, teaching, and service) will be given to a
class of 25 graduate students in a statistics department. If each student can
receive at most one award, how many possible selections are there?
Example
How many ways can we award a 1st, 2nd and 3rd place prize among eight
contestants?
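The permutation examples count ordered selections without repetition, nPr = n!/(n − r)!. A minimal sketch using the standard library (`math.perm` requires Python 3.8 or later):

```python
from itertools import permutations
from math import perm

# Ordered pairs from the letters a, b, c, d.
pairs = ["".join(p) for p in permutations("abcd", 2)]

# Three distinct awards among 25 students, at most one award each.
award_selections = perm(25, 3)

# 1st, 2nd and 3rd place among eight contestants.
prize_orders = perm(8, 3)

print(pairs)
print(award_selections, prize_orders)
```

There are 12 ordered pairs, 13,800 award selections, and 336 ways to award the three prizes.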
Combinations Rule
Example
A young boy asks his mother to get 5 Game-Boy™ cartridges from his
collection of 10 arcade and 5 sports games. How many ways are there that
his mother can get 3 arcade and 2 sports games?
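Order does not matter here, so each group is counted with combinations and the two counts are multiplied. A minimal sketch:

```python
from math import comb

arcade_choices = comb(10, 3)   # 3 arcade games out of 10
sports_choices = comb(5, 2)    # 2 sports games out of 5

total_ways = arcade_choices * sports_choices
print(arcade_choices, sports_choices, total_ways)
```

This gives 120 × 10 = 1,200 ways.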
DISCRETE PROBABILITY
DISTRIBUTIONS
Intended Learning Outcomes
At the end of this module, it is expected that the students will be able to:
For example, the value of x1 takes on the probability p1, the value of x2 takes
on the probability p2, and so on. The probabilities pi must satisfy two
requirements: every probability pi is a number between 0 and 1, and the
sum of all the probabilities is 1. (p1+p2+⋯+pk=1)
Discrete Random Variables
Suppose three electronic components are tested, and one is naturally
concerned with the number of defectives that occur. Determine the
values of the random variable X, which is the number of defective items.
Let X be the number of defective items, with a defective item denoted by D and a non-defective item by N.
S = {NNN, NND, NDN, DNN, NDD, DND, DDN, DDD}
Outcomes    Number of defectives (value of X)
NNN 0
NND 1
NDN 1
DNN 1
NDD 2
DND 2
DDN 2
DDD 3
Probability Distribution
For the three-component example above, the probability distribution of X is:
X       0     1     2     3
P(X)    1/8   3/8   3/8   1/8
Probability Mass Function
For the same three-component example, the probability mass function is:
f(x) = 3/8, if x = 1, 2
       1/8, if x = 0, 3
       0, otherwise
Probability Histogram
[Probability histogram of X: bars of height 1/8, 3/8, 3/8 and 1/8 at X = 0, 1, 2 and 3]
Necessary Conditions of Discrete Probability Distributions
Variance
Standard Deviation
Discrete Random Variable: Mean, Variance & Standard Deviation
From our previous example, determine the mean, variance and standard deviation of the following
probability mass function.
X    P(X)           X·P(X)    X − µ    (X − µ)²    (X − µ)²·P(X)
0    1/8 = 0.125
1    3/8 = 0.375
2    3/8 = 0.375
3    1/8 = 0.125
Totals: ΣP(X) = 1, µ = ΣX·P(X) = 1.5, σ² = Σ(X − µ)²·P(X) = 0.75
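The table columns can be filled in with a short script. This sketch uses the definitions µ = ΣX·P(X) and σ² = Σ(X − µ)²·P(X), with exact fractions for the probability mass function of the three-component example.

```python
from fractions import Fraction
from math import sqrt

# Probability mass function of X, the number of defective components.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

total_probability = sum(pmf.values())
mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
std_dev = sqrt(variance)

for x, p in pmf.items():
    print(x, p, x * p, x - mean, (x - mean) ** 2, (x - mean) ** 2 * p)
print(total_probability, mean, variance, round(std_dev, 4))
```

The mean is 1.5, the variance is 0.75, and the standard deviation is the square root of 0.75, about 0.866.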
Cumulative Distribution Functions
Example
X 0 1 2
❑ Each trial can result in just two possible outcomes. We call one of these
outcomes a success and the other, a failure.
❑ The trials are independent; that is, the outcome on one trial does not
affect the outcome on other trials.
Notation
b (x; n, P): Binomial probability - the probability that an n-trial binomial experiment
results in exactly x successes, when the probability of success on an individual trial is
P.
Binomial Formula and Binomial Probability
Example
If a car agency sells 50% of its inventory of a certain foreign car equipped
with side airbags, find a formula for the probability distribution of the
number of cars with side airbags among the next 4 cars sold by the agency.
Example
The probability that a certain kind of component will survive a shock test is
3/4. Find the probability that exactly 2 of the next 4 components tested
survive.
Example
The probability that a patient recovers from a rare blood disease is 0.4. If 15
people are known to have contracted this disease, what is the probability
that exactly 5 survive?
Cumulative Binomial Probability
Example
X 0 1 2 3 4
Example
The probability that a patient recovers from a rare blood disease is 0.4. If 15
people are known to have contracted this disease, what is the probability
that at least 10 survive?
Example
The probability that a patient recovers from a rare blood disease is 0.4. If 15
people are known to have contracted this disease, what is the probability
that from 3 to 8 survive?
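The binomial examples in this part can be evaluated with two small helpers. This is a sketch assuming the standard binomial formula b(x; n, P) = C(n, x) P^x (1 − P)^(n − x), with cumulative probabilities obtained by summing it; the function names are illustrative.

```python
from math import comb

def binom_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """Probability of at most x successes in n independent trials."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

# Car agency: distribution of cars with side airbags among the next 4 sold.
airbag_distribution = [binom_pmf(x, 4, 0.5) for x in range(5)]

# Shock test: exactly 2 of the next 4 components survive.
p_two_survive = binom_pmf(2, 4, 0.75)

# Blood disease, n = 15, p = 0.4.
p_exactly_5 = binom_pmf(5, 15, 0.4)
p_at_least_10 = 1 - binom_cdf(9, 15, 0.4)
p_from_3_to_8 = binom_cdf(8, 15, 0.4) - binom_cdf(2, 15, 0.4)

print(airbag_distribution)
print(p_two_survive, p_exactly_5, p_at_least_10, p_from_3_to_8)
```

Note that "at least 10" is computed as one minus the cumulative probability up to 9, and "from 3 to 8" as the difference of two cumulative probabilities.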
Poisson Distribution
Poisson Experiment
P (x; μ): The Poisson probability that exactly x successes occur in a Poisson
experiment, when the mean number of successes is μ.
Poisson Distribution
Given the mean number of successes (μ) that occur in a specified region,
we can compute the Poisson probability based on the following Poisson
formula. Poisson Formula. Suppose we conduct a Poisson experiment, in
which the average number of successes within a given region is μ. Then, the
Poisson probability is:
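A minimal sketch of the Poisson formula in its standard form, P(x; μ) = e^(−μ) μ^x / x!. The mean μ = 2 used below is an illustrative assumption, not a value from the module.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """Probability of exactly x successes when the mean number is mu."""
    return exp(-mu) * mu ** x / factorial(x)

def poisson_cdf(x, mu):
    """Probability of at most x successes when the mean number is mu."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))

mu = 2  # hypothetical mean number of successes per region
p_none = poisson_pmf(0, mu)
p_at_most_3 = poisson_cdf(3, mu)

print(p_none, p_at_most_3)
```

The cumulative Poisson probability is simply the sum of the individual Poisson probabilities up to the value of interest.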
Example
Cumulative Poisson Probability
Example
CONTINUOUS PROBABILITY
Intended Learning Outcomes
At the end of this module, it is expected that the students will be able to:
Properties of Continuous Probability Distributions
Density Formulas
P(x) = 1
[Figure: density curve with points a, b, c and d on the x-axis, illustrating P(x<a), P(b<x<c), P(x=c) and P(x>d)]
Cumulative Distribution Function
Example
Expected Values of Random Variables
Expected Value
Variance and Standard Deviation
Example
Continuous Uniform Distribution
Example
Normal Distribution
Characteristics of a Normal Distribution
Standard Normal Distribution
Normal Distribution
The test scores of an EDA class with eight hundred students are distributed normally with a mean of 75 and a
standard deviation of seven.
a. What percentage of the class has a test score between 68 and 82?
b. How many students have a test score between 61 and 89?
c. What is the probability that a student chosen at random has a test score between 54 and 75?
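The three parts correspond to intervals of one, two and three standard deviations around the mean. The sketch below evaluates them with the normal cumulative distribution function written in terms of the error function; the helper name `normal_cdf` is illustrative.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal random variable with mean mu and std dev sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma, n_students = 75, 7, 800

# a. Share of the class scoring between 68 and 82 (mu ± 1 sigma).
share_a = normal_cdf(82, mu, sigma) - normal_cdf(68, mu, sigma)

# b. Expected number of students scoring between 61 and 89 (mu ± 2 sigma).
count_b = n_students * (normal_cdf(89, mu, sigma) - normal_cdf(61, mu, sigma))

# c. Probability of a score between 54 and 75 (mu - 3 sigma up to mu).
prob_c = normal_cdf(75, mu, sigma) - normal_cdf(54, mu, sigma)

print(round(100 * share_a, 2), round(count_b), round(prob_c, 4))
```

The results are close to the empirical rule: about 68% of the class for part a, about 95% of 800 students for part b, and half of 99.7% for part c.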
Example
Normal Approximation to Binomial and
Poisson Distribution
Binomial Approximation
Poisson Approximation
Continuity Correction
Example
Let us say we have b(x; n, p) = b(4; 15, 0.4).
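For b(4; 15, 0.4), the exact binomial probability can be compared with its normal approximation. This sketch assumes the usual approach: match the mean μ = np and variance σ² = np(1 − p), then apply the continuity correction by taking the normal area between 3.5 and 4.5.

```python
from math import comb, erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal random variable with mean mu and std dev sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, p, x = 15, 0.4, 4

# Exact binomial probability b(4; 15, 0.4).
exact = comb(n, x) * p ** x * (1 - p) ** (n - x)

# Normal approximation with continuity correction: P(3.5 < X < 4.5).
mu = n * p
sigma = sqrt(n * p * (1 - p))
approx = normal_cdf(x + 0.5, mu, sigma) - normal_cdf(x - 0.5, mu, sigma)

print(round(exact, 4), round(approx, 4))
```

The two values agree to about two decimal places, which is typical for a moderate n with p not too close to 0 or 1.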
Example
Example 1: Poisson Approximation
Example 2: Poisson Approximation
Exponential Distribution
Example
A checkout counter at a supermarket completes the process according to an exponential distribution with a service
rate of 6 per hour. A customer arrives at the checkout counter.
Find the probability of the following events.
a. The service is completed in fewer than 5 minutes.
b. The customer leaves the checkout counter more than 10 minutes after arriving.
c. The service is completed in a time between 5 and 8 minutes.
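The checkout problem can be worked with the exponential cumulative distribution function, P(T ≤ t) = 1 − e^(−λt). The sketch below converts the service rate of 6 per hour to λ = 0.1 per minute and assumes the customer is served immediately on arrival, so that the time at the counter equals the service time.

```python
from math import exp

rate_per_minute = 6 / 60   # 6 services per hour = 0.1 per minute

def exponential_cdf(t, rate):
    """P(T <= t) for an exponential service time with the given rate."""
    return 1 - exp(-rate * t)

# a. Service completed in fewer than 5 minutes.
p_a = exponential_cdf(5, rate_per_minute)

# b. Service takes more than 10 minutes.
p_b = 1 - exponential_cdf(10, rate_per_minute)

# c. Service completed between 5 and 8 minutes.
p_c = exponential_cdf(8, rate_per_minute) - exponential_cdf(5, rate_per_minute)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

Keeping the rate and the times in the same unit (minutes here) is the main thing to watch in problems of this kind.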