With Solutions MATH22 - Engineering Data Analysis Module 3
© https://uccs.edu/dase/
COVER PAGE
It shows the course code and title and
the specific topic discussed in this
module.
OVERVIEW
A general introduction of the specific topic
presented in the title, as well as the learning
outcomes and the time frame is included in this
section.
LECTURE
This section focuses on the discussion of the topic
which may include graphs, tables, and pictures to
help the student/reader in understanding the lesson.
PRACTICE PROBLEMS
It involves questions or follow-up problems about the
topic to evaluate how much you understand in this
module.
ASSESSMENT
To further assess your understanding, this section
will contain more challenging problems and real-life
situations in which you can apply what you have
learned throughout this module.
SUPPLEMENTARY KNOWLEDGE
This section will help you in exploring the lesson by
providing tips, techniques, and other relevant information
to support your knowledge.
ANSWER KEY
This contains answers to all the problems included in
this module.
REFERENCES
This includes the list of all the reference books used
in creating this module.
Overview
INTRODUCTION:
One engineering firm enjoys a 40% success rate in getting state
government construction contracts. This month they have submitted bids on
eight construction projects to be funded by the state government. How likely is
this firm to get none of those contracts? Five out of eight contracts? All eight
contracts? To answer such questions, we obtain the probability distribution
that describes the likelihood of all possible outcomes in a given situation.
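As a preview of where the module is heading, the three questions above can be answered numerically once a model is chosen. The sketch below assumes the binomial model introduced later in this module, and it also assumes (the text does not say so) that the eight bids are won or lost independently of one another. The function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_bids, p_win = 8, 0.40  # values from the introduction
# Assumption (not stated in the text): the bids are independent of one another.
contract_probs = {k: binomial_pmf(k, n_bids, p_win) for k in (0, 5, 8)}

for k, prob in contract_probs.items():
    print(f"P(win {k} of {n_bids} contracts) = {prob:.4f}")
```

If the bids were not independent (for example, if winning one contract used up the firm's capacity), this model would not apply.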
Whether an experiment yields qualitative or quantitative outcomes,
methods of statistical analysis require that we focus on certain numerical
aspects of the data (such as a sample proportion x/n, mean, or standard
deviations). The concept of a random variable allows us to pass from the
experimental outcomes themselves to a numerical function of the outcomes.
There are two fundamentally different types of random variables—discrete
random variables and continuous random variables. In this module, we
examine the basic properties and discuss the most important examples of
discrete random variables.
COURSE OUTCOMES:
At the end of this module, students will be able to:
Apply statistical methods in the analysis of data and designing experiments.
PROGRAM OUTCOMES:
Design a system, component, or process to meet desired needs within
realistic constraints such as economic, environmental, social, political,
ethical, health and safety, manufacturability, and sustainability, in
accordance with standards;
Identify, formulate, and solve complex problems in electrical engineering;
and
Apply techniques, skills and modern engineering tools necessary for
electrical engineering practice.
TIME FRAME:
1 week
LECTURE 1:
RANDOM VARIABLES AND THEIR
PROBABILITY DISTRIBUTIONS
In any experiment, there are numerous characteristics that can be observed or measured,
but in most cases an experimenter will focus on some specific aspect or aspects of a sample.
For example, in a study of commuting patterns in a metropolitan area, each individual in a
sample might be asked about commuting distance and the number of people commuting in
the same vehicle, but not about IQ, income, family size, and other such characteristics.
Alternatively, a researcher may test a sample of components and record only the number
that have failed within 1000 hours, rather than record the individual failure times. In general,
each outcome of an experiment can be associated with a number by specifying a rule of
association (e.g., the number among the sample of ten components that fail to last 1000
hours or the total weight of baggage for a sample of 25 airline passengers). Such a rule of
association is called a random variable—a variable because different numerical
values are possible and random because the observed value depends on which of the
possible experimental outcomes results (Figure 3.1).
Adapted from Probability and Statistics for Engineering and the Sciences by Jay L. Devore, 8th Edition
Random variables are typically denoted by uppercase letters, such as X, Y, and Z. The
actual numerical values that a random variable can assume are denoted by lowercase
letters, such as x, y, and z. The probability that the random variable X takes a value x is
denoted by P(X = x) or P(x). We encounter two types of random variables, discrete and
continuous. In this module, we’ll study discrete random variables and their probability
distributions. In the next module, we’ll study continuous random variables and their
probability distributions.
Example 3.1
A section of an electrical circuit has two relays, numbered 1 and 2, operating in parallel. The
current will flow when a switch is thrown if either one or both of the relays close. The
probability of a relay closing properly is 0.8 for each relay. We assume that the relays
operate independently. Let \( E_i \) denote the event that relay i closes properly when the switch is
thrown. Then \( P(E_i) = 0.8 \). A numerical event of some interest to the operator of this system
is
X = the number of relays that close properly when the switch is thrown
Now X can take on only three possible values, 0, 1, and 2. We can find the probabilities
associated with these values of X by relating them to the underlying events \( E_i \). Thus, using
the rules in the past module, we get
\[ P(X = 0) = P(\bar{E}_1 \bar{E}_2) = (0.2)(0.2) = 0.04 \]
\[ P(X = 1) = P(E_1 \bar{E}_2) + P(\bar{E}_1 E_2) = (0.8)(0.2) + (0.2)(0.8) = 0.32 \]
\[ P(X = 2) = P(E_1 E_2) = (0.8)(0.8) = 0.64 \]
Table 3.1 Probability distribution of number of closed relays
Adapted from Probability and Statistics for Engineers by Scheaffer, Mulekar, McClave, 5th ed.
Figure 3.2 Probability distribution of number of closed relays
Adapted from Probability and Statistics for Engineers by Scheaffer, Mulekar, McClave, 5th ed.
The probability function is sometimes called the probability mass function of X to
denote the idea that a mass of probability is piled up at discrete points. It is often convenient
to list the probabilities for a discrete random variable in a table or plot in a chart. With X
defined as the number of closed relays, as in Example 3.1, the table and the graph of the
probability mass function are given in Table 3.1 and Figure 3.2, respectively. The listing and
the graph are two ways of representing the probability distribution of X.
Note that the probability function p(x) satisfies two requirements:
1. \( 0 \le p(x) \le 1 \) for x = 0, 1, 2
2. \( p(0) + p(1) + p(2) = 1 \)
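The probability mass function of Example 3.1 can be reproduced by listing the four possible relay outcomes and multiplying probabilities, which is valid because the relays are assumed independent. This is a minimal sketch; the variable names are illustrative.

```python
from itertools import product

p_close = 0.8  # probability that a single relay closes properly (Example 3.1)

# Enumerate the outcomes (relay 1, relay 2); True means "closes properly".
pmf = {0: 0.0, 1: 0.0, 2: 0.0}
for relays in product([True, False], repeat=2):
    prob = 1.0
    for closed in relays:
        prob *= p_close if closed else 1 - p_close  # independence: multiply
    pmf[sum(relays)] += prob  # X = number of relays that closed

for x, prob in pmf.items():
    print(f"p({x}) = {prob:.2f}")
print("sum of probabilities =", round(sum(pmf.values()), 10))
```

Both requirements listed above can be read off the output: each p(x) lies between 0 and 1, and the three values add to 1.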
Example 3.2
The output of circuit boards from two assembly lines set up to produce identical
boards is mixed into one storage tray. As inspectors examine the boards randomly, it
is difficult to determine whether a board comes from line A or line B. A probabilistic
assessment of this situation is often helpful. Suppose a storage tray contains 10
circuit boards, of which 6 come from line A and 4 from line B. An inspector selects 2
of these identical-appearing boards for inspection. He is interested in X, the number
of inspected boards from line A. Find the probability distribution for X.
Solution:
The experiment consists of two selections, each of which can result in two outcomes. Let
\( A_i \) denote the event that the ith board comes from line A, and \( B_i \) denote the event that it
comes from line B. Then the probability of selecting two boards from line A is
\[ P(X = 2) = P(A_1 A_2) = P(A_1) P(A_2 \mid A_1) = \frac{6}{10} \cdot \frac{5}{9} = \frac{1}{3} \approx 0.333 \]
The multiplicative rule of probability is used, and the probability of the second selection
depends on what happened on the first selection. There are other possibilities for outcomes
that will result in different values of X. These outcomes are conveniently listed using a tree
diagram in Figure 3.3. The probabilities for the various selections are given on the branches
of the tree. It is easily seen that X has three possible outcomes, with the probabilities listed
in Table 3.2.
Table 3.2
Adapted from Probability and Statistics for Engineers by Scheaffer, Mulekar, McClave, 5th ed.
Figure 3.3 Outcomes of circuit board selection
Adapted from Probability and Statistics for Engineers by Scheaffer, Mulekar, McClave, 5th ed.
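The tree-diagram reasoning of Example 3.2 can be sketched in code by walking every branch and multiplying the conditional probabilities along it. The function name `selection_pmf` and its parameters are hypothetical; exact fractions are used so the results can be compared directly with a hand calculation.

```python
from fractions import Fraction
from itertools import product

def selection_pmf(n_a=6, n_b=4, draws=2):
    """pmf of X = number of line-A boards when sampling without replacement."""
    pmf = {}
    for path in product("AB", repeat=draws):        # each path is one branch of the tree
        remaining = {"A": n_a, "B": n_b}
        prob = Fraction(1)
        for line in path:
            total = remaining["A"] + remaining["B"]
            prob *= Fraction(remaining[line], total)  # conditional probability on this branch
            remaining[line] -= 1                      # the selected board leaves the tray
        x = path.count("A")
        pmf[x] = pmf.get(x, Fraction(0)) + prob
    return pmf

board_pmf = selection_pmf()
for x in sorted(board_pmf):
    print(f"P(X = {x}) = {board_pmf[x]}")
```

The same sketch applies to any two-category selection without replacement, provided the number of draws does not exceed the number of items.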
PRACTICE PROBLEMS
3.1P Among 10 applicants for an open position, 6 are females and 4 are males.
Suppose three applicants are randomly selected from the applicant pool for final
interviews. Find the probability distribution for X, the number of female applicants
among the final three.
3.2P Wade Boggs of the Boston Red Sox hit 0.363 in 1987. (He got a hit on 36.3%
of his official times at bat.) In a typical game, he was up to bat three official times.
Find the probability distribution for X, the number of hits in a typical game. What
assumptions are involved in the answer? Are the assumptions reasonable? Is it
unusual for a good hitter like Wade Boggs to go zero for three in one game?
LECTURE 2:
CUMULATIVE DISTRIBUTION
FUNCTIONS
Cumulative probabilities, \( P(X \le x) \), where X still represents the random variable and x now
represents an upper limit, are found by adding individual probabilities:
\[ P(X \le x) = \sum_{x_i \le x} p(x_i) \]
where \( p(x_i) \) is an individual probability function. For example, if \( x_i \) can only be zero or a
positive integer,
\[ P(X \le 3) = p(0) + p(1) + p(2) + p(3) \]
The functional relationship between the cumulative probability and the upper limit, x, is called
the cumulative distribution function, or the probability distribution function.
Note that since \( P(X \le 2) = p(0) + p(1) + p(2) \), we have \( p(3) = P(X \le 3) - P(X \le 2) \).
In general,
\[ p(x_i) = P(X \le x_i) - P(X \le x_{i-1}) \]
As an illustration, consider the random variable that represents the number of heads
obtained on tossing five fair coins. The probability of obtaining heads on any one coin is 1/2.
The probability function and cumulative distribution are given by the binomial distribution,
which will be considered in detail in the next lecture. The probability function of possible
results is shown in Table 3.3 and Figure 3.5.
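For the five-coin illustration, the probability function and the cumulative distribution can be tabulated with a running sum. The sketch below assumes the binomial form of the probabilities, p(x) = C(5, x)/2^5, which the text defers to a later lecture; variable names are illustrative.

```python
from itertools import accumulate
from math import comb

n_coins = 5
# p(x) = C(5, x) / 2^5 for x heads among five fair coins
pmf = [comb(n_coins, x) / 2**n_coins for x in range(n_coins + 1)]
cdf = list(accumulate(pmf))  # running sum gives P(X <= x)

print(" x    p(x)     P(X <= x)")
for x, (p, c) in enumerate(zip(pmf, cdf)):
    print(f"{x:2d}   {p:.5f}  {c:.5f}")

# Recover an individual probability from the cdf: p(3) = P(X <= 3) - P(X <= 2)
p3_from_cdf = cdf[3] - cdf[2]
print("p(3) recovered from the cdf:", p3_from_cdf)
```

The last line illustrates the general relation between a cumulative distribution and its probability function: differencing consecutive cumulative values returns the individual probabilities.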
Example 3.3
The random variable X, denoting the number of relays closing properly (defined in
Example 3.1) has the probability distribution given below:
PRACTICE PROBLEMS
Verify that the following functions are cumulative distribution functions, and determine the
probability mass function and the requested probabilities.
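3.3P
\[ F(x) = \begin{cases} 0, & x < 1 \\ 0.5, & 1 \le x < 3 \\ 1, & 3 \le x \end{cases} \]
a. \( P(X \le 3) \)    b. \( P(X \le 2) \)
c. \( P(1 \le X \le 2) \)    d. \( P(X > 3) \)
3.4P
a. \( P(X \le 50) \)    b. \( P(X \le 40) \)
c. \( P(40 \le X \le 60) \)    d. \( P(X < 0) \)
e. \( P(0 \le X < 10) \)    f. \( P(-10 < X < 10) \)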
LECTURE 3:
EXPECTED VALUES OF RANDOM
VARIABLES
Two numbers are often used to summarize a probability distribution for a random variable X.
The mean is a measure of the center or middle of the probability distribution, and the
variance is a measure of the dispersion, or variability in the distribution. These two
measures do not uniquely identify a probability distribution. That is, two different distributions
can have the same mean and variance. Still, these measures are simple, useful summaries
of the probability distribution of X.
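The mean or expected value of the discrete random variable X, denoted as μ or E(X), is
\[ \mu = E(X) = \sum_x x f(x) \]
The standard deviation of X is \( \sigma = \sqrt{\sigma^2} \), where \( \sigma^2 = V(X) \) is the variance of X.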
The mean of a discrete random variable X is a weighted average of the possible values of X
with weights equal to the probabilities. If f(x) is the probability mass function of a loading on
a long, thin beam, E(X) is the point at which the beam balances. Consequently, E(X)
describes the “center” of the distribution of X in a manner similar to the balance point of a
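loading. See Figure 3.7.
The variance of X, denoted as \( \sigma^2 \) or V(X), is
\[ V(X) = \sum_x (x - \mu)^2 f(x) = \sum_x x^2 f(x) - 2\mu \sum_x x f(x) + \mu^2 \sum_x f(x) \]
\[ = \sum_x x^2 f(x) - 2\mu^2 + \mu^2 = \sum_x x^2 f(x) - \mu^2 \]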
Figure 3.7 A probability distribution can be viewed as a loading with the mean equal to the balance point.
Parts (a) and (b) illustrate equal means, but Part (a) illustrates a larger variance.
Adapted from Applied Statistics and Probability for Engineers by Montgomery and Runger, 5th ed.
Figure 3.8 The probability distributions illustrated in Parts (a) and (b) differ even though they have equal
means and equal variances.
Adapted from Applied Statistics and Probability for Engineers by Montgomery and Runger, 5th ed.
Either formula for V(X) can be used. Figure 3.8 illustrates that two probability distributions
can differ even though they have identical means and variances.
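To see that the two variance formulas agree, the sketch below evaluates both on the relay distribution of Example 3.1, whose probabilities 0.04, 0.32, and 0.64 follow from a closing probability of 0.8 per independent relay. The function names are illustrative, not part of the source.

```python
from math import sqrt

def mean(pmf):
    """mu = sum of x * f(x)."""
    return sum(x * fx for x, fx in pmf.items())

def variance_definition(pmf):
    """V(X) = sum of (x - mu)^2 * f(x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * fx for x, fx in pmf.items())

def variance_shortcut(pmf):
    """V(X) = sum of x^2 * f(x), minus mu^2."""
    return sum(x**2 * fx for x, fx in pmf.items()) - mean(pmf) ** 2

relay_pmf = {0: 0.04, 1: 0.32, 2: 0.64}  # number of relays that close (Example 3.1)
mu = mean(relay_pmf)
var_def = variance_definition(relay_pmf)
var_short = variance_shortcut(relay_pmf)
sigma = sqrt(var_def)

print(f"mean = {mu:.2f}")
print(f"variance (definition) = {var_def:.4f}, variance (shortcut) = {var_short:.4f}")
print(f"standard deviation = {sigma:.4f}")
```

The shortcut form is usually easier by hand, while the definition form makes the "spread about the mean" interpretation explicit.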
Because probability can be thought of as the long-run relative frequency of occurrence for an
event, a probability distribution can be interpreted as showing the long-run relative frequency
of occurrence for numerical outcomes associated with a random variable.
Game 1: Suppose that you and your friend are matching balanced coins. Each of you
tosses a coin. If the upper faces match, you win $1; if they do not match, you
lose $1 (i.e., your friend wins $1).
The probability of a match is 0.5 and in the long run you will win about half the
time. Thus, a relative frequency distribution of your winnings should look like
Figure 3.9. Note that the negative sign indicates a loss to you.
On the average, how much will you win per game over the long run? If Figure
3.9 is a correct display of your winnings, you win -1 half the time and +1 half
the time, for an average of
\[ (-1)\left(\tfrac{1}{2}\right) + (+1)\left(\tfrac{1}{2}\right) = 0 \]
This average is sometimes called your expected winnings per game or the
expected value of your winnings. An expected value of 0 indicates that this is
a fair game.
The expected value can be thought of as the mean value of your winnings
over many games.
Game 3: You and your friend decide to play the coin–matching game by allowing you
to win $1 if the match is tails and $2 if the match is heads. You lose $1 if the
coins do not match. Notice that this is not a fair game, because your expected
winnings are
\[ (-1)\left(\tfrac{1}{2}\right) + (+1)\left(\tfrac{1}{4}\right) + (+2)\left(\tfrac{1}{4}\right) = 0.25 \]
So you are likely to win 25 cents per game in the long run. In other words, the
game is favorable to you.
Game 4: In the coin-matching game, suppose you pay your friend $1.50 if the coins do
not match. You’ll still win $1 if the match is tails and $2 if the match is heads.
Now your expected winnings per game are
\[ (-1.50)\left(\tfrac{1}{2}\right) + (+1)\left(\tfrac{1}{4}\right) + (+2)\left(\tfrac{1}{4}\right) = 0 \]
and the game now is fair. What is the difference between Game 4 and the
original Game 1 in which the payoffs were $1?
The difference certainly cannot be explained by the expected value, because
both games are fair. You can win more but also lose more with the new
payoffs, so the difference between the two can be partially explained in terms
of the variation of your winnings across many games. This increased variation
can be seen in Figure 3.10, the relative frequency for your winnings in Game
4, which is more spread out than the one shown in Figure 3.9.
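The comparison between the games can be made concrete by computing the expected value and the variance of the winnings in each one. In this sketch the payoffs and probabilities are taken from the game descriptions, and the dictionary layout and function names are illustrative choices.

```python
def expected_value(dist):
    """Weighted average of the winnings, weights = probabilities."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Average squared deviation of the winnings from their expected value."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# winnings in dollars -> probability
games = {
    "Game 1": {-1.0: 0.5, 1.0: 0.5},
    "Game 3": {-1.0: 0.5, 1.0: 0.25, 2.0: 0.25},
    "Game 4": {-1.5: 0.5, 1.0: 0.25, 2.0: 0.25},
}

summary = {name: (expected_value(d), variance(d)) for name, d in games.items()}
for name, (ev, var) in summary.items():
    print(f"{name}: expected winnings = {ev:+.2f}, variance = {var:.4f}")
```

Games 1 and 4 both have expected winnings of zero, yet Game 4 has the larger variance, which is the numerical counterpart of its more spread-out relative frequency distribution.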
Example 3.4
The table below provides the age distribution of the population (U.S. Bureau of Census)
for 1990 and 2050 (projected). The numbers are percentages.
Table 3.4
Adapted from Probability and Statistics for Engineers by Scheaffer, Mulekar,
McClave, 5th ed.
Age is actually a continuous measurement, but when reported in categories, we can treat it
as a discrete random variable. To move from continuous age intervals to discrete age
classes, we assign each interval the value of its midpoint (rounded). The data in the table above
are interpreted as reporting that 7.6% of the 1990 population was around 3 years of age and
22.5% of the 2050 population is anticipated to be around 55 years of age. The open interval
at the upper end was stopped at 100 years for convenience.
With the percentages interpreted as probabilities, the mean age for 1990 is approximated by
The mean age is anticipated to increase rather markedly. The variation in the two age
distributions can be approximated by the standard deviation. For 1990,
and employing a similar calculation for the 2050 data, we get \( \sigma = 25.4 \). These results are
summarized in the table below.
Table 3.4
Adapted from Probability and Statistics for Engineers by Scheaffer, Mulekar,
McClave, 5th ed.
The population is not only getting older, on the average, but its variation is increasing too.
What are some of the implications of these trends?
PRACTICE PROBLEMS
3.5P You are to pay $1 to play a game consisting of drawing one ticket at random from a
box of numbered tickets. You win the amount (in dollars) of the number on the ticket
you draw. Two boxes are available with numbered tickets as shown below:
3.6P The graph in Figure 3.11 shows the age distribution for AIDS deaths in the United
States through 1995. Approximate the mean and standard deviation of this age
distribution. How does the mean age compare to the approximate median age?
LECTURE 4:
THE BINOMIAL DISTRIBUTION
Many experimental and sample survey situations result in a random variable that can be
adequately modeled by the binomial distribution. For example,
• Number of defectives in a sample of n items from a large population
• Counts of number of employees favoring a certain retirement policy out of n employees
interviewed
• The number of pistons in an eight-cylinder engine that are misfiring
• The number of electronic systems sold this week out of the n that are manufactured.
The binomial distribution is defined by two parameters, n (number of trials) and p (probability
of success on each trial). The binomial random variable Y = the number of successes in n
trials.
The Binomial Distribution
\[ p(y) = \binom{n}{y} p^y (1 - p)^{n - y}, \quad y = 0, 1, 2, 3, \ldots, n \quad \text{for } 0 \le p \le 1 \]
\[ E(Y) = np \quad \text{and} \quad V(Y) = np(1 - p) \]
The figures below show how the parameter values (n and p) affect the probability function
for the binomial distribution.
Figure 3.12 The effect of n and p on the binomial probability function
Adapted from Probability and Statistics for Engineers by Scheaffer, Mulekar,
McClave, 5th ed.
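The binomial formulas for the mean and variance can be checked numerically by summing over the probability function. The sketch below does this for a few parameter pairs; the pairs are illustrative choices made for this example and are not read from Figure 3.12, and the function names are hypothetical.

```python
from math import comb

def binomial_pmf(y, n, p):
    """p(y) = C(n, y) * p^y * (1 - p)^(n - y)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

def binomial_summary(n, p):
    """Return (total probability, mean, variance) computed directly from the pmf."""
    pmf = [binomial_pmf(y, n, p) for y in range(n + 1)]
    total = sum(pmf)
    mean = sum(y * py for y, py in enumerate(pmf))
    var = sum((y - mean) ** 2 * py for y, py in enumerate(pmf))
    return total, mean, var

# Illustrative (n, p) pairs chosen for this sketch.
for n, p in [(10, 0.1), (10, 0.5), (20, 0.5)]:
    total, mean, var = binomial_summary(n, p)
    print(f"n={n:2d}, p={p}: sum={total:.6f}, "
          f"mean={mean:.4f} (np={n * p:.4f}), "
          f"variance={var:.4f} (np(1-p)={n * p * (1 - p):.4f})")
```

In each case the probabilities sum to 1 and the directly computed mean and variance match np and np(1 - p) up to floating-point rounding.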
The variance formula follows from the definition:
\[ V(Y) = \sum_{y=0}^{n} (y - \mu)^2 p(y) = \sum_{y=0}^{n} (y - \mu)^2 \binom{n}{y} p^y (1 - p)^{n - y} = np(1 - p) \]
Example 3.5
Suppose a large lot contains 10% defective fuses. Four fuses are randomly sampled
from the lot.
(a) Find the probability that exactly one fuse in the sample of four is defective.
(b) Find the probability that at least one fuse in the sample of four is defective.
(c) Suppose the four fuses sampled from the lot were shipped to a customer
before being tested, on a guarantee basis. If any fuse is defective, the
supplier will repair it without any charge to the customer. Assume that the
cost of making the shipment good is \( C = 3Y^2 \), where Y denotes the number
of defectives in the shipment of four. Find the expected repair cost.
Solution:
We assume that the four fuses are sampled independently of each other and that the
probability of being defective is the same (0.1) for each fuse. This will be approximately true
if the lot is indeed large. If the lot is small, removal of one fuse would substantially change
the probability of observing a defective on the second draw. For a large lot, the binomial
distribution provides a reasonable model in this experiment with n = 4 and p = 0.1. Let
Y = the number of defective fuses out of 4 inspected
(a) The probability that exactly one fuse in the sample of four is defective is
\[ P(Y = 1) = p(1) = \binom{4}{1} (0.1)^1 (0.9)^3 = 0.2916 \]
(b) The probability that at least one fuse in the sample of four is defective is
\[ P(Y \ge 1) = 1 - P(Y = 0) = 1 - \binom{4}{0} (0.1)^0 (0.9)^4 = 0.3439 \]
(c) The expected repair cost is \( E(C) = E(3Y^2) = 3E(Y^2) \). Since \( V(Y) = E(Y^2) - \mu^2 \),
\[ E(Y^2) = V(Y) + \mu^2 = np(1 - p) + (np)^2 \]
\[ E(C) = 3E(Y^2) = 3\left[ np(1 - p) + (np)^2 \right] = 3\left[ 4(0.1)(0.9) + (0.4)^2 \right] = 1.56 \]
If the cost were originally in tens of dollars, we could expect to pay an average of
$15.60 in repair costs for each shipment of four fuses.
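The three parts of Example 3.5 can be verified with a few lines of code. This sketch computes the expected cost in two ways: by summing 3y² p(y) over all values of y, and by the shortcut 3[np(1 - p) + (np)²]. Variable names are illustrative.

```python
from math import comb

n, p = 4, 0.1  # sample size and defective rate from Example 3.5

def pmf(y):
    """Binomial probability of exactly y defectives among the n fuses."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

p_exactly_one = pmf(1)                  # part (a)
p_at_least_one = 1 - pmf(0)             # part (b)

# Part (c): E(C) with C = 3 * Y^2, computed two ways.
expected_cost_direct = sum(3 * y**2 * pmf(y) for y in range(n + 1))
expected_cost_formula = 3 * (n * p * (1 - p) + (n * p) ** 2)

print(f"(a) P(Y = 1)  = {p_exactly_one:.4f}")
print(f"(b) P(Y >= 1) = {p_at_least_one:.4f}")
print(f"(c) E(C) direct = {expected_cost_direct:.4f}, formula = {expected_cost_formula:.4f}")
```

Agreement between the two values in part (c) confirms the identity E(Y²) = V(Y) + [E(Y)]² used in the solution.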
PRACTICE PROBLEMS
3.7P Let X denote a random variable having a binomial distribution with p = 0.2 and n = 4.
Find the following.
(a) P ( X = 2) (b) P ( X ≥ 2)
(c) P ( X ≤ 2) (d) E ( X )
(e) V ( X )
3.8P A machine that fills boxes of cereal underfills a certain proportion p. If 25 boxes are
randomly selected from the output of this machine, find the probability that no more
than two are underfilled when
(a) p = 0.1 (b) p = 0.2
LECTURE 5:
THE POISSON DISTRIBUTION
The Poisson distribution occurs when we count the number of occurrences of an event over
a given time period or length or area or volume. For example:
• The number of flaws in a square yard of fabric
• The number of bacterial colonies in a cubic centimeter of water
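• The number of times a machine fails in the course of a workday
\[ p(y) = \frac{\lambda^y e^{-\lambda}}{y!}, \quad y = 0, 1, 2, 3, \ldots \]
\[ E(Y) = \lambda \quad \text{and} \quad V(Y) = \lambda \]
The mean can be derived from the series expansion
\[ e^{\lambda} = 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \ldots \]
Then,
\[ E(Y) = \sum_{y=0}^{\infty} y \, \frac{\lambda^y e^{-\lambda}}{y!} = \lambda e^{-\lambda} \sum_{y=1}^{\infty} \frac{\lambda^{y-1}}{(y-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda \]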
Figure 3.13 The Poisson Distribution
Adapted from Probability and Statistics for Engineers by Scheaffer, Mulekar,
McClave, 5th ed.
To find the variance, first find \( E(Y^2) = E[Y(Y - 1)] + E(Y) \). Then derive the variance using
similar arguments as in the expected value, and the relation \( V(Y) = E(Y^2) - [E(Y)]^2 \).
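The route to the variance described above can be followed numerically. The sketch below truncates the infinite sums after a fixed number of terms (an approximation, adequate when the rate is small compared with the cutoff) and uses an illustrative rate of 3; the function names and the cutoff are assumptions made for this example.

```python
from math import exp, factorial

def poisson_pmf(y, lam):
    """p(y) = lam^y * e^(-lam) / y!"""
    return lam**y * exp(-lam) / factorial(y)

def poisson_moments(lam, terms=100):
    """Approximate E(Y), E[Y(Y-1)] and V(Y) by truncating the infinite sums."""
    ys = range(terms)
    mean = sum(y * poisson_pmf(y, lam) for y in ys)
    fact2 = sum(y * (y - 1) * poisson_pmf(y, lam) for y in ys)  # E[Y(Y-1)]
    second = fact2 + mean              # E(Y^2) = E[Y(Y-1)] + E(Y)
    return mean, fact2, second - mean**2  # V(Y) = E(Y^2) - [E(Y)]^2

lam = 3.0  # illustrative rate chosen for this sketch
mean, fact2, var = poisson_moments(lam)
print(f"E(Y) = {mean:.6f}, E[Y(Y-1)] = {fact2:.6f}, V(Y) = {var:.6f}")
```

The output shows E[Y(Y - 1)] close to the square of the rate, so the variance comes out equal to the rate, matching the stated result that the mean and variance of a Poisson random variable coincide.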
The Poisson distribution applies in its own right where the possible number of discrete
occurrences is much larger than the average number of occurrences in a given interval of
time or space. The number of possible occurrences is often not known exactly. The
outcomes must occur randomly, that is, completely by chance, and the probability of
occurrence must not be affected by whether or not the outcomes occurred previously, so the
occurrences are independent. In many cases, although we can count the occurrences, such
as of a thunderstorm, we cannot count the corresponding non-occurrences. (We can’t count
“non-storms”!) Examples of occurrences to which the Poisson distribution often applies
include counts from a Geiger counter, collisions of cars at a specific intersection under
specific conditions, flaws in a casting, and telephone calls to a particular telephone or office
under particular conditions. For the Poisson distribution to apply to these outcomes, they
must occur randomly.
Example 3.6
For a certain manufacturing industry, the number of industrial accidents averages
three per week.
(a) Find the probability that no accident will occur in a given week.
(b) Find the probability that two accidents will occur in a given week.
(c) Find the probability that at most four accidents will occur in a given week.
(d) Find the probability that two accidents will occur in a given day.
Solution:
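(a) Using λ = mean number of accidents per week = 3, we get
\[ P(\text{no accident in a given week}) = p(0) = \frac{3^0}{0!} e^{-3} = e^{-3} = 0.05 \]
(b) With the same λ = 3,
\[ P(\text{two accidents in a given week}) = p(2) = \frac{3^2}{2!} e^{-3} = 0.224 \]
(c) Adding the probabilities for 0 through 4 accidents,
\[ P(\text{at most four accidents in a given week}) = p(0) + p(1) + p(2) + p(3) + p(4) \]
\[ = \frac{3^0}{0!} e^{-3} + \frac{3^1}{1!} e^{-3} + \frac{3^2}{2!} e^{-3} + \frac{3^3}{3!} e^{-3} + \frac{3^4}{4!} e^{-3} = 0.815 \]
(d) Now we are interested in the number of accidents on a given day. Thus,
using λ = mean number of accidents per day = 3/7 = 0.4286, we get
\[ P(\text{two accidents in a given day}) = p(2) = \frac{0.4286^2}{2!} e^{-0.4286} = 0.060 \]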
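The four parts of Example 3.6 can be checked with a short script. For part (d), this sketch assumes the weekly rate of three accidents is spread evenly over a seven-day week, giving a daily rate of 3/7 (about 0.4286); variable names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(y, lam):
    """p(y) = lam^y * e^(-lam) / y!"""
    return lam**y * exp(-lam) / factorial(y)

lam_week = 3.0          # mean number of accidents per week (Example 3.6)
lam_day = lam_week / 7  # assumption: rate spread evenly over a 7-day week

p_a = poisson_pmf(0, lam_week)                         # (a) no accident in a week
p_b = poisson_pmf(2, lam_week)                         # (b) two accidents in a week
p_c = sum(poisson_pmf(y, lam_week) for y in range(5))  # (c) at most four in a week
p_d = poisson_pmf(2, lam_day)                          # (d) two accidents in a day

for label, value in [("a", p_a), ("b", p_b), ("c", p_c), ("d", p_d)]:
    print(f"({label}) {value:.4f}")
```

Changing the interval only changes the rate: the same probability function is reused with the rate rescaled to the new interval length.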
PRACTICE PROBLEMS
3.9P The number of telephone calls coming into the central switchboard of an office building
averages four per minute.
(a) Find the probability that no calls will arrive in a given 1-minute period.
(b) Find the probability that at least two calls will arrive in a given 1-minute period.
(c) Find the probability that at least two calls will arrive in a given 2-minute period.
3.10P The National Maximum Speed Limit (NMSL) of 55 miles per hour has been in force in
the United States since early 1974. The benefits of this law have been studied by D.B.
Kamerud (Transportation Research, 17A, no. 1, 1983, pp. 51–64), who reports that the
fatality rate for interstate highways with the NMSL in 1975 is approximately 16 per \( 10^9 \)
vehicle miles.
(a) Find the probability of at most 15 fatalities occurring in \( 10^9 \) vehicle miles.
(b) Find the probability of at least 20 fatalities occurring in \( 10^9 \) vehicle miles. (Assume
that the number of fatalities per vehicle mile follows a Poisson distribution.)
ASSESSMENT
3.1A Forty percent of seeds from maize (modern-day corn) ears carry single spikelets, and
the other 60% carry paired spikelets. A seed with single spikelets will produce an ear
with single spikelets 29% of the time, whereas a seed with paired spikelets will
produce an ear with single spikelets 26% of the time. Consider randomly selecting
ten seeds. (20 points)
(a) What is the probability that exactly five of these seeds carry a single
spikelet and produce an ear with a single spikelet?
(b) What is the probability that exactly five of the ears produced by these
seeds have single spikelets? What is the probability that at most five ears
have single spikelets?
3.2A The manufacturer of a low-calorie dairy drink wishes to compare the taste appeal of a
new formula (B) with that of the standard formula (A). Each of four judges is given
three glasses in random order, two containing formula A and the other containing
formula B. Each judge is asked to state which glass he or she most enjoyed.
Suppose the two formulas are equally attractive. Let Y be the number of judges
stating a preference for the new formula. (30 points)
(a) Find the probability function for Y.
(b) What is the probability that at least three of the four judges state a
preference for the new formula?
(c) Find the expected value of Y.
(d) Find the variance of Y.
3.3A Of all customers purchasing automatic garage-door openers, 75% purchase a chain-
driven model. Let X = the number among the next 15 purchasers who select the chain-driven
model. (30 points)
(a) What is the probability mass function (pmf) of X?
(b) Compute \( P(X > 10) \)
(c) Compute \( P(6 \le X \le 10) \)
(d) Compute \( \mu \) and \( \sigma^2 \).
(e) If the store currently has in stock 10 chain-driven models and 8 shaft-
driven models, what is the probability that the requests of these 15 customers
can all be met from existing stock?
3.4A The probability that a single radar set will detect an airplane is 0.9. If we have five
radar sets, what is the probability that exactly four sets will detect the plane? At least
one set? (Assume that the sets operate independently of each other.) (20 points)
SUPPLEMENTARY KNOWLEDGE
For additional information, you may view the videos below:
1. Random Variables and their Probability Distributions
(https://www.youtube.com/watch?v=0P5WRKihQ4E)
2. Cumulative Distribution Functions (https://www.youtube.com/watch?v=3xAIWiTJCvE)
3. Expected Values of Random Variables
(https://www.youtube.com/watch?v=OvTEhNL96v0)
4. The Binomial Distribution (https://www.youtube.com/watch?v=qIzC1-9PwQo)
5. The Poisson Distribution (https://www.youtube.com/watch?v=jmqZG6roVqU)
6. Example problems (Binomial and Poisson with Geometric and Hypergeometric)
(https://www.youtube.com/watch?v=Jm_Ch-iESBg)
ANSWER KEY
Answers to Practice Problems:
3.1P 3.2P
3.7P (a) 0.1536 (b) 0.1808 (c) 0.9728 (d) 0.8 (e) 0.64
3.8P (a) 0.537 (b) 0.098
3.9P (a) 0.0183 (b) 0.908 (c) 0.997
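3.2A (a) \( p(y) = \binom{4}{y} \left(\tfrac{1}{3}\right)^y \left(\tfrac{2}{3}\right)^{4-y} \) (b) \( \tfrac{1}{9} \) (c) \( \tfrac{4}{3} \) (d) \( \tfrac{8}{9} \)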
3.3A (a) b(x; 15, 0.75) (b) 0.686 (c) 0.313 (d) 11.25, 2.81 (e) 0.310
References