
Chapter5 Notes

A probability distribution shows all possible outcomes of a random variable and their probabilities. There are two types: discrete and continuous. Discrete distributions include the binomial and Poisson distributions, which assign probabilities to integer outcomes. The binomial distribution gives the probability of a certain number of successes in a fixed number of trials with a constant probability of success. The Poisson distribution gives the probability of a certain number of occurrences within a set time period when the average number of occurrences is known. Both use formulas involving factorials and exponents to determine probabilities. Examples show how to apply the distributions to calculate probabilities for various scenarios.

Uploaded by

Neo William
Copyright
© © All Rights Reserved

Statistics

Chapter 5: Probability Distributions

A probability distribution is a list of all the possible outcomes of a random variable
and their associated probabilities of occurrence.

1. TYPES OF PROBABILITY DISTRIBUTION


Probability distribution functions can be classified as discrete or continuous.
1. Discrete probability distributions:
   • binomial distribution
   • Poisson distribution
2. Continuous probability distributions:
   • normal distribution

2. DISCRETE PROBABILITY DISTRIBUTIONS


Discrete probability distributions assume that the outcomes of a random variable
under study can take only specific (usually integer) values.

⇒ e.g. a maths class can have 1, 2, 3, 4, or 5 (or any integer) number of students
⇒ for a discrete random variable to follow either a binomial or Poisson process, it must
possess a number of specific characteristics

2.1 Binomial Probability Distribution


⇒ 4 conditions to identify a binomial distribution:
1. Random variable is observed n number of times
2. There are only 2 mutually exclusive and collectively exhaustive outcomes:
success and failure (e.g. a student is absent from class or not absent from class)
3. Each outcome has an associated probability:
• success outcome: p
• failure outcome: 1 – p
4. Objects are assumed to be independent of each other; p is same for each of n
objects

The Binomial Question:


‘What is the probability that 𝓍 successes will occur in a randomly drawn sample of n
objects?’

Binomial probability distribution formula:


$P(x) = {}_{n}C_{x}\, p^{x} (1 - p)^{n - x}$ for x = 0, 1, 2, 3, . . ., n
Where: n = sample size
x = number of success outcomes
p = probability of a success outcome
(1-p) = probability of failure outcome

Example: Car hire request study


The Zeplin car hire company has a fleet of rental cars that includes the make Opel.
Experience has shown that one in four clients requests to hire an Opel. If five
reservations are randomly selected from today’s bookings, what is the probability that
two clients will have requested an Opel?

Solution:
The random variable (nr. of hire requests for an Opel) is discrete → 0, 1, 2, 3, etc. Opels
can be requested for hire on a given day. This discrete random variable follows the
binomial process, because it satisfies the 4 conditions:
1. Random variable is observed 5 times: n = 5
2. There are 2 possible outcomes:
• a client requests to hire an Opel (success outcome)
• a client requests to hire another car, i.e. not an Opel (failure outcome)
3. Probability of success outcome is constant: ‘experience has shown that one in
four clients requests to hire an Opel’
• p (probability that a client requests to hire an Opel car) = ¼ = 0.25
• (1 – p) (probability that client does not request an Opel) = 0.75
4. The trials are independent: p will not change from one client’s request to
another

Find: P(x=2) when n = 5 and p = 0.25


$P(x=2) = {}_{5}C_{2}\,(0.25)^{2}(1 - 0.25)^{5-2} = (10)(0.0625)(0.4219) = 0.264$
There is a 26.4% chance that 2 out of 5 randomly selected clients will request an Opel.
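
The binomial formula can be checked numerically. The following is a minimal sketch in Python (the function name `binomial_pmf` is chosen here for illustration and is not part of the notes); it uses `math.comb` for nCx and applies the formula to the Opel example.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(x) = nCx * p^x * (1 - p)^(n - x) for x = 0, 1, ..., n."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Opel example: n = 5 reservations, p = 0.25, x = 2 requests for an Opel
p_two_opels = binomial_pmf(2, n=5, p=0.25)
print(round(p_two_opels, 3))  # 0.264
```

Summing `binomial_pmf(x, 5, 0.25)` over x = 0, ..., 5 should give 1, which is a quick sanity check on any hand calculation.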

Example: Life assurance policy surrender study


Global Insurance has found that 20% (one in five) of all insurance policies are
surrendered (cashed in) before their maturity date. Assume that 10 policies are
randomly selected from the company’s policy database.
a) What is the probability that four of these 10 insurance policies will have been
surrendered before their maturity date?
Solution:
Find: P(x=4) when n = 10 and p = 0.2
$P(x=4) = {}_{10}C_{4}\,(0.2)^{4}(1 - 0.2)^{10-4} = (210)(0.0016)(0.2621) = 0.088$
There is an 8.8% chance that four out of 10 insurance policies will have been
surrendered before their maturity date.

b) What is the probability that no more than three of these 10 insurance policies will
have been surrendered before their maturity date?
Solution:
•‘no more than 3’ implies that either 0 or 1 or 2 or 3 of the sampled policies will have
been surrendered before their maturity date;
•Thus find cumulative probability: P(x≤3) = P(x=0) + P(x=1) + P(x=2) + P(x=3)
•Calculate each separately:
$P(x=0) = {}_{10}C_{0}\,(0.2)^{0}(1 - 0.2)^{10-0} = 0.1074$
$P(x=1) = {}_{10}C_{1}\,(0.2)^{1}(1 - 0.2)^{10-1} = 0.2684$
$P(x=2) = {}_{10}C_{2}\,(0.2)^{2}(1 - 0.2)^{10-2} = 0.3020$
$P(x=3) = {}_{10}C_{3}\,(0.2)^{3}(1 - 0.2)^{10-3} = 0.2013$
Then: P(x≤3) = 0.1074 + 0.2684 + 0.3020 + 0.2013 = 0.8791 ≈ 0.879
There is an 87.9% chance that no more than 3 of the 10 insurance policies will have been
surrendered before their maturity date.
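
A cumulative binomial probability is just a sum of individual probabilities. The sketch below (helper names `binomial_pmf` and `binomial_cdf` are illustrative assumptions) reproduces part (b) for n = 10 and p = 0.2.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x): sum of the individual probabilities for 0, 1, ..., x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


# Part (b): no more than 3 of 10 policies surrendered, p = 0.2
terms = [binomial_pmf(k, 10, 0.2) for k in range(4)]
p_at_most_3 = binomial_cdf(3, 10, 0.2)
print([round(t, 4) for t in terms])
print(round(p_at_most_3, 3))  # 0.879
```

Keeping the individual terms unrounded until the final step avoids small discrepancies in the last decimal place.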

c) What is the probability that at least 2 out of these 10 insurance policies will have
been surrendered before their maturity date?
Solution:
•‘at least 2’ means it can be 2 or more;
•Thus find cumulative probability: P(x≥2) = P(x=2) + P(x=3) + P(x=4) + . . . + P(x=10)
(this means that 9 binomial calculations must be performed)
•To avoid this many calculations, apply the complementary law of probabilities:
Find: P(x≥2) = 1 – P(x≤1)
P(x≥2) = 1 – [P(x=0) + P(x=1)]
= 1 – [0.1074 + 0.2684] (the values from (b) above, to four decimal places)
= 1 – 0.3758 = 0.6242 ≈ 0.624
There is a 62.4% chance that at least 2 out of the 10 insurance policies will have been
surrendered before their maturity date.
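
To see that the complementary rule gives the same answer as the long route, the following sketch (the function name `binomial_pmf` is an illustrative assumption) computes P(x≥2) both ways for n = 10 and p = 0.2.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.2

# Long route: add up the nine terms P(x=2) + P(x=3) + ... + P(x=10)
direct = sum(binomial_pmf(k, n, p) for k in range(2, n + 1))

# Short route: complement of the two terms P(x=0) + P(x=1)
via_complement = 1 - (binomial_pmf(0, n, p) + binomial_pmf(1, n, p))

print(round(via_complement, 3))  # 0.624
```

Both routes agree up to floating-point error; the complement simply needs fewer terms.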

Descriptive statistical measures of the binomial distribution:


• from above example: p = 0.20 and n = 10

Mean: 𝜇 = np = (10)(0.2) = 2 policies, on average, out of 10 policies


Standard deviation: 𝜎 = √(np(1 − p)) = √((10)(0.2)(0.8)) = √1.6 ≈ 1.26 policies
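
As a cross-check, the shortcut formulas for the mean and standard deviation can be compared with the values obtained directly from the distribution itself. This is a minimal sketch; the variable names are illustrative.

```python
from math import comb, sqrt

n, p = 10, 0.2

# Shortcut formulas for a binomial distribution
mean = n * p
std_dev = sqrt(n * p * (1 - p))

# Cross-check from the definition: weight each outcome x by its probability
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
mean_from_pmf = sum(x * px for x, px in enumerate(pmf))
var_from_pmf = sum((x - mean_from_pmf) ** 2 * px for x, px in enumerate(pmf))

print(mean, round(std_dev, 2))  # 2.0 1.26
```

Here `sqrt(var_from_pmf)` should match `std_dev` up to floating-point error.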

2.2 Poisson Probability Distribution


⇒ is also a discrete process
A Poisson process measures the number of occurrences of a particular outcome of a
discrete random variable in a predetermined time, space or volume interval for which an
average number of occurrences of the outcome is known or can be determined.

⇒e.g. number of sales made by a telesales person in a week

The Poisson Question:


‘What is the probability of x occurrences of a given outcome being observed in a
predetermined time, space or volume interval?’

Poisson probability distribution formula:

$P(x) = \dfrac{e^{-\lambda}\,\lambda^{x}}{x!}$ for x = 0, 1, 2, 3, . . .

Where: λ = mean (average) number of occurrences of a given outcome of the
random variable for a predetermined time, space or volume interval
e = mathematical constant (≈ 2.71828)
x = number of occurrences of a given outcome for which a probability is
required

Example: Web-based Marketing Study


A web-based travel agency uses its website to market its travel products (holiday
packages). The agency receives an average of five web-based enquiries per day for its
different travel products.
a) What is the probability that, on a given day, the agency will receive only three web-
based enquiries for its travel products?

Solution:
Find P(x=3) when λ (average nr. of web-based enquiries per day) = 5
$P(x=3) = \dfrac{e^{-5}\, 5^{3}}{3!} = 0.1404$

There is a 14.04% chance that the travel agency will receive only 3 web-based enquiries
on a given day when the average number of web-based enquiries per day is 5.
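
The Poisson formula translates directly into code. The sketch below is illustrative only (the name `poisson_pmf` and the parameter name `lam` for λ are assumptions made here) and reproduces part (a).

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(x) = e^(-lam) * lam^x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / factorial(x)


# Part (a): exactly 3 enquiries on a day when the average is 5 per day
p_three = poisson_pmf(3, lam=5)
print(round(p_three, 4))  # 0.1404
```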

b) What is the probability that, on a given day, the agency will receive at most two web-
based enquiries for travel packages?

Solution:
•‘at most 2’ implies 0 or 1 or 2 enquiries on a given day;
•Thus find cumulative probability: P(x≤2) = P(x=0) + P(x=1) + P(x=2)
$P(x=0) = \dfrac{e^{-5}\, 5^{0}}{0!} = 0.00674$
$P(x=1) = \dfrac{e^{-5}\, 5^{1}}{1!} = 0.0337$
$P(x=2) = \dfrac{e^{-5}\, 5^{2}}{2!} = 0.0842$

P(x≤2) = 0.00674 + 0.0337 + 0.0842 = 0.12464


There is a 12.46% chance that the travel agency will receive at most 2 web-based
enquiries on a given day when the average number of web-based enquiries per day is 5.

c) What is the probability that, on a given day, the agency will receive more than four
web-based enquiries for travel packages?

Solution:
•‘more than 4’ implies 5 or 6 or 7 or more enquiries per day;
Thus P(x > 4) means P(x ≥ 5) = P(x=5) + P(x=6) + P(x=7) + . . . (with no upper limit)
•Apply complementary rule;
thus P(x ≥ 5) = 1 – P(x ≤ 4)
= 1 – [P(x=0) + P(x=1) + P(x=2) + P(x=3) + P(x=4)]
= 1 – [0.00674 + 0.0337 + 0.0842 + 0.1404 + 0.1755]
= 1 – 0.4405
= 0.5595
There is a 55.95% chance that the travel agency will receive more than 4 web-based
enquiries on a given day when the average number of web-based enquiries per day is 5.
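
Parts (b) and (c) both rely on a cumulative Poisson probability. A minimal sketch, assuming the illustrative helper names `poisson_pmf` and `poisson_cdf`:

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x occurrences when the mean is lam."""
    return exp(-lam) * lam**x / factorial(x)


def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x): sum of the individual probabilities for 0, 1, ..., x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


lam = 5  # average enquiries per day

p_at_most_2 = poisson_cdf(2, lam)        # part (b)
p_more_than_4 = 1 - poisson_cdf(4, lam)  # part (c), complementary rule

print(round(p_at_most_2, 4))
print(round(p_more_than_4, 4))  # 0.5595
```

Because the code keeps every term unrounded, its last decimal place can differ slightly from a hand calculation that rounds each term before adding.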

d) What is the probability that the travel agency will receive more than four web-based
enquiries for travel packages in any two-day period?

Solution:
•note: the time interval for the mean has changed from one day to two days;
Thus λ = 5 per day, on average, must be adjusted to λ = 10 per two days;
Then find P(x>4) where λ = 10.

P(x > 4) means P(x ≥ 5) = P(x=5) + P(x=6) + P(x=7) + . . . (with no upper limit)
•Apply complementary rule;
thus P(x ≥ 5) = 1 – P(x ≤ 4)
= 1 – [P(x=0) + P(x=1) + P(x=2) + P(x=3) + P(x=4)]
= 1 – [0.0000454 + 0.000454 + 0.00227 + 0.00757 + 0.01892]
= 1 – 0.02925
= 0.97075
There is a 97.08% chance that the travel agency will receive more than 4 web-based
enquiries in any two-day period when the average number of web-based enquiries
received is 10 per two-day period.
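
The key step in part (d) is rescaling the mean to match the new interval before applying the formula. A short sketch of that step (the variable names `rate_per_day` and `days` are illustrative):

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x occurrences when the mean is lam."""
    return exp(-lam) * lam**x / factorial(x)


rate_per_day = 5
days = 2

# The mean must refer to the same interval as the question being asked
lam = rate_per_day * days  # 10 enquiries per two-day period

p_more_than_4 = 1 - sum(poisson_pmf(k, lam) for k in range(5))
print(round(p_more_than_4, 3))  # 0.971
```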

Descriptive statistical measures of the Poisson distribution:

Mean: 𝜇 = λ
Standard deviation: 𝜎 = √λ

3. CONTINUOUS PROBABILITY DISTRIBUTIONS


⇒ a continuous random variable can take on any value in an interval
⇒ e.g. the length of time to complete a task
⇒ continuous probability distributions are represented by curves for which the area
under the curve between two x-limits represents the probability that x lies within
these limits (or interval)

3.1 Normal Probability Distribution


⇒ the most widely used continuous probability distribution
⇒ the normal probability distribution has the following properties:
• The curve is bell-shaped
• It is symmetrical about a central mean value, 𝜇
• Tails of curve never touch the x-axis
• The distribution is always described by two parameters: a mean (𝜇) and
a standard deviation (𝜎)
• Total area under curve = 1, representing total sample space; because of
symmetry, the area to the left of 𝜇 is 0.5 and the area to the right of 𝜇 is 0.5
• The probability associated with a particular interval of x-values is defined by the
area under the normal distribution curve between the limits x1 and x2

3.2 Standard Normal (z) Probability Distribution
⇒ to find the probability that x lies between x1 and x2, it is necessary to find the area
under the bell-shaped curve between these x-limits
⇒ this is done by converting the x-limits into limits that correspond to the z-distribution
for which areas have already been worked out (as given in Table 1, on p.472)
• to use table: the z-limit (to one decimal place) is listed down the left column; and
• the second decimal position of z is shown across the top row
• the value read off at the intersection of the z-limit (to 2 decimal places) is the area
under the standard normal curve
⇒ formula to convert x-value to z-value:
$z = \dfrac{x - \mu}{\sigma}$
⇒ the area read off the z-table is always the area between the mean (𝜇 = 0 for z) and a given z-limit
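
The table lookup can be imitated in code. The sketch below assumes a table that lists the area between the mean and z, as described above; the function names `z_score` and `area_mean_to_z` and the sample values (x = 110, 𝜇 = 100, 𝜎 = 10) are hypothetical and chosen only for illustration. It uses the standard relationship between the normal curve and the error function `math.erf`.

```python
from math import erf, sqrt


def z_score(x: float, mu: float, sigma: float) -> float:
    """Convert an x-value to a z-value: z = (x - mu) / sigma."""
    return (x - mu) / sigma


def area_mean_to_z(z: float) -> float:
    """Area under the standard normal curve between the mean (z = 0) and z.

    By symmetry the area is the same for +z and -z.
    """
    return 0.5 * erf(abs(z) / sqrt(2))


# Hypothetical values for illustration only
z = z_score(110, mu=100, sigma=10)
print(z, round(area_mean_to_z(z), 4))  # 1.0 0.3413
```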

Example: Courier service delivery time study


A courier service company has found that their delivery time of parcels to clients is
normally distributed with a mean of 45 minutes (𝜇 = 45) and a standard deviation of
eight minutes (𝜎 = 8).
What is the probability that a randomly selected parcel:
a) Will take between 45 and 51 minutes to deliver to the client?

Solution:
Find P(45 < x < 51)

• transform x-limits to z-limits:
x = 45: $z = \dfrac{45 - 45}{8} = 0$
x = 51: $z = \dfrac{51 - 45}{8} = 0.75$
• P(45 < x < 51) is equivalent to finding P(0 < z < 0.75) = 0.2734 (from z-table at 0.75)

There is a 27.34% chance that a randomly selected parcel will take between 45 and 51
minutes to deliver to the client.

b) Will take between 35 and 45 minutes to deliver to the client?

Solution:
Find P(35 < x < 45)

• transform x-limits to z-limits:
x = 35: $z = \dfrac{35 - 45}{8} = -1.25$
x = 45: $z = \dfrac{45 - 45}{8} = 0$
• P(35 < x < 45) is equivalent to finding P(−1.25 < z < 0) = 0.3944 (from z-table at 1.25)

There is a 39.44% chance that a randomly selected parcel will take between 35 and 45
minutes to deliver to the client.

c) Will take between 35 and 51 minutes to deliver to the client?


Solution:
Find P(35 < x < 51); remember that you must always calculate areas from mean to the
x-value; in this case P(35 < x < 45) and P(45 < x < 51); calculate two separate areas;
then add up the two areas

• Thus P(35 < x < 51) = P(35 < x < 45) + P(45 < x < 51)
= 0.3944 + 0.2734 [as calculated in (a) and (b) above]
= 0.6678

d) Will take less than 48 minutes to deliver?
Solution:
P(x < 48) is to be found; this is all the values below the mean (𝜇 = 45) (i.e. 50% of all
values) → P(x < 45) plus the area between 45 and 48 → P(45 < x < 48)

Find P(45 < x < 48)


• transform x-limits to z-limits:
x = 48: $z = \dfrac{48 - 45}{8} = 0.375 \approx 0.38$
• P(45 < x < 48) is equivalent to finding P(0 < z < 0.38) = 0.1480 (from z-table at 0.38)
• Thus P(x < 48) = P(x < 45) + P(45 < x < 48)
= 0.5 + 0.1480
= 0.6480

There is a 64.8% chance that a randomly selected parcel will take less than 48 minutes
to deliver to the client.

e) Will take between 48 and 51 minutes to deliver to the client?


Solution:
Find areas between 45 and 48; and between 45 and 51; then subtract these two areas

• Thus P(48 < x < 51) = P(45 < x < 51) – P(45 < x < 48)
= 0.2734 – 0.1480 [calculated in (a) and (d) above]
= 0.1254
There is a 12.54% chance that a randomly selected parcel will take between 48 and 51
minutes to deliver to the client.

f) Will take more than 51 minutes to deliver to the client?

Solution:
•P(x > 51) must be found; since areas under the curve can only be found between the
mean and an x-value, first find P(45 < x < 51) = 0.2734 [calculated in (a) above]

•P(x > 45) = 0.5 [50% of all values lie above (to the right of) the mean]
•Thus P(x > 51) = 0.5 – P(45 < x < 51)
= 0.5 – 0.2734
= 0.2266
There is a 22.66% chance that a randomly selected parcel will take more than 51
minutes to deliver to the client.
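
Parts (a) to (f) can be checked with Python's standard library, which provides `statistics.NormalDist`. This is a sketch, not the method used in these notes: it works with the cumulative area to the left of x rather than the area from the mean, and it does not round z to two decimals, so results that depend on z = 0.375 (parts (d) and (e)) differ slightly from the table-based answers. The names `delivery`, `prob_between` and `results` are illustrative.

```python
from statistics import NormalDist

# Delivery time: mean 45 minutes, standard deviation 8 minutes
delivery = NormalDist(mu=45, sigma=8)


def prob_between(lo: float, hi: float) -> float:
    """P(lo < x < hi) as a difference of two cumulative areas."""
    return delivery.cdf(hi) - delivery.cdf(lo)


results = {
    "a": prob_between(45, 51),   # between 45 and 51 minutes
    "b": prob_between(35, 45),   # between 35 and 45 minutes
    "c": prob_between(35, 51),   # between 35 and 51 minutes
    "d": delivery.cdf(48),       # less than 48 minutes
    "e": prob_between(48, 51),   # between 48 and 51 minutes
    "f": 1 - delivery.cdf(51),   # more than 51 minutes
}

for part, prob in results.items():
    print(part, round(prob, 4))
```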

g) What is the minimum delivery time for the longest 15% of parcels delivered
to clients?
Solution:
This question requires a specific delivery time, x, such that the longest (slowest) 15%
of parcel delivery times lie above this value.

• From the z-table, find the z-value that corresponds to an area of 0.15 in the top tail of
the standard normal distribution.
To use the z-table, the appropriate area to read off is 0.5 – 0.15 = 0.35 (i.e. the middle
area between the mean and the x-value). The closest z-value is 1.04.
• Find the x-value associated with the identified z-value of 1.04.
Substitute z = 1.04, 𝜇 = 45, 𝜎 = 8 into the z-transformation formula, and solve for x.
$z = \dfrac{x - \mu}{\sigma}$
$1.04 = \dfrac{x - 45}{8}$
x = 45 + (1.04 × 8) = 45 + 8.32 = 53.32

Thus the longest (slowest) 15% of parcel deliveries take at least 53.32 minutes,
i.e. 15% of parcels take longer than 53.32 minutes to be delivered.

h) What delivery time of parcels separates the lowest 20% of parcels delivered?

Solution:
This question requires a specific delivery time, x, such that the lowest (fastest) 20%
of parcel delivery times lie below this value.

• From the z-table, find the z-value that corresponds to an area of 0.20 in the lower tail
of the standard normal distribution.
To use the z-table, the appropriate area to read off is 0.5 – 0.20 = 0.30 (i.e. the middle
area between the mean and the x-value). The closest value in the z-table is 0.84. Since
the required z-value is below its mean, the z-value will be negative. Hence z = −0.84.
• Find the x-value associated with the identified z-value of −0.84.
Substitute z = −0.84, 𝜇 = 45, 𝜎 = 8 into the z-transformation formula, and solve for x.
$z = \dfrac{x - \mu}{\sigma}$
$-0.84 = \dfrac{x - 45}{8}$
x = 45 + (−0.84 × 8) = 45 – 6.72 = 38.28

Thus the lowest (fastest) 20% of parcel deliveries take at most 38.28 minutes,
i.e. 20% of parcels take less than 38.28 minutes to be delivered.
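
Parts (g) and (h) work backwards from an area to an x-value. In Python this reverse lookup is available as `NormalDist.inv_cdf`, which takes the cumulative area to the left of x. The sketch below is a check under that convention, not the table method itself; because it does not round z to two decimals, its answers differ from the table-based ones in the second decimal place. The variable names are illustrative.

```python
from statistics import NormalDist

delivery = NormalDist(mu=45, sigma=8)

# (g) slowest 15%: 85% of the area lies to the left of the cut-off
slowest_15_cutoff = delivery.inv_cdf(0.85)

# (h) fastest 20%: 20% of the area lies to the left of the cut-off
fastest_20_cutoff = delivery.inv_cdf(0.20)

print(round(slowest_15_cutoff, 2))  # 53.29
print(round(fastest_20_cutoff, 2))  # 38.27
```

Feeding either cut-off back into `delivery.cdf` should return the original area (0.85 or 0.20), which confirms the reverse lookup.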

_________________________________________________________________
