Empirical Probability Distribution

Fundamentals Of Business Analytics 1
Module 006 Empirical Probability Distribution

At the end of this module, you will be able to:
1. Understand the concept of the empirical probability distribution.
2. Explain what a discrete empirical probability distribution is.
3. Explain what a continuous empirical probability distribution is.
INTRODUCTION
The objective of this module is to demonstrate how to convert data into probabilities that
support managerial decisions. Real historical (empirical) data does not necessarily fit a known
distribution; however, the frequencies and rankings in the data can be used to estimate an
appropriate empirical probability distribution. The empirical distributions are later used in
decision trees and simulations to make optimum managerial decisions.
Empirical probability uses the number of occurrences of an outcome within a sample set as a
basis for determining the probability of that outcome. For example, if "event X" happens 30 times
out of 100 trials, the empirical probability of event X is 30/100 = 0.30. An empirical probability
is therefore closely related to the relative frequency of an event. An empirical distribution is one
for which each possible event is assigned a probability derived from experimental observation.
It is assumed that the events are independent and the sum of the probabilities is 1.
Empirical probability, also called experimental probability, is the probability of a certain result
as estimated from an experiment. For example, you could toss a coin 100 times to see how many
heads you get, or you could perform a taste test to see whether 100 people prefer cola A or cola B.
You could use this information to make an educated guess (a statistic) about what the
probabilities would be if you performed the experiment 1,000 times, 10,000 times, or even an
unlimited number of times. If you do not actually perform the experiment and only theorize
about it, the result is called theoretical probability.
Empirical Probability is probability based upon data. That data can be either the result of a
designed experiment (experimental data) or the result of situations that occur beyond the
control of the analyst (observational data). In the fields of medicine and business, data-driven
probability is referred to as “Evidence-based” probability.
Course Module
For a theory to be proved or disproved, empirical evidence must be collected. An empirical
study is performed using actual market data. In finance, for example, many empirical studies
have been conducted on the capital asset pricing model (CAPM), and the results are mixed.
In some analyses, the model does hold in real-world situations, but most studies have disproved
the model for projecting returns. Although the model is not completely valid, that is not to say
that it has no utility. For instance, the CAPM is often used to estimate a company's weighted
average cost of capital.
An empirical distribution may represent either a continuous or a discrete distribution. If it
represents a discrete distribution, then sampling is done “on step”. If it represents a continuous
distribution, then sampling is done via “interpolation”. The way the table is described usually
determines if an empirical distribution is to be handled discretely or continuously.
Discrete description          Continuous description

Value    Probability          Value      Probability
10       .10                  0 - 10     .10
20       .15                  10 - 20    .15
35       .40                  20 - 35    .40
40       .30                  35 - 40    .30
60       .05                  40 - 60    .05

Table 1                       Table 2
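The difference between sampling "on step" and sampling via "interpolation" can be sketched in Python. This is only an illustration under stated assumptions: a uniform random number u in [0, 1) is mapped through the cumulative probabilities of Tables 1 and 2, and the function names `sample_on_step` and `sample_by_interpolation` are hypothetical, not part of the module.

```python
import bisect
import random

# Cumulative probabilities shared by Tables 1 and 2 (.10, .15, .40, .30, .05).
cum = [0.10, 0.25, 0.65, 0.95, 1.00]

# Table 1 (discrete description): one value per probability.
values = [10, 20, 35, 40, 60]

# Table 2 (continuous description): interval edges 0-10, 10-20, 20-35, 35-40, 40-60.
edges = [0, 10, 20, 35, 40, 60]


def sample_on_step(u):
    """Discrete case: return the first value whose cumulative probability is >= u."""
    i = min(bisect.bisect_left(cum, u), len(values) - 1)
    return values[i]


def sample_by_interpolation(u):
    """Continuous case: interpolate linearly inside the interval that contains u."""
    i = min(bisect.bisect_left(cum, u), len(cum) - 1)
    lower_cum = cum[i - 1] if i > 0 else 0.0
    fraction = (u - lower_cum) / (cum[i] - lower_cum)
    return edges[i] + fraction * (edges[i + 1] - edges[i])


u = random.random()  # uniform random number in [0, 1)
print(u, sample_on_step(u), sample_by_interpolation(u))
```

With the same u, the discrete version can only return one of the five listed values, while the continuous version can return any number between 0 and 60.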
DISCRETE EMPIRICAL PROBABILITY DISTRIBUTIONS
Recall that f(x) is the probability of a specific outcome x, that is, the probability of a specific value
of a random variable. Discrete empirical probability can be calculated by counting the number of
occurrences of each outcome (numeric or otherwise):
$$ f(x) = P(x) = \frac{n(x)}{n} $$
where n(x) is the number of data points equal to the value x and n is the total number of data
points (sample size). In words:
$$ f(x) = \text{Probability of an event } x = \frac{\text{No. of times the event } x \text{ happened}}{\text{No. of times the event could have happened}} $$
EXAMPLE: DISCRETE EMPIRICAL PROBABILITY DISTRIBUTION

A human resources analyst is examining the potential financial implications of employees
choosing retirement plans. The company has four retirement plans: A, B, C, and D. A sample of 25
employees and the plans they have selected is shown in Table 3. Construct the distribution for
the choice of retirement plans.
Obs   Plan      Obs   Plan      Obs   Plan
1     B         11    C         21    C
2     C         12    D         22    C
3     C         13    B         23    B
4     C         14    A         24    B
5     C         15    C         25    C
6     C         16    D
7     D         17    C
8     B         18    C
9     D         19    A
10    B         20    D

Table 3: Sample Data for Retirement Plan Selection
It is useful to first sort the data. The frequencies and probabilities are readily computed after
sorting as in the figure below.
Figure 1: Example of Discrete Empirical Probability Distribution
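The frequency count described above can also be sketched outside a spreadsheet. The following Python snippet is a minimal illustration, not the spreadsheet from Figure 1; it applies f(x) = n(x)/n to the 25 observations of Table 3, and the variable names are chosen here for illustration.

```python
from collections import Counter

# Plan chosen by each of the 25 employees, in observation order (Table 3).
plans = [
    "B", "C", "C", "C", "C", "C", "D", "B", "D", "B",
    "C", "D", "B", "A", "C", "D", "C", "C", "A", "D",
    "C", "C", "B", "B", "C",
]

n = len(plans)            # sample size
counts = Counter(plans)   # n(x): number of observations equal to each plan

# f(x) = n(x) / n for each plan, listed in sorted order.
distribution = {plan: counts[plan] / n for plan in sorted(counts)}

for plan, probability in distribution.items():
    print(f"Plan {plan}: n(x) = {counts[plan]:2d}, f(x) = {probability:.2f}")
```

For the Table 3 sample, the counts are 2, 6, 12 and 5 for plans A, B, C and D, which gives probabilities of 0.08, 0.24, 0.48 and 0.20.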
As would be expected due to the Law of Large Numbers, the accuracy of this method of
determining discrete probability improves for larger samples.
Examining the previous spreadsheet reveals that there are two methods by which a set of
empirical data may be used to generate random variables:
1. Using the full list of data: Give each element a 1/n probability of selection. The data can be
sorted first. Sorted data gives the analyst a clearer view of the likelihoods of the various
outcomes and, in turn, a better understanding of the data.
2. Using the data distribution: This works well when the discrete variable does not have a
cumbersome number of levels.
The spreadsheet shown in Fig. 2 demonstrates how the discrete example could be simulated
using the full list of data.
Figure 2: Simulation of Discrete Data Using Full List of Data
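The full-list method can be sketched in Python as follows. This is an illustrative stand-in for the spreadsheet in Figure 2, not a reproduction of it: each of the 25 observations from Table 3 is given the same 1/n chance of selection. The seed and the number of draws are arbitrary assumptions.

```python
import random
from collections import Counter

# Full list of observed plans from Table 3.
plans = [
    "B", "C", "C", "C", "C", "C", "D", "B", "D", "B",
    "C", "D", "B", "A", "C", "D", "C", "C", "A", "D",
    "C", "C", "B", "B", "C",
]

rng = random.Random(42)   # arbitrary seed, used only to make the run repeatable
n_draws = 10_000          # arbitrary number of simulated employees

# Each draw picks one element of the list with probability 1/n.
draws = [rng.choice(plans) for _ in range(n_draws)]

simulated = Counter(draws)
for plan in sorted(simulated):
    print(f"Plan {plan}: simulated share = {simulated[plan] / n_draws:.3f}")
```

Because repeated outcomes appear in the list as often as they were observed, the simulated shares approach the empirical probabilities as the number of draws grows.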
The spreadsheet shown in Fig. 3 is an example of how the discrete empirical data could be
simulated using the probability distribution of the data computed from the previous example.
Figure 3: Simulation of Discrete Data Using the Probability Distribution of the Data
The second method is fundamentally the same as the first, but takes advantage of the way
VLOOKUP works when using an approximate match for data in which the data key (first column
of the data) is sorted from smallest to largest. Compare the two methods mentioned previously
to note that the data distribution method is the full list of data method with the repeated
outcomes removed.
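The approximate-match lookup can be imitated in Python with a sorted list of keys. The sketch below assumes that the lookup key for each plan is the cumulative probability of all plans before it (0.00, 0.08, 0.32, 0.80, derived from the Table 3 frequencies). That layout is an assumption about Figure 3, and the function name `lookup_plan` is hypothetical.

```python
import bisect
import random

# Assumed lookup table: sorted lower bounds of each plan's cumulative range.
lower_bounds = [0.00, 0.08, 0.32, 0.80]
outcomes = ["A", "B", "C", "D"]


def lookup_plan(u):
    """Return the outcome whose key is the largest one less than or equal to u,
    which is how an approximate-match lookup on a sorted first column behaves."""
    row = bisect.bisect_right(lower_bounds, u) - 1
    return outcomes[row]


rng = random.Random(7)  # arbitrary seed for a repeatable illustration
sample = [lookup_plan(rng.random()) for _ in range(10)]
print(sample)
```

Only four rows are needed here, compared with the 25 rows of the full list, which is why this method suits variables with few distinct levels.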
CONTINUOUS EMPIRICAL PROBABILITY DISTRIBUTIONS
With continuous empirical data, f(x) can be calculated using the cumulative distribution function
(cdf), F(x). When calculating probabilities from historical data, F(x) is called the Empirical
Cumulative Distribution Function and is abbreviated as ECDF(x). The ECDF(x) is easily calculated
by first sorting the data from smallest to largest and then using the frequency counts to
determine the cumulative probability:
$$ ECDF(x) = F(x) = P(X \le x) = \frac{n(X \le x)}{n} $$
where n(X ≤ x) is the number of data points less than or equal to the value x and n is the total
number of data points (sample size).
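As a small illustration of the definition, the Python sketch below counts the data points less than or equal to x and divides by the sample size. The five data values are hypothetical and serve only to show the calculation.

```python
def ecdf(data, x):
    """Empirical CDF: share of data points that are less than or equal to x."""
    n = len(data)
    return sum(1 for value in data if value <= x) / n


# Hypothetical sample, sorted from smallest to largest for readability.
sample = sorted([3, 1, 4, 1, 5])

for x in sample:
    print(f"ECDF({x}) = {ecdf(sample, x):.2f}")
```

Sorting is not required for the count itself, but it makes the cumulative pattern easy to read, and it is needed later when the ECDF is used for interpolation.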
EXAMPLE: EMPIRICAL DISTRIBUTION IN A DECISION TREE: PRICING DECISIONS
A company is bidding to supply parts to an electronics manufacturer. The competitors’ bids for
10 previous similar contracts are shown in Table 4. If the bid is won, the total cost of completing
the contract is $350,000. What is the optimum bid?
Obs   Bid          Obs   Bid          Obs   Bid
1     369,800      5     387,300      9     401,400
2     403,200      6     404,800      10    380,300
3     401,800      7     389,700
4     387,600      8     407,700

Table 4: Empirical Probability Distribution for Bidding Example
Consider the abbreviated generic bidding decision tree in Fig. 4.
Figure 4: Abbreviated Generic Decision Tree for Bidding Example
Because the electronics manufacturer will purchase the least expensive components, the low bid
wins in this situation. The probability of winning given a specific bid is therefore the probability
that a competitor's bid is higher:
$$ P(\text{Win} \mid \text{Bid}) = 1 - F(\text{Bid}) = 1 - ECDF(\text{Bid}) $$
Based on the decision tree, the expected value for a given bid is calculated:
$$ EV_{Bid} = P(\text{Win} \mid \text{Bid})\,(\text{Bid} - \$350{,}000) + \bigl(1 - P(\text{Win} \mid \text{Bid})\bigr)(\$0) = P(\text{Win} \mid \text{Bid})\,(\text{Bid} - \$350{,}000) $$
To compute P(Win | Bid), first calculate the ECDF. The ECDF is computed by first sorting the data
from smallest to largest, then counting the number of data points less than or equal to each
data point, and finally dividing those counts by the sample size:
$$ ECDF(x) = F(x) = P(X \le x) = \frac{n(X \le x)}{n} $$
The ECDF is shown in figure 5.
Figure 5: Empirical Cumulative Distribution Function (ECDF) for the Bidding Example
The probability of winning is then calculated as P(Win | Bid) = 1 − ECDF. Thus, for LOW BID
WINS bidding, the probability of winning is 1 − CDF. Conversely, for HIGH BID WINS bidding, the
probability of winning is the CDF.
Figure 6: Probability of Winning Given a Specific Bid (1 − ECDF) for the Bidding Example
From the ECDF, the slopes and intercepts needed to interpolate the probability of winning for a
specific bid can be calculated as shown in Table 5. In each row, the slope and intercept describe
the line segment from that bid to the next higher bid; the last row repeats the final segment.

Rank  Obs  ECDF  Bid      P(Win | Bid)  Slope =               Intercept =
                                        ∆P(Win | Bid)/∆Bid    P(Win | Bid) − Slope × Bid
1     1    0.10  369,800  0.90          −0.0000095            4.42
2     10   0.20  380,300  0.80          −0.0000143            6.23
3     5    0.30  387,300  0.70          −0.0003333            129.80
4     4    0.40  387,600  0.60          −0.0000476            19.06
5     7    0.50  389,700  0.50          −0.0000085            3.83
6     9    0.60  401,400  0.40          −0.0002500            100.75
7     3    0.70  401,800  0.30          −0.0000714            29.00
8     2    0.80  403,200  0.20          −0.0000625            25.40
9     6    0.90  404,800  0.10          −0.0000345            14.06
10    8    1.00  407,700  0.00          −0.0000345            14.06

Table 5: Slope–Intercept Table to Calculate P(Win | Bid) = 1 − ECDF
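The slope and intercept columns of Table 5 can be recomputed with a short Python sketch. It assumes, as the table values suggest, that each row describes the segment running to the next higher bid and that the last row reuses the final segment. The variable names are illustrative.

```python
# Competitors' bids from Table 4, sorted from smallest to largest.
bids = [369_800, 380_300, 387_300, 387_600, 389_700,
        401_400, 401_800, 403_200, 404_800, 407_700]
n = len(bids)

# P(Win | Bid) = 1 - ECDF, where ECDF = rank / n for the sorted bids.
p_win = [1 - (rank + 1) / n for rank in range(n)]

rows = []
for i in range(n):
    j = min(i, n - 2)  # the last row reuses the final segment
    slope = (p_win[j + 1] - p_win[j]) / (bids[j + 1] - bids[j])
    intercept = p_win[j] - slope * bids[j]
    rows.append((bids[i], p_win[i], slope, intercept))

for bid, p, slope, intercept in rows:
    print(f"{bid:>9,}  {p:4.2f}  {slope:12.7f}  {intercept:8.2f}")
```

For example, the first row gives a slope of (0.80 − 0.90) / (380,300 − 369,800) and an intercept of 0.90 minus that slope times 369,800, which rounds to 4.42.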
Using Table 5 and the VLOOKUP function, the expected value for a bid, EVBid, can be calculated:
$$ EV_{Bid} = P(\text{Win} \mid \text{Bid})\,(\text{Bid} - \$350{,}000) = (\text{Slope} \times \text{Bid} + \text{Intercept})\,(\text{Bid} - \$350{,}000) $$
The optimum bid is obtained using Excel’s One-Way Data Table command.
Figure 7: Calculation of Optimum Bid Using a One-Way Table
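A one-way data table evaluates the same formula for a list of candidate bids. The Python sketch below does something similar: it interpolates P(Win | Bid) between the sorted bids of Table 5 and evaluates the expected value on a grid. The grid step of $100, the restriction to the observed bid range, and the function names are assumptions made for this illustration, not details taken from Figure 7.

```python
import bisect

COST = 350_000  # total cost of completing the contract

# Sorted competitor bids and the matching P(Win | Bid) = 1 - ECDF (Table 5).
bids = [369_800, 380_300, 387_300, 387_600, 389_700,
        401_400, 401_800, 403_200, 404_800, 407_700]
p_win_at = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00]


def p_win(bid):
    """Piecewise-linear P(Win | Bid). Outside the observed range the value is
    clamped to the nearest end point (an assumption of this sketch)."""
    if bid <= bids[0]:
        return p_win_at[0]
    if bid >= bids[-1]:
        return p_win_at[-1]
    i = bisect.bisect_right(bids, bid) - 1
    slope = (p_win_at[i + 1] - p_win_at[i]) / (bids[i + 1] - bids[i])
    return p_win_at[i] + slope * (bid - bids[i])


def expected_value(bid):
    """EV = P(Win | Bid) * (Bid - cost); losing the bid pays $0."""
    return p_win(bid) * (bid - COST)


# Candidate bids across the observed range, in assumed steps of $100.
candidates = range(369_800, 407_701, 100)
best_bid = max(candidates, key=expected_value)
best_ev = expected_value(best_bid)
print(f"Best bid on this grid: {best_bid:,}, expected value: {best_ev:,.0f}")
```

On this grid the search selects a bid of 387,300, where P(Win | Bid) = 0.70 and the expected value is 0.70 × 37,300 = 26,110. A finer grid or a different treatment of bids outside the observed range could change the result.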
To generate random bids for simulation, in a manner similar to the method used to simulate the
five-point estimate, the ECDF must first be inverted as shown in Table 6, with the corresponding
graph shown in Figure 8.

Rank  ECDF = Rand()  Bid      Slope = ∆Bid/∆ECDF  Intercept = Bid − Slope × ECDF
1     0.00           359,300  105,000             359,300
2     0.10           369,800  105,000             359,300
3     0.20           380,300  70,000              366,300
4     0.30           387,300  3,000               386,400
5     0.40           387,600  21,000              379,200
6     0.50           389,700  117,000             331,200
7     0.60           401,400  4,000               399,000
8     0.70           401,800  14,000              392,000
9     0.80           403,200  16,000              390,400
10    0.90           404,800  29,000              378,700
      1.00           407,700  0                   407,700

Table 6: Slope–Intercept Table to Generate Random Bids for Simulation
Figure 8: Inverse ECDF for Generating Random Bids for Simulation
As in the case of simulating the five-point estimate, RAND() must be calculated in a cell that is
separate from the cell used to compute the random variable, so that the slope and intercept
correspond to the percentile generated by that RAND(). The VLOOKUP function is then used to
find the slope and intercept needed to calculate the random bid that corresponds to that
percentile.
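The inverse-ECDF lookup of Table 6 can be sketched in Python. The uniform random number is drawn once and stored before it is used, mirroring the separate RAND() cell, so that the segment lookup and the interpolation use the same percentile. The function name `random_bid` and the seed are assumptions for illustration.

```python
import bisect
import random

# ECDF levels and bids from Table 6 (the 0.00 row is the table's lower end point).
ecdf_levels = [k / 10 for k in range(11)]
bid_points = [359_300, 369_800, 380_300, 387_300, 387_600, 389_700,
              401_400, 401_800, 403_200, 404_800, 407_700]


def random_bid(u):
    """Map a percentile u in [0, 1] to a bid by linear interpolation."""
    # Approximate-match lookup: last row whose ECDF level is <= u.
    i = min(bisect.bisect_right(ecdf_levels, u) - 1, len(ecdf_levels) - 2)
    slope = (bid_points[i + 1] - bid_points[i]) / (ecdf_levels[i + 1] - ecdf_levels[i])
    intercept = bid_points[i] - slope * ecdf_levels[i]
    return slope * u + intercept


rng = random.Random(2024)  # arbitrary seed for a repeatable illustration
u = rng.random()           # drawn once, then reused for lookup and calculation
print(f"u = {u:.4f}, simulated competitor bid = {random_bid(u):,.0f}")
```

For instance, a percentile of 0.25 falls in the row for ECDF 0.20, so the bid is 70,000 × 0.25 + 366,300 = 383,800.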
ADVANTAGES AND DISADVANTAGES
The main advantage of using empirical probability is that the probability is backed by
experimental studies and data. It is free from assumed data or hypotheses. However, there are
two big disadvantages of empirical probability to consider:
• Drawing incorrect conclusions

Using empirical probability can cause wrong conclusions to be drawn. For example, we know
that the chance of getting a head from a coin toss is ½. However, an individual may toss a coin
three times and get heads in all tosses. He may draw an incorrect conclusion that the chances of
tossing a head from a coin toss are 100%.
• Insufficient sample size

Small sample sizes reduce accuracy. Therefore, large sample sizes are generally used for
empirical probability to attain a good probability representation. For example, if an individual
wanted to know the probability of getting a head in a coin toss but only used one sample, the
empirical probability would be either 0% or 100%.
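The effect of sample size can be illustrated with a simulated fair coin. The following Python sketch is only an illustration: the seed and the sample sizes are arbitrary assumptions, and the function name `empirical_p_heads` is hypothetical.

```python
import random

rng = random.Random(0)  # arbitrary seed for a repeatable illustration


def empirical_p_heads(n_tosses):
    """Toss a simulated fair coin n_tosses times and return the share of heads."""
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses


for n_tosses in (1, 3, 10, 100, 10_000):
    print(f"{n_tosses:>6} tosses: empirical P(heads) = {empirical_p_heads(n_tosses):.3f}")
```

With a single toss the estimate can only be 0 or 1, and with three tosses it can only be 0, 1/3, 2/3 or 1, so small samples can sit far from ½. Larger samples tend to land closer to it.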