Simulation Input Data Analysis
Engineering and Technology Management
Simulation-4
INPUT MODELING
Aslı Sencer
STEPS OF INPUT MODELING
1) Collect data from the real system of interest.
   Requires substantial time and effort
   Use expert opinion when sufficient data are not available
2) Identify a probability distribution to represent the input process.
   Draw frequency distributions and histograms
   Choose a family of theoretical distributions
3) Estimate the parameters of the selected distribution.
4) Apply goodness-of-fit tests to evaluate the chosen distribution and its parameters.
   Chi-square test
   Kolmogorov-Smirnov test
5) If the chosen distribution fails these tests, choose a new theoretical distribution and go back to step 3! If all theoretical distributions fail, either use an empirical distribution or recollect the data.
STEP 1: DATA COLLECTION INVOLVES MANY DIFFICULTIES
STEP 2.1: IDENTIFY THE PROBABILITY DISTRIBUTION
Raw Data
79.919 3.081 0.062 1.961 5.845
3.027 6.769 6.505 59.899 0.021
1.192 0.013 34.760 0.123 5.009
18.387 0.141 43.565 24.420 0.433
144.695 2.663 17.967 0.091 9.003
0.941 0.878 3.148 2.157 7.579
0.624 5.380 3.371 7.078 23.960
0.590 1.928 0.300 0.002 0.543
7.004 31.764 1.005 1.147 0.219
3.217 14.382 1.008 2.336 4.562
Histogram with Continuous Data
Component Life (days)   Frequency
[0-3)                   23
[3-6)                   10
[6-9)                   5
[9-12)                  1
[12-15)                 1
[15-18)                 2
[18-21)                 0
[21-24)                 1
[24-27)                 1
[27-30)                 0
[30-33)                 1
[33-36)                 1
...                     ...
[42-45)                 1
...                     ...
[57-60)                 1
...                     ...
[78-81)                 1
...                     ...
[144-147)               1
[Figure: Histogram of Component Life; x-axis: component life (days), y-axis: frequency]
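Building the frequency table amounts to counting observations per class interval. A minimal sketch, assuming half-open classes of equal width starting at 0 (width 3 days, as in the table above); `frequency_table` is a hypothetical helper, and only ten of the raw values are used to keep the example short.

```python
import math
from collections import Counter

def frequency_table(data, width):
    """Count observations in half-open classes [k*width, (k+1)*width),
    including empty classes between the smallest and largest occupied one."""
    counts = Counter(math.floor(x / width) for x in data)
    lo, hi = min(counts), max(counts)
    return [((k * width, (k + 1) * width), counts.get(k, 0))
            for k in range(lo, hi + 1)]

# Ten of the component lives (days) from the raw data above
sample = [79.919, 3.081, 0.062, 1.961, 5.845, 3.027, 6.769, 6.505, 59.899, 0.021]
for (a, b), freq in frequency_table(sample, 3):
    if freq:  # print occupied classes only
        print(f"[{a}-{b}) {freq}")
```

Keeping the empty classes in the returned list matters when the table feeds a histogram or a chi-square test, where zero-frequency intervals still count.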
STEP 2.2: SELECTING THE FAMILY OF DISTRIBUTIONS
If the times between successive events are exponentially distributed, then the number of events in a fixed time interval is Poisson distributed.
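A quick simulation check of this relationship: generate arrivals with exponential interarrival times and count how many fall into each unit-length interval. For Poisson counts, the mean and the variance should both be close to the rate. The rate 3.64 is borrowed from the Poisson example later in these notes purely for illustration; the seed and number of intervals are arbitrary choices.

```python
import random
import statistics

def counts_per_interval(rate, n_intervals, length=1.0, seed=1):
    """Simulate a process with exponential interarrival times and count
    the events falling in each of n_intervals consecutive windows."""
    rng = random.Random(seed)
    counts = [0] * n_intervals
    horizon = n_intervals * length
    t = rng.expovariate(rate)            # time of the first arrival
    while t < horizon:
        counts[int(t // length)] += 1
        t += rng.expovariate(rate)       # exponential time to the next arrival
    return counts

counts = counts_per_interval(rate=3.64, n_intervals=20000)
# For Poisson counts, mean and variance should both be close to rate * length
print(statistics.mean(counts), statistics.variance(counts))
```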
Applications of Beta Distribution
Applications of Gamma Distribution
Same as the Erlang distribution when the shape parameter is an integer.
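The Erlang connection can be checked numerically: the sum of k independent exponential(rate) variables should behave like a gamma variable with shape k and scale 1/rate. A sketch with illustrative values k = 3 and rate = 0.5; note that Python's `random.gammavariate(alpha, beta)` takes a shape and a scale, not a rate.

```python
import random
import statistics

rng = random.Random(42)
k, rate = 3, 0.5            # integer shape and exponential rate (illustrative values)
n = 50000

# Erlang(k, rate): sum of k independent exponential(rate) variables
erlang = [sum(rng.expovariate(rate) for _ in range(k)) for _ in range(n)]
# Gamma with shape alpha = k and scale beta = 1 / rate
gamma = [rng.gammavariate(k, 1 / rate) for _ in range(n)]

# Both should have mean k / rate = 6 and variance k / rate**2 = 12
print(statistics.mean(erlang), statistics.mean(gamma))
print(statistics.variance(erlang), statistics.variance(gamma))
```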
Applications of Johnson Distribution
Its flexible domain, which can be bounded or unbounded, allows it to fit many data sets.
If >0, the domain is bounded
If <0, the domain is unbounded
Applications of Lognormal Distribution
Used to represent quantities that are the product of a large number of random quantities
Used to represent task times that are skewed to the right
If $X \sim \mathrm{LOGN}(\mu_l, \sigma_l)$, then $\ln X \sim \mathrm{NORM}(\mu, \sigma)$
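The link between the two parameter pairs can be written down explicitly. This sketch assumes that $\mu_l, \sigma_l$ denote the mean and standard deviation of X itself and $\mu, \sigma$ those of ln X; the helper names are hypothetical. Python's `random.lognormvariate(mu, sigma)` is parameterized by the normal $\mu, \sigma$ of ln X.

```python
import math
import random
import statistics

def lognormal_moments(mu, sigma):
    """Mean and std dev of X when ln X ~ NORM(mu, sigma)."""
    mean_l = math.exp(mu + sigma**2 / 2)
    std_l = mean_l * math.sqrt(math.exp(sigma**2) - 1)
    return mean_l, std_l

def normal_params(mean_l, std_l):
    """Inverse map: (mu, sigma) of ln X from the mean and std dev of X."""
    sigma2 = math.log(1 + (std_l / mean_l) ** 2)
    return math.log(mean_l) - sigma2 / 2, math.sqrt(sigma2)

mean_l, std_l = lognormal_moments(0.0, 0.5)
rng = random.Random(7)
xs = [rng.lognormvariate(0.0, 0.5) for _ in range(100000)]
logs = [math.log(x) for x in xs]
print(mean_l, statistics.mean(xs))                     # theoretical vs sample mean of X
print(statistics.mean(logs), statistics.stdev(logs))   # close to 0 and 0.5
```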
Applications of Weibull Distribution
Applications of Continuous Empirical Distribution
Used to incorporate empirical data as an alternative to a theoretical distribution when the data are multimodal, contain significant outliers, etc.
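One common way to sample from a continuous empirical distribution is to connect the sorted observations with a piecewise-linear cdf, giving each of the n - 1 segments equal probability, and invert it. This is a sketch of that construction only; it is an assumption, not necessarily how any particular simulation package builds its continuous empirical distribution, and `make_empirical_sampler` is a hypothetical helper.

```python
import random

def make_empirical_sampler(data, seed=None):
    """Inverse-transform sampler for a piecewise-linear empirical cdf
    passing through the sorted observations."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)

    def sample():
        p = (n - 1) * rng.random()     # position along the n - 1 linear segments
        i = min(int(p), n - 2)         # 0-based segment index, guarded against rounding
        frac = p - i
        return xs[i] + frac * (xs[i + 1] - xs[i])
    return sample

sampler = make_empirical_sampler([2.0, 5.0, 1.0, 9.0, 4.0], seed=3)
draws = [sampler() for _ in range(1000)]
print([round(d, 2) for d in draws[:5]])
```

Because samples are interpolated between observed values, the distribution never produces values outside the observed range, which is worth keeping in mind when outliers matter.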
Applications of Discrete Empirical Distribution
Arrivals per Period   Frequency   Arrivals per Period   Frequency
0                     12          6                     7
1                     10          7                     5
2                     19          8                     5
3                     17          9                     3
4                     10          10                    3
5                     8           11                    1
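Sampling from a discrete empirical distribution like the one tabulated above is an inverse-transform lookup on the cumulative frequencies. A minimal sketch, assuming the table lists arrivals per period (0 to 11) with their observed frequencies:

```python
import bisect
import itertools
import random

# Arrivals per period and observed frequencies from the table above
values = list(range(12))
freqs = [12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1]

# Cumulative counts define the empirical cdf in steps of freq / total
cum = list(itertools.accumulate(freqs))
total = cum[-1]

def sample(rng):
    """Inverse-transform draw from the discrete empirical distribution."""
    u = rng.random() * total            # uniform on [0, total)
    return values[bisect.bisect_right(cum, u)]

rng = random.Random(11)
draws = [sample(rng) for _ in range(20000)]
print(draws[:10])
```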
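STEP 3: ESTIMATE THE PARAMETERS OF THE SELECTED DISTRIBUTION
If the data are discrete or continuous and have been grouped in class intervals, i.e., the raw data are not known, then the sample mean and sample variance are approximated by
$$\bar{X} \approx \frac{\sum_{j=1}^{c} f_j m_j}{n}, \qquad S^2 \approx \frac{\sum_{j=1}^{c} f_j m_j^2 - n\bar{X}^2}{n-1}$$
where c is the number of class intervals, $f_j$ is the observed frequency of the jth class, $m_j$ is its midpoint (for discrete data, the jth value itself), and n is the total number of observations.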
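Applied to the discrete arrival counts tabulated earlier (values 0 to 11 with their frequencies), the formulas give the estimates below. The helper name `grouped_mean_var` is hypothetical; for continuous grouped data, pass the class midpoints instead of the values.

```python
def grouped_mean_var(points, freqs):
    """Sample mean and variance from frequency data; points are the
    distinct values (discrete data) or class midpoints (class intervals)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(points, freqs)) / n
    var = (sum(f * x * x for x, f in zip(points, freqs)) - n * mean**2) / (n - 1)
    return mean, var

mean, var = grouped_mean_var(range(12), [12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1])
print(round(mean, 2), round(var, 2))   # 3.64 7.63
```

The mean 3.64 is the estimate of the Poisson rate used in the chi-square example of step 4.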
STEP 4: GOODNESS-OF-FIT TEST
Hypothesis test:
Ho: The random variable X conforms to the theoretical distribution with the estimated parameters
Ha: The random variable X does NOT conform to the theoretical distribution with the estimated parameters
We need a test statistic to either reject or fail to reject Ho. This test statistic should measure the discrepancy between the theoretical and the empirical distribution.
If this test statistic is large, Ho is rejected.
Otherwise, we fail to reject Ho, and the distribution is retained as the input model.
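STEP 4: GOODNESS-OF-FIT TESTS
CHI-SQUARE TEST
Test statistic:
Arrange the n observations into a set of k class intervals or cells. The test statistic is given by
$$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed frequency in the ith class interval and $E_i = n p_i$ is the expected frequency, $p_i$ being the theoretical probability of the ith interval under the hypothesized distribution.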
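Recommendations for the number of class intervals for continuous data
Sample Size, n    Number of Class Intervals, k
20                Do not use the chi-square test
50                5 to 10
100               10 to 20
>100              √n to n/5
Ho is rejected if $\chi_0^2 > \chi^2_{k-s-1,\,1-\alpha}$, the $(1-\alpha)$ quantile of the chi-square distribution with $k-s-1$ degrees of freedom, where s is the number of parameters estimated from the sample data and α is the significance level.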
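The statistic and the decision rule translate directly into code. A sketch with hypothetical helper names; the critical value is passed in (looked up from a chi-square table) rather than computed, and the die-rolling data are made up for illustration.

```python
def chi_square_statistic(observed, probs):
    """Return chi0^2 = sum((O_i - E_i)^2 / E_i) and the expected counts E_i = n * p_i."""
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected

def chi_square_reject(stat, critical):
    """Reject Ho when chi0^2 exceeds the tabulated critical value for
    k - s - 1 degrees of freedom at the chosen significance level."""
    return stat > critical

# Hypothetical example: 60 rolls of a die, Ho: all six faces equally likely (s = 0)
observed = [8, 12, 9, 11, 10, 10]
stat, expected = chi_square_statistic(observed, [1 / 6] * 6)
# k - s - 1 = 6 - 0 - 1 = 5 degrees of freedom; 11.07 is the 0.95 quantile for 5 df
print(round(stat, 2), chi_square_reject(stat, 11.07))  # 1.0 False
```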
STEP 4: GFT - CHI-SQUARE TEST
EX: POISSON DISTRIBUTION
Consider the discrete data we analyzed in step 2.
Ho: The number of arrivals X ~ Poisson(λ = 3.64)
Ha: Otherwise
where λ is the mean arrival rate, estimated by the sample mean: λ = 3.64.
The following probabilities are found by using the pmf $p(x) = e^{-\lambda}\lambda^x / x!$:
P(0) = 0.026   P(6) = 0.085
P(1) = 0.096   P(7) = 0.044
P(2) = 0.174   P(8) = 0.020
P(3) = 0.211   P(9) = 0.008
P(4) = 0.192   P(10) = 0.003
P(5) = 0.140   P(≥11) = 0.001
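The table of probabilities can be reproduced from the pmf, with the last entry taken as the complement of the others:

```python
import math

def poisson_pmf(x, lam):
    """p(x) = exp(-lam) * lam**x / x!"""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 3.64
probs = [poisson_pmf(x, lam) for x in range(11)]
tail = 1 - sum(probs)                  # P(X >= 11)
for x, p in enumerate(probs):
    print(f"P({x}) = {p:.3f}")
print(f"P(>=11) = {tail:.3f}")
```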
Calculation of the chi-square test statistic with k - s - 1 = 7 - 1 - 1 = 5 degrees of freedom and α = 0.05.
So, Ho is rejected!
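A sketch of the calculation. Since k = 7, the twelve observed values must be pooled into seven cells; the pooling {0, 1}, 2, 3, 4, 5, 6, {7 or more} used here is an assumption (one grouping consistent with k = 7 that keeps every expected count at least 5). Exact pmf values are used rather than the three-decimal probabilities above, so the statistic may differ slightly from a hand calculation.

```python
import math

lam, n = 3.64, 100
# Observed frequencies for 0..11 arrivals (the discrete data analyzed in step 2)
freqs = [12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1]
pmf = [math.exp(-lam) * lam**x / math.factorial(x) for x in range(7)]

# Assumed pooling into k = 7 cells: {0,1}, 2, 3, 4, 5, 6, {>=7}
observed = [freqs[0] + freqs[1]] + freqs[2:7] + [sum(freqs[7:])]
probs = [pmf[0] + pmf[1]] + pmf[2:7] + [1 - sum(pmf)]
expected = [n * p for p in probs]

chi0 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical = 11.07          # 0.95 quantile of chi-square with 5 df, from a standard table
print(round(chi0, 2), chi0 > critical)
```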
STEP 4: GFT - CHI-SQUARE TEST
EX: ARENA INPUT ANALYZER
Distribution Summary
Distribution: Normal
Expression: NORM(225, 89)
Square Error: 0.037778
Chi Square Test
Number of intervals = 12
Degrees of freedom = 9
Test Statistic = 1.22e+004
Corresponding p-value < 0.005
Data Summary
Number of Data Points = 21547
Min Data Value = 2
Max Data Value = 6.01e+003
Sample Mean = 146
Sample Std Dev = 79.5
Histogram Summary
Histogram Range = 2 to 6.01e+003
Number of Intervals = 40
Since the p-value < 0.005 < 0.05, reject the normal distribution at the 5% significance level!
STEP 4: GFT - CHI-SQUARE TEST
EX: ARENA INPUT ANALYZER
Distribution Summary
Distribution: Weibull
Expression: 0.999 + WEIB(94.7, 0.928)
Square Error: 0.002688
Chi Square Test
Number of intervals = 20
Degrees of freedom = 17
Test Statistic = 838
Corresponding p-value < 0.005
Data Summary
Number of Data Points = 12418
Min Data Value = 1
Max Data Value = 1.47e+003
Sample Mean = 108
Sample Std Dev = 135
Histogram Summary
Histogram Range = 0.999 to 1.47e+003
Number of Intervals = 40
Since the p-value < 0.005 < 0.05, reject the Weibull distribution at the 5% significance level!
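To reuse a fitted expression outside Arena, it has to be translated into another sampler's parameterization. This sketch assumes that Arena's WEIB(β, α) takes the scale β first and the shape α second, and that 0.999 is a location offset; Python's `random.weibullvariate(alpha, beta)` also takes the scale first and the shape second, despite the swapped letter names. The sample mean is checked against the theoretical mean of the shifted Weibull.

```python
import math
import random
import statistics

# Assumption: 0.999 + WEIB(94.7, 0.928) means offset + Weibull(scale=94.7, shape=0.928)
offset, scale, shape = 0.999, 94.7, 0.928

rng = random.Random(5)
# Python's weibullvariate(alpha, beta): alpha is the scale, beta is the shape
xs = [offset + rng.weibullvariate(scale, shape) for _ in range(100000)]

# Theoretical mean of the shifted Weibull: offset + scale * Gamma(1 + 1/shape)
theory_mean = offset + scale * math.gamma(1 + 1 / shape)
print(round(theory_mean, 1), round(statistics.mean(xs), 1))
```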
STEP 4: GOODNESS-OF-FIT TESTS
DRAWBACKS OF CHI-SQUARE GFT
The chi-square test uses estimates of the parameters obtained from the sample, which decreases the degrees of freedom.
For continuous distributions, the chi-square test requires the data to be placed in class intervals; these classes are arbitrary, and their choice affects the value of the chi-square test statistic.
The distribution of the chi-square test statistic is known only approximately, and the power of the test (the probability of rejecting an incorrect theoretical distribution) is sometimes low.
Hence other GFTs are also needed!
STEP 4: GOODNESS-OF-FIT TESTS
KOLMOGOROV-SMIRNOV TEST
Useful when sample sizes are small and when no parameters are estimated from the sample data.
Compares the cdf of the theoretical distribution, F(x), with the empirical cdf, $S_N(x)$, of a sample of N observations.
Hypothesis test:
Ho: Data follow the selected pdf
Ha: Data do NOT follow the selected pdf
Test statistic:
D, the largest absolute deviation between F(x) and $S_N(x)$.
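The empirical cdf is simply the fraction of observations at or below x. A minimal sketch (`empirical_cdf` is a hypothetical helper), evaluated on a small sample:

```python
import bisect

def empirical_cdf(data):
    """Return S_N(x) = (number of observations <= x) / N."""
    xs = sorted(data)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n

S = empirical_cdf([0.44, 0.81, 0.14, 0.05, 0.93])
print(S(0.1), S(0.5), S(1.0))   # 0.2 0.6 1.0
```

$S_N$ is a step function that jumps by 1/N at each observation, which is why the deviation from a continuous F only needs to be checked just before and at each observation.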
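Steps of K-S Test:
1. Sort the N observations in ascending order, $x_{(1)} \le x_{(2)} \le \dots \le x_{(N)}$, and evaluate the theoretical cdf $F(x_{(i)}) = P(X \le x_{(i)})$ and the empirical cdf
$$S_N(x) = \frac{\#\{\text{observations} \le x\}}{N}, \qquad S_N(x_{(i)}) = \frac{i}{N} \text{ (assuming no ties)}$$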
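2. Compute the test statistic $D = \max\{D^+, D^-\}$, where
$$D^+ = \max_{1 \le i \le N}\left\{\frac{i}{N} - F(x_{(i)})\right\}, \qquad D^- = \max_{1 \le i \le N}\left\{F(x_{(i)}) - \frac{i-1}{N}\right\}$$
For a continuous F, this equals $D = \max_x |F(x) - S_N(x)|$, the largest deviation between the two cdfs.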
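3. Evaluation
If $D > D_{\alpha,N}$, reject Ho.
If $D \le D_{\alpha,N}$, fail to reject Ho.
Here $D_{\alpha,N}$ is the critical value of the K-S statistic for significance level α and sample size N, taken from a K-S table.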
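Steps 2 and 3 can be sketched in a few lines. This assumes F is continuous and supplied as a Python function; the critical value is passed in rather than computed. Here it is applied to the data of the example that follows, with F(x) = x for the uniform (0, 1) hypothesis and the tabulated critical value $D_{0.05,5} = 0.565$.

```python
def ks_test(data, cdf, critical):
    """D+, D-, D for a hypothesized continuous cdf F, and the reject decision."""
    xs = sorted(data)
    n = len(xs)
    # enumerate is 0-based, so (i + 1) / n is i/N and i / n is (i - 1)/N in the slide notation
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    d = max(d_plus, d_minus)
    return d_plus, d_minus, d, d > critical

# Uniform(0,1) hypothesis: F(x) = x, critical value D_{0.05,5} = 0.565
d_plus, d_minus, d, reject = ks_test([0.44, 0.81, 0.14, 0.05, 0.93], lambda x: x, 0.565)
print(round(d_plus, 2), round(d_minus, 2), round(d, 2), reject)  # 0.26 0.21 0.26 False
```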
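STEP 4: GOODNESS-OF-FIT TESTS
EXAMPLE: KOLMOGOROV-SMIRNOV TEST
Consider the data:
0.44, 0.81, 0.14, 0.05, 0.93
Ho: The data are uniformly distributed on (0, 1), so F(x) = x.
i                    1      2      3      4      5
R_(i)                0.05   0.14   0.44   0.81   0.93
F(R_(i)) = R_(i)     0.05   0.14   0.44   0.81   0.93
i/N                  0.20   0.40   0.60   0.80   1.00
i/N - R_(i)          0.15   0.26   0.16   -0.01  0.07
R_(i) - (i-1)/N      0.05   -0.06  0.04   0.21   0.13
$D^+ = 0.26$ and $D^- = 0.21$, so $D = \max\{0.26, 0.21\} = 0.26$.
Since $D = 0.26 < D_{0.05,5} = 0.565$, Ho is not rejected!
The data are consistent with a uniform distribution on (0, 1).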