Further Statistical Issues: Simulation With Arena, 3 Ed

Further
Statistical
Issues
Chapter 12
Last revision June 9, 2003
Simulation with Arena, 3rd ed. Chapter 12 – Further Statistical Issues Slide 1 of 39
What We’ll Do ...
• Random-number generation
• Generating random variates
• Nonstationary Poisson processes
• Variance reduction
• Sequential sampling
• Designing and executing simulation experiments
Backup material:
•Appendix C: A Refresher on Probability and Statistics
•Appendix D: Arena’s Probability Distributions
Random-Number Generators (RNGs)
• Algorithm to generate independent, identically
distributed draws from the continuous UNIF (0, 1)
distribution f(x)
 These are called random numbers 1
in simulation 0 1 x
• Basis for generating observations from all other

distributions and random processes
 Transform random numbers in a way that depends on the
desired distribution or process (later in this chapter)
• It’s essential to have a good RNG
• There are a lot of bad RNGs — this is very tricky
 Methods, coding are both tricky
The Nature of RNGs
• Recursive formula (algorithm)
 Start with a seed (or seed vector)
 Do something weird to the seed to get the next one
 Repeat … generate the same sequence
 Will eventually repeat (cycle) … want long cycle length
• Not really “random” as in unpredictable
 Does this matter? Philosophically? Practically?
• Want to “design” RNGs
 Long cycle length
 Good statistical properties (uniform, independent) -- tests
 Fast
 Streams (subsegments) – many and long (for variance
reduction … later)
• This is not easy!
 Doing something weird isn’t enough
Linear Congruential Generators
(LCGs)
• The most common of several different methods
 But not the one in Arena (though it’s related) … more later
• Generate a sequence of integers Z1, Z2, Z3, … via the
recursion
Zi = (a Zi–1 + c) (mod m)
• a, c, and m are carefully chosen constants
• Specify a seed Z0 to start off
• “mod m” means take the remainder of dividing by m
as the next Zi
• All the Zi’s are between 0 and m – 1
• Return the ith “random number” as Ui = Zi / m
Example of a “Toy” LCG
• Parameters m = 63, a = 22, c = 4, Z0 = 19:
Zi = (22 Zi–1 + 4) (mod 63), seed with Z0 = 19
i 22 Zi–1+4 Zi Ui
0 19 • Cycling — will repeat forever
1 422 44 0.6984
2 972 27 0.4286
• Cycle length  m
3 598 31 0.4921 (could be << m depending
4 686 56 0.8889
: : : :
on parameters)
61 158 32 0.5079 • Pick m BIG
62 708 15 0.2381
63 334 19 0.3016
• But that might not be enough
64 422 44 0.6984 for good statistical properties
65 972 27 0.4286
66 598 31 0.4921
: : : :
Issues with LCGs
• Cycle length: < m
 Typically, m = 2.1 billion (= 231 – 1) or more
– Which used to be a lot … more later
 Other parameters chosen so that cycle length = m or m – 1
• Statistical properties
 Uniformity, independence
 There are many tests of RNGs
– Empirical tests
– Theoretical tests — “lattice” structure (next slide …)
• Speed, storage — both are usually fine
• Must be carefully, cleverly coded — BIG integers
• Reproducibility — streams (long internal
subsequences) with fixed seeds
Issues with LCGs (cont’d.)
• “Regularity” of LCGs (and other kinds of RNGs):
For the earlier “toy” LCG …
Plot of Ui vs. i Plot of Ui+1 vs. Ui
“Random Numbers
Fall Mainly in the
Planes”
— George Marsaglia
• “Design” RNGs: dense lattice in high dimensions

• Other kinds of RNGs — longer memory in
recursion, combination of several RNGs
The Original (1983) Arena RNG
• LCG with: m = 231 – 1 = 2,147,483,647
a = 75 = 16,807
c=0
• Cycle length = m – 1 = 2.1  109
• Ten different automatic streams with fixed seeds
• In its day, this was a good, well-tested generator
in an efficient code
• But current computer speed make this cycle
length inadequate
 Exhaust in 10 minutes on 2 GHz PC if we just generate
• Can get this generator (not recommended)
 Place a Seeds module (Elements panel) in your model
The Current (2000) Arena RNG
• Uses some of the same ideas as LCG
 Modulo division, recursive on earlier values
• But is not an LCG
 Combines two separate component generators
 Recursion involves more than just the preceding value
• Combined multiple recursive generator (CMRG)
An = (1403580 An-2 – 810728 An-3) mod 4294967087 Two
simultaneous
Bn = (527612 Bn-1 – 1370589 Bn-3) mod 4294944443 recursions
Zn = (An – Bn) mod 4294967087 Combine the two

Zn / 4294967088 if Zn > 0 The next
Un = random
4294967087 / 4294967088 if Zn = 0 number
Seed = a six-vector of first three An’s, Bn’s
The Current (2000) Arena RNG –
Properties
• Extremely good statistical properties
 Good uniformity in up to 45-dimensional hypercube
• Cycle length = 3.1  1057
 To cycle, all six seeds must match up
 On 2 GHz PC, would take 2.8  1040 millennia to exhaust
 Under Moore’s law, it will be 216 years until this generator
can be exhausted in a year of nonstop computing
• Only slightly slower than old LCG
 And RNG is usually a minor part of overall computing time
The Current (2000) Arena RNG –
Streams and Substreams
• Automatic streams and substreams
 1.8  1019 streams of length 1.7  1038 each
 Each stream further divided into 2.3  1015 substreams of
length 7.6  1022 each
– 2 GHz PC would take 669 million years to exhaust a substream
• Default stream is 10 (historical reasons)
 Also used for Chance-type Decide module
• To use a different stream, append its number
after a distribution’s parameters
 For example, EXPO(6.7, 4) to use stream 4
• When using multiple replications, Arena
automatically advances to next substream in
each stream for the next replication
 Helps synchronize for variance reduction
Generating Random Variates
• Have: Desired input distribution for model (fitted
or specified in some way), and RNG (UNIF (0, 1))
• Want: Transform UNIF (0, 1) random numbers
into “draws” from the desired input distribution
• Method: Mathematical transformations of
random numbers to “deform” them to the desired
distribution
 Specific transform depends on desired distribution
 Details in online Help about methods for all distributions
• Do discrete, continuous distributions separately
Generating from Discrete Distributions
• Example: probability
mass function
–2 0 3
• Divide [0, 1]
 Into subintervals of length
0.1, 0.5, 0.4
 Generate U ~ UNIF (0, 1)
 See which subinterval it’s in
 Return X = corresponding
value
Discrete Generation: Another View
• Plot cumulative distribution function; generate U
and plot on vertical axis; read “across and down”
• Inverting
the CDF
• Equivalent
to earlier
method
Generating from Continuous
Distributions
• Example: EXPO (5) distribution
Density (PDF)
Distribution (CDF)
• General algorithm (can be rigorously justified):

1. Generate a random number U ~ UNIF(0, 1)
2. Set U = F(X) and solve for X = F–1(U)
 Solving analytically for X may or may not be simple (or
possible)
 Sometimes use numerical approximation to “solve”
Generating from Continuous
Distributions (cont’d.)
• Solution for EXPO (5) case:
Set U = F(X) = 1 – e–X/5
e–X/5 = 1 – U
–X/5 = ln (1 – U)
X = – 5 ln (1 – U)
• Picture (inverting the CDF, as in discrete case):
Intuition (garden hose):
More U’s will hit F(x)
where it’s steep
This is where the density
f(x) is tallest, and we
want a denser
distribution of X’s
Nonstationary Poisson Processes
• Many systems have externally originating events
affecting them — e.g., arrivals of customers
• If process is stationary over time, usually specify
a fixed interevent-time distribution
• But process could vary markedly in its rate
 Fast-food lunch rush
 Freeway rush hours
• Ignoring nonstationarity can lead to serious
model and output errors
• Already seen this — automotive repair shop,
Chapter 5
Nonstationary Poisson Processes –
Definition
• Usual model: nonstationary Poisson process:
 Have a rate function (t)
 Number of events in [t1, t2] ~ Poisson with mean
(t)
• Issues:
 How to estimate rate function?
 Given an estimate, how to generate during simulation?
Estimating the Rate Function
• Estimation of the rate function
 Probably the most practical method is piecewise constant
– Decide on a time interval within which rate is fixed
– Estimate from data the (constant) rate during each interval
– Be careful to get the units right
 Other (more complicated) methods exist in the literature
Generation
• Arena has a built-in method to generate, assuming
a piecewise-constant rate function
 Arrival Schedule in Create module – auto repair (Model 5-2)
• Method is to invert a rate-one stationary Poisson
process against the cumulate rate function 
 Similar to inverting CDF for continuous random-variable
generation
 Exploits some speed-up possibilities
 Details in Help topic “Non-Stationary Exponential
Distribution”
• Alternative method: “thinning” of a stationary
Poisson process at the peak rate
Variance Reduction
• Random input  random output (RIRO)
• In other words, output has variance
 Higher output variance means less precise results
 Would like to eliminate or reduce output variance
– One (bad) way to eliminate: replace all input random variables by
constants (like their mean)
– Will get rid of random output, but will also invalidate model
– Thus, best hope is to reduce output variance
• Easy (brute-force) variance reduction: just
simulate some more
 Terminating: additional replications
 Steady-state: additional replications or a longer run
Variance Reduction (cont’d.)
• But sometimes can reduce variance without more
runs — free lunch (?)
• Key: unlike physical experiments, can control
randomness in computer-simulation experiments
via manipulating the RNG
 Re-use the same “random” numbers either as they were, in
some opposite sense, or for a similar but simpler model
• Several different variance-reduction techniques
 Classified into categories — common random numbers,
antithetic variates, control variates, indirect estimation, …
 Usually requires thorough understanding of model, “code”
 Will look only at common random numbers in detail
Common Random Numbers (CRN)
• Applies when objective is to compare two (or
more) alternative configurations or models
 Interest is in difference(s) of performance measure(s)
across alternatives
 Model 7-2 (small mfg. system), total avg. WIP output —
two alternatives
A. Base case (as is)
B. 3.5% increase in business (interarrival-time mean falls from 13 to
12.56 minutes)
 Same run conditions, but change model into Model 12-1:
– Remove Output File Total WIP History.dat
– Add entry to Statistic module to compute and save to a .dat file the
total avg. WIP on each replication
The “Natural” Comparison
• Run case A, make the change to get to case B and run
it, then Compare Means via Output Analyzer:
• Difference is not statistically significant

 Were the runs of A and B statistically independent?
 Did we use the same random numbers running A and B?
 Did we use the same random numbers intelligently?
CRN Intuition
• Get sharper comparison if you subject all
alternatives to the same “conditions”
 Then observed differences are due to model differences
rather than random differences in the “conditions”
 Small mfg. system: For both A and B runs, cause:
– The “same” parts arrive at the same times
– Be assigned same attributes (job type)
– Have the same process times at each step
• Then observed differences will be attributable to
system differences, not random bounce
 There isn’t any random bounce
Synchronization of Random Numbers
in CRN
• Generally, get CRN by using the same RNG, seed,
stream(s) for all alternatives
 Already are using the same stream, default = stream 10
 But its usage generally gets mixed up across alternatives
• Must use the same random numbers for the same
purposes across the alternatives —
synchronization of random-number usage
 Usually requires some work, understanding of model
 Usually use different streams in the RNG
 Usually different ways to do this in a given model
 Sometimes can’t synchronize completely for complex models
— settle for partial synchronization
Synchronization of Random Numbers
in CRN (cont’d.)
• Synchronize by source of randomness (we’ll do)
 Assign stream to each point of variate generation
– Separate random-number “faucets” … extra parameter in r.v. calls
– Model 12-1: 14 sources of randomness, separate stream for each
(see book for details), modify into Model 12-2
 Fairly simple but might not ensure complete
synchronization; still usually get some benefit from this
• Synchronize by entity (won’t do — see Exercises)
 Pre-generate every possible random variate an entity might
need when it arrives, assign to attributes, used downstream
 Better synchronization insurance but uses more memory
• Across replications, RNG automatically goes to
next substream within each stream
 Maintains synchronization if alternatives disagree on
number of random numbers used per stream per replication
Effect of CRN
“Natural”
Comparison
Synchronized
CRN
• CRN was no more computational effort

• Effect here is fairly dramatic, but it will not always be
this strong
 Depends on “how similar” A and B are
CRN Statistical Issues
• In Output Analyzer, Analyze > Compare Means
option, have choice of Paired-t or Two-Sample-t
for c.i. test on difference between means
 Paired-t subtracts results replication by replication — must
use this if doing CRN
 Two-Sample-t treats the samples independently — can use
this if doing independent sampling, often better than Paired-
t
• Mathematical justification for CRN
 Let X = output r.v. from alternative A, Y = output from B
Var(X – Y) = Var(X) + Var(Y) – 2 Cov(X, Y)
= 0 if indep
> 0 with CRN
Other Variance-Reduction Techniques
• For single-system simulation, not comparisons
• Antithetic variates: make pairs of runs
 Use “U’s” on first run of pair, “1 – U’s” on second run of pair
 Take average of two runs: negatively correlated, reducing
variance of average
 Like CRN, must take care to synchronize
• Control variates
 Use internal variate generation to “control” results up, down
• Indirect estimation
 Simulation estimates something other than what you want,
but related to what you want by a fixed formula
Sequential Sampling
• Always try to quantify imprecision in results
 If imprecision is “small enough,” you’re done
 If not, need to do something to increase precision
 Just saw one way: variance-reduction techniques
• Obvious way to increase precision: keep
simulating one more “step” at a time, quit when
you achieve desired precision
 Terminating models: “step” = another replication
– Cannot extend length of replications — that’s part of the model
 Steady-state models:
– “step” = another replication if using truncated replications, or
– “step” = some extension of the run if using batch means
Sequential Sampling —
Terminating Models
• Modify Model 12-2 (small mfg., base case, with
random-number streams) into Model 12-3
• Made 25 replications, get 95% c.i. on expected
average Total WIP as 13.03 ± 1.28
 Suppose the ±1.28 is too big — want to reduce to ±0.5
 Approximate formulas from Sec. 6.3: need 124 or 164 total
replications (depending on which formula) rather than 25
 Instead, just make one more at a time, re-compute c.i., stop
as soon as half-width is less than 0.5
 “Trick” Arena to keep making more replications until c.i.
half-width < tolerance = 0.5
Terminating Models (cont’d.)
• Recall: With > 1 replication, automatically get
cross-replication 95% c.i.’s for expected values in
Category Overview report
• Related internal Arena variables:
 ORUNHALF(Output ID) = half-width of 95% c.i. using
completed replications (Output ID = Avg Total WIP)
 MREP = total number of replications asked for (initially,
MREP = Number of Replications in Run > Setup >
Replication Parameters)
 NREP = replication number now in progress (= 1, 2, 3, …)
• Use, manipulate these variables
• Initially set MREP to huge value in Number of
Replications in Run > Setup > Replication
Parameters
 Keep replicating until we cut off when half-width  0.5
• Add a logic (in a submodel) to sense when done
 Create one “control” entity at beginning of each replication
 Control entity immediately checks to see if:
– NREP  2: This is beginning of 1st or 2nd replication
– ORUNHALF(Avg Total WIP) > 0.5: c.i. on completed replications
is still too big
In either case, keep going with this replication (and the next
one too); control entity is Disposed and takes no action
 If both conditions are false, Control entity Assigns MREP =
NREP to stop after this replication, and is Disposed
• Details
 This overshoots required number of replications by one
 In Assign module setting MREP to NREP, have to select
Type = Other since MREP is a built-in Arena variable
 Results: Stopped with 232 total replications, yielding half
width = 0.49699 (barely less than 0.5)
 Different from earlier number-of-replications approximations
(they’re just that)
• Generalizations
 Precision demands on several outputs
 Relative-width stopping: (half-width) / (pt. estimate) small
Steady-State Models
• If doing truncated-replications approach to
steady-state statistical analysis
 Same strategy as above for terminating models
 Warm-up Period specified in Run > Setup > Replication
Parameters
 Err on the side of too much warmup
– Point-estimator bias is especially dangerous in sequential sampling
– Getting tight c.i. centered in the wrong place
– The tighter the c.i. demand, the worse the coverage probability
Steady-State Models (cont’d.)
• Batch-means approach
 Model 12-4: modification of Model 7-4 (small mfg. system) –
want half-width on E(average WIP) to be < 1
 Keep extending the run to reduce c.i. half-width
 Use automatic run-time batch-means 95% c.i.’s
 Stopping criterion: Terminating Condition field of Run >
Setup > Replication Parameters
– Half-width variables are THALF(Tally ID) or DHALF(Dstat ID)
– For us, condition is DHALF(Total WIP) < 1
 Remove all other stopping devices from model
 If “Insuf” or “Corr” would be returned because of too little
data, half-width variables set to huge value — keep going
 Could demand multiple smallness criteria, relative precision
(use TAVG, DAVG variables for point-estimate denominator)
Designing and Executing Simulation
Experiments
• Think of a simulation model as a convenient
“testbed” or laboratory for experimentation
 Look at different output responses
 Look at effects, interaction of different input factors
• Apply classical experimental-design techniques
 Factorial experiments — main effects, interactions
 Fractional-factorial experiments
 Factor-screening designs
 Response-surface methods, “metamodels”
 CRN is “blocking” in experimental-design terminology
 Process Analyzer (PAN) provides a convenient way to carry
out a designed experiment
– See Chapt. 6 for an example of using PAN for a factorial experiment

Further Statistical Issues: Simulation With Arena, 3 Ed

Uploaded by

Copyright:

Available Formats

Further Statistical Issues: Simulation With Arena, 3 Ed

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Further Statistical Issues: Simulation With Arena, 3 Ed

Uploaded by

Copyright:

Available Formats

Further

Last revision June 9, 2003

 These are called random numbers 1

• Basis for generating observations from all other

• “Design” RNGs: dense lattice in high dimensions

Zn = (An – Bn) mod 4294967087 Combine the two

• General algorithm (can be rigorously justified):

 Other (more complicated) methods exist in the literature

• Difference is not statistically significant

• CRN was no more computational effort

You might also like