Parallel Monte Carlo
http://ouray.cudenver.edu/~wrroche/Parellel_Processing/ClassProject/Presentation.html
What is Monte Carlo?
• The Monte Carlo method is a very
general term. It is used to classify
programs which use random numbers to
approximate solutions to problems.
The Idea’s Spark
• First proposed as a computer algorithm by
Nicholas Metropolis and Stanislaw Ulam,
1949, Los Alamos, while working on the
Manhattan Project.
– While ill, Ulam wondered about the probability of
winning at solitaire. It seemed computers of
the time might be able to simulate 100-1000
games and give an accurate estimate.
– He then applied this idea to neutron diffusion
and other complex problems in mathematics
and physics.
Buffon's needle
• Even earlier, in 1777, the mathematician Comte de
Buffon asked, ‘If a needle were dropped
randomly on a table ruled with parallel lines, what is the
probability that the needle will cross one of the
lines?’
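Buffon's question can be explored by simulation. The sketch below is only an illustration: it assumes a needle of length l no longer than the line spacing d, for which the known crossing probability is 2l/(πd). The function name, the default lengths, and the seed are hypothetical choices, not part of the original presentation.

```python
import math
import random


def buffon_crossing_fraction(n_drops, needle_len=1.0, line_gap=2.0, seed=0):
    """Estimate the probability that a dropped needle crosses a line.

    Assumes needle_len <= line_gap (the "short needle" case).
    """
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_drops):
        # Distance from the needle's centre to the nearest line.
        centre = rng.uniform(0.0, line_gap / 2.0)
        # Acute angle between the needle and the lines.
        angle = rng.uniform(0.0, math.pi / 2.0)
        # The needle crosses if its half-projection reaches the line.
        if centre <= (needle_len / 2.0) * math.sin(angle):
            crossings += 1
    return crossings / n_drops


estimate = buffon_crossing_fraction(200_000)
exact = 2 * 1.0 / (math.pi * 2.0)  # 2l / (pi * d) for l = 1, d = 2
print(f"simulated: {estimate:.4f}   exact: {exact:.4f}")
```

The simulated fraction should approach the exact value as the number of drops grows, with a statistical error that shrinks roughly as one over the square root of the number of drops.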
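• Area under a semicircle of radius $r = 1$: $\int_{-1}^{1} \sqrt{r^2 - x^2}\,dx = \frac{\pi r^2}{2} = \frac{\pi}{2}$
• Ratio of random points inside the circle to all points in the enclosing rectangle: $\frac{\pi/2}{2} = \frac{\pi}{4} \approx 0.7854$
• The same method can be used to integrate very complex functions.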
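The value 0.7854 (that is, π/4) can be reproduced with a few lines of code. This is a minimal sketch under the assumption that points are drawn uniformly from the square [-1, 1] x [-1, 1] and counted when they fall inside the unit circle; the function name and seed are illustrative.

```python
import random


def inside_fraction(n_points, seed=0):
    """Fraction of uniform random points in [-1, 1]^2 that land in the unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return inside / n_points


fraction = inside_fraction(200_000)
pi_estimate = 4.0 * fraction  # circle area / square area = pi / 4
print(f"fraction inside: {fraction:.4f}   pi estimate: {pi_estimate:.4f}")
```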
Monte Carlo Applications
• Non-random processes.
– Evaluation of complex integrals.
– Solutions to inverse/non-inverse simultaneous
equations.
– Differential Equations
–…
Components of Monte Carlo Routines
• Probability Distribution Functions
• Random Number Generator
• Sampling Rule
• Evaluating
• Error Estimation
• Variance Reduction Techniques
• Parallelization and vectorization
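Several of these components can be seen together in a small example. The sketch below assumes a uniform distribution on [0, 1] as the probability distribution, Python's built-in generator as the RNG, and the sample standard error as the error estimate; the integrand x² and the function name `mc_integrate` are hypothetical choices for illustration only.

```python
import math
import random


def mc_integrate(f, n_samples, seed=0):
    """Estimate the integral of f over [0, 1] and its standard error."""
    rng = random.Random(seed)          # random number generator
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        y = f(rng.random())            # sampling rule: x uniform on [0, 1)
        total += y                     # evaluating (scoring) the sample
        total_sq += y * y
    mean = total / n_samples
    variance = max(total_sq / n_samples - mean * mean, 0.0)
    std_error = math.sqrt(variance / n_samples)  # error estimation
    return mean, std_error


mean, std_error = mc_integrate(lambda x: x * x, 100_000)
print(f"integral of x^2 on [0, 1]: {mean:.4f} +/- {std_error:.4f}")
```

The standard error falls as one over the square root of the sample count, which is why variance reduction and parallelization both matter in practice.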
Random Number Generator
Properties of good RNGs
• Reproducible
– To debug programs
– To debug simulation models
– For documentation purposes.
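Reproducibility usually comes down to controlling the seed. The following sketch uses Python's standard `random.Random` class; the seed values are arbitrary and only serve to show that the same seed replays the same sequence.

```python
import random


def draw(seed, n=5):
    """Return the first n numbers produced from a given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


run_a = draw(seed=42)
run_b = draw(seed=42)   # same seed: identical sequence, useful for debugging
run_c = draw(seed=43)   # different seed: a different sequence

print(run_a == run_b)
print(run_a == run_c)
```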
Random Number Generator
Properties of good RNGs
• Uncorrelated / Unpredictable
[Figure: two plots of generated random numbers, with axes running from 0.46 to 0.54 and from 0.496 to 0.504]
Name                                          Period
Ranrot                                        432099 - 184256986
Prime Modulus Linear Congruential Generator   2^64
Multiplicative Lagged Fibonacci               2^81
Marsaglia Generator                           2^110
R250                                          2^250
Combined Multiple Recursive Generator         2^219
Mersenne Twister                              2^19937
Random Number Generator
Properties of good RNGs
• Parallel streams produced on different
processors should be uncorrelated
– Use different RNGs on different processors.
– Assign different substreams of one large
RNG to different processors.
– Use the leapfrog approach.
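In the leapfrog approach, processor p out of P takes every P-th number of a single serial sequence, starting at position p. The sketch below illustrates this with a toy linear congruential generator; the constants are illustrative assumptions and are not a recommendation for production use. For this kind of generator the P-step jump can be composed into a single multiply-and-add, so each processor can produce its own numbers without generating the others'.

```python
# Toy linear congruential generator: x_{n+1} = (A * x_n + C) mod M.
# The constants are illustrative assumptions only.
M = 2**31 - 1
A = 16807
C = 0


def serial_stream(seed, n):
    """First n values of the serial sequence."""
    x, out = seed, []
    for _ in range(n):
        x = (A * x + C) % M
        out.append(x)
    return out


def leapfrog_stream(seed, rank, nprocs, n):
    """Values rank+1, rank+1+nprocs, ... of the serial sequence (n of them)."""
    # Step forward to the first value owned by this rank.
    x = seed
    for _ in range(rank + 1):
        x = (A * x + C) % M
    # Compose the one-step map nprocs times: x -> a_p * x + c_p.
    a_p, c_p = 1, 0
    for _ in range(nprocs):
        a_p, c_p = (A * a_p) % M, (A * c_p + C) % M
    out = [x]
    for _ in range(n - 1):
        x = (a_p * x + c_p) % M
        out.append(x)
    return out


nprocs, per_proc = 4, 3
serial = serial_stream(12345, nprocs * per_proc)
streams = [leapfrog_stream(12345, r, nprocs, per_proc) for r in range(nprocs)]
# Interleaving the per-processor streams recovers the serial sequence.
interleaved = [streams[r][i] for i in range(per_proc) for r in range(nprocs)]
print(interleaved == serial)
```

Because the streams are disjoint slices of one sequence, a parallel run can be checked against a serial run that uses the same seed.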
Random Number Generator
Properties of good RNGs
• Reproducible
• Uncorrelated / Unpredictable
• Long Period Length.
• Computationally Efficient
• Portable
• Require Limited Memory
• Parallel streams produced on different
processors should be uncorrelated
Examples
Managing Portfolios
Cross Section
Charge Transport in Transistors
• Accurate knowledge of current flow
through transistors is required.
• Given the complex structure of transistors, this
can only be obtained by simulation.
High Level Algorithm
Device Structure Input
→ Spatial Computation
→ Independent Carrier Flight History Simulations
→ Statistics Collection
→ Monte Carlo Statistics
High Level Algorithm
• Breaks the device into grids and determines …
– The only constraint on partitioning is load balance.
High Level Algorithm
• The results of the spatial computations are
globally broadcast.
High Level Algorithm
• Overall flow: Device Structure Input → Spatial Computation → Device Data Broadcast → Independent Carrier Flight History Simulations
• Each flight history simulation:
– Inject Carrier (uses a random number)
– Generate Flight Time (uses a random number)
– Move Carrier
– Update Statistics
– Carrier Exit Device
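The flight history steps (inject carrier, generate flight time, move carrier, update statistics, carrier exits device) can be sketched as a loop. Everything physical in the code below is an assumption made for illustration: a one-dimensional device, a constant carrier speed, exponentially distributed free-flight times, and a random change of direction after each flight. It is not the device model used in the work being presented.

```python
import random


def simulate_flight_history(rng, device_length=1.0, mean_flight_time=0.05, speed=1.0):
    """Follow one carrier through a toy 1D device.

    Returns (exit_side, number_of_flights, elapsed_time).
    """
    # Inject carrier: starts at the left contact, moving right.
    x, direction = 0.0, 1.0
    elapsed, n_flights = 0.0, 0
    while True:
        # Generate flight time (assumed exponential distribution).
        dt = rng.expovariate(1.0 / mean_flight_time)
        # Move carrier.
        x += direction * speed * dt
        elapsed += dt
        n_flights += 1
        # Carrier exits the device through either contact.
        if x >= device_length:
            return "right", n_flights, elapsed
        if x <= 0.0:
            return "left", n_flights, elapsed
        # Assumed scattering model: pick a new direction at random.
        direction = rng.choice((-1.0, 1.0))


def run_histories(n_histories, seed=0):
    """Run independent flight histories and collect simple statistics."""
    rng = random.Random(seed)
    stats = {"left": 0, "right": 0, "flights": 0}
    for _ in range(n_histories):
        side, n_flights, _ = simulate_flight_history(rng)
        stats[side] += 1               # update statistics
        stats["flights"] += n_flights
    return stats


stats = run_histories(1000)
print(stats)
```

Since each history is independent of the others, the loop in `run_histories` is the part that can be distributed across processors.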
Load Balancing - Static
• Static load balancing was attempted first.
– Run m flight histories on each processor.
– Works well if the CPU time per flight history is comparable.
• This is not the case for many devices.
• Poorly suited for workstation clusters where
processor loads are unpredictable.
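The weakness of static partitioning shows up as soon as the cost per history varies. The sketch below assigns equal-sized blocks of histories to each processor and reports the finish time of the slowest one; the cost values are made-up numbers chosen only to illustrate the effect.

```python
def static_makespan(history_costs, n_procs):
    """Split histories into equal-sized contiguous blocks, one per processor.

    Returns (finish time of the slowest processor, per-processor loads).
    """
    n = len(history_costs)
    block = -(-n // n_procs)  # ceiling division
    loads = [sum(history_costs[p * block:(p + 1) * block]) for p in range(n_procs)]
    return max(loads), loads


# Hypothetical per-history CPU costs (arbitrary units).
uniform = [1.0] * 8
skewed = [1.0] * 6 + [5.0, 5.0]

uniform_time, uniform_loads = static_makespan(uniform, 4)
skewed_time, skewed_loads = static_makespan(skewed, 4)
print(uniform_time, uniform_loads)
print(skewed_time, skewed_loads)
```

With comparable costs every processor finishes together. With the skewed costs one processor carries both expensive histories and the others sit idle while it finishes.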
Load Balancing - Dynamic
• One process (the particle manager) maintains
an event queue.
• Each processor sends requests to the PM when
it is ready for another simulation.
• For workstations the PM is spawned as a
separate process.
• On the CM-5 the PM simulates its own histories
in addition to assigning tasks.
– Latencies were reduced by using interrupt-driven
active messages.
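The request-driven pattern can be sketched on a single machine. In the code below a shared `queue.Queue` stands in for the particle manager's event queue and threads stand in for processors; this is an assumption made to keep the example self-contained, and it does not reproduce the CM-5 active-message mechanism or a separately spawned manager process. The `simulate` callback is a hypothetical placeholder for one flight history.

```python
import queue
import threading


def run_dynamic(history_ids, n_workers, simulate):
    """Hand out histories one at a time to whichever worker is ready next."""
    tasks = queue.Queue()              # stands in for the PM's event queue
    for h in history_ids:
        tasks.put(h)

    results = {}
    lock = threading.Lock()

    def worker(rank):
        while True:
            try:
                h = tasks.get_nowait() # "request" another simulation
            except queue.Empty:
                return                 # no work left
            value = simulate(h)
            with lock:
                results[h] = (rank, value)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_dynamic(range(20), n_workers=4, simulate=lambda h: h * h)
print(len(results))
```

Which worker handles which history depends on timing, but every history is processed exactly once, and a slow worker simply ends up taking fewer tasks.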
Load Balancing - Comparison
Device                 Architecture      Time (sec.)
n+nn+ (2x128x2)        Workstations      406.2
                       CM-5 (dynamic)    594.7
                       CM-5 (static)     728.9
2D MOSFET (64x64x2)    Workstations      469.4
                       CM-5 (dynamic)    735.2
                       CM-5 (static)     807.7
3D MOSFET (64x26x4)    Workstations      502.6
                       CM-5 (dynamic)    759.7
                       CM-5 (static)     936.3