A Note On Random Number Generation: Christophe Dutang and Diethelm Wuertz September 2009
2 OVERVIEW OF RANDOM GENERATION ALGORITHMS 2
according to $\mu$ and compute $u_0 = g(s_0)$,
2. iterate for $i = 1, \dots$, $s_i = f(s_{i-1})$ and $u_i = g(s_i)$.

Finally, we generally use one of the three types of output function:
name          k       N1      $\delta_1$
WELL512a      512     225     0
WELL1024a     1024    407     0
WELL19937a    19937   8585    4
WELL44497a    44497   16883   7

Table 1: Specific WELL generators

2.1.5 SIMD-oriented Fast Mersenne Twister algorithms

A decade after the invention of MT, Matsumoto & Saito (2008) enhanced their algorithm to exploit today's computers, whose Single Instruction Multiple Data (SIMD) operations make it possible to work conceptually with 128-bit integers.

$$w_A = (w \stackrel{128}{<<} 8) \oplus w,$$
$$w_B = (w \stackrel{32}{>>} 11) \,\&\, c, \text{ where } c \text{ is a 128-bit constant and } \& \text{ the bitwise AND operator,}$$
$$w_C = w \stackrel{128}{>>} 8,$$
$$w_D = w \stackrel{32}{<<} 18,$$

where $\stackrel{128}{<<}$ denotes a 128-bit operation while $\stackrel{32}{>>}$ denotes a 32-bit operation, i.e. an operation on the four 32-bit parts of the 128-bit word $w$.

Hence the transition function of SFMT is given by
2.2.1 Quasi-random points and discrepancy

Until now, we have not given any example of quasi-random points. In the unidimensional case, an easy example of quasi-random points is the sequence of n terms given by $(\frac{1}{2n}, \frac{3}{2n}, \dots, \frac{2n-1}{2n})$. This sequence has a discrepancy $\frac{1}{n}$, see Niederreiter (1978) for details.

The problem with this finite sequence is that it depends on n. If we want a different number of points, we need to recompute the whole sequence. In the following, we will work on the first n points of an infinite sequence in order to reuse previous computations if we increase n.

Moreover we introduce the notion of discrepancy on a finite sequence $(u_i)_{1 \le i \le n}$. In the above example, we are able to calculate the discrepancy exactly. With an infinite sequence, this is no longer possible. Thus, we will only try to estimate asymptotic equivalents of the discrepancy.

The discrepancy of the average sequence of points is governed by the law of the iterated logarithm:
$$\limsup_{n \to +\infty} \frac{\sqrt{n}\, D_n}{\sqrt{\log \log n}} = 1,$$
which leads to the following asymptotic equivalent

$$n = \sum_{j=0}^{k} a_j p^{j}.$$

Then, we can define the radical-inverse function of integer n as
$$\phi_p(n) = \sum_{j=0}^{k} \frac{a_j}{p^{j+1}}.$$

And finally, the Van der Corput sequence is given by $(\phi_p(0), \phi_p(1), \dots, \phi_p(n), \dots) \in [0, 1[$. The first terms of those sequences for prime numbers 2, 3 and 5 are given in table 2.

       n in p-basis            $\phi_p(n)$
n      p=2     p=3    p=5      p=2      p=3     p=5
0      0       0      0        0        0       0
1      1       1      1        0.5      0.333   0.2
2      10      2      2        0.25     0.666   0.4
3      11      10     3        0.75     0.111   0.6
4      100     11     4        0.125    0.444   0.8
5      101     12     10       0.625    0.777   0.04
6      110     20     11       0.375    0.222   0.24
7      111     21     12       0.875    0.555   0.44
8      1000    22     13       0.0625   0.888   0.64

Table 2: Van der Corput first terms

The big advantage of Van der Corput sequences is that they use p-adic fractions, which are easily computable on the binary structure of computers.
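As a minimal sketch of the radical-inverse construction, the following base R code reflects the base-p digits of n about the radix point and reproduces the first terms of the Van der Corput sequence. The function names `radical_inverse` and `van_der_corput` are illustrative only; they are not part of randtoolbox.

```r
# Radical-inverse function phi_p(n): the digit of weight p^j in n
# is given the weight p^-(j+1).
radical_inverse <- function(n, p = 2) {
  stopifnot(n >= 0, p >= 2)
  phi <- 0
  weight <- 1 / p
  while (n > 0) {
    digit <- n %% p              # current least-significant digit
    phi <- phi + digit * weight
    n <- n %/% p                 # drop that digit
    weight <- weight / p
  }
  phi
}

# First n_terms terms phi_p(0), phi_p(1), ... of the sequence in base p
van_der_corput <- function(n_terms, p = 2) {
  vapply(0:(n_terms - 1), radical_inverse, numeric(1), p = p)
}

vdc2 <- van_der_corput(9, 2)
vdc3 <- van_der_corput(9, 3)
vdc5 <- van_der_corput(9, 5)
print(round(cbind(n = 0:8, p2 = vdc2, p3 = vdc3, p5 = vdc5), 4))
```

Because each term depends only on its own index, increasing the number of points reuses all previously computed terms.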
where $p_1, \dots, p_d$ are pairwise relatively prime bases. The discrepancy of the Halton sequence is asymptotically $O\left(\frac{\log(n)^d}{n}\right)$.

The following Halton theorem gives us a better discrepancy estimate for finite sequences. For any dimension $d \ge 1$, there exists a finite sequence of points in $I^d$ such that the discrepancy
$$D_n = O\left(\frac{\log(n)^{d-1}}{n}\right).$$

Therefore, we have a significant guarantee that there exist quasi-random points which outperform traditional Monte-Carlo methods.

2.2.4 Faure sequences

The Faure sequences are also based on the decomposition of integers into a prime basis, but they differ in two ways: they use only one prime number as basis, and they permute vector elements from one dimension to another.

The basis prime number is chosen as the smallest prime number greater than the dimension d, i.e. 3 when d = 2, 5 when d = 3 or 4, etc. As in the Van der Corput sequence, we decompose integer n into the p-basis:
$$n = \sum_{j=0}^{k} a_j p^{j}.$$

which is the same as above for n defined by the $a_{D,i}$'s.

Finally the (d-dimensional) Faure sequence is defined by
$$(\phi_p(a_{1,1}, \dots, a_{1,k}), \dots, \phi_p(a_{d,1}, \dots, a_{d,k})) \in I^d.$$
In the bidimensional case, we work in 3-basis; the first terms of the sequence are listed in table 3.

n     a13 a12 a11    a23 a22 a21    $\phi(a_{13}..)$    $\phi(a_{23}..)$
0     000            000            0                   0
1     001            001            1/3                 1/3
2     002            002            2/3                 2/3
3     010            012            1/9                 7/9
4     011            010            4/9                 1/9
5     012            011            7/9                 4/9
6     020            021            2/9                 5/9
7     021            022            5/9                 8/9
8     022            020            8/9                 2/9
9     100            100            1/27                1/27
10    101            101            10/27               10/27
11    102            102            19/27               19/27
12    110            112            4/27                22/27
13    111            110            13/27               4/27
14    112            111            22/27               13/27

Table 3: Faure first terms

2.2.5 Sobol sequences
$1, \dots, d)$. The $v_{i,j}$, generally called direction numbers, are numbers related to primitive (irreducible) polynomials over the field {0, 1}.

In order to generate the jth dimension, we suppose that the primitive polynomial in dimension j is
$$p_j(x) = x^q + a_1 x^{q-1} + \dots + a_{q-1} x + 1.$$
Then we define the following q-term recurrence relation on integers $(M_{i,j})_i$
$$M_{i,j} = 2 a_1 M_{i-1,j} \oplus 2^2 a_2 M_{i-2,j} \oplus \dots \oplus 2^{q-1} a_{q-1} M_{i-q+1,j} \oplus 2^q a_q M_{i-q,j} \oplus M_{i-q,j}$$
where $i > q$.

This allows us to compute direction numbers as

Equidistribution: The $X_i$ form a point set with probability 1; i.e. the randomization process has preserved whatever special properties the underlying point set had.

The Sobol sequences can be scrambled by Owen's type of scrambling, by the Faure-Tezuka type of scrambling, and by a combination of both.

The program we have interfaced to R is based on the ACM Algorithm 659 described by Bratley & Fox (1988) and Bratley et al. (1992). Modifications by Hong & Hickernell (2001) allow for a randomization of the sequences. Furthermore, in the case of the Sobol sequence we followed the implementation of Joe & Kuo (1999), which can handle up to 1111 dimensions.
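To make the q-term recurrence on the integers $M_{i,j}$ concrete, here is a hedged base R sketch for a single dimension. The helper name `sobol_m`, the example polynomial $x^3 + x + 1$ and the initial values 1, 3, 7 are illustrative assumptions (a commonly used textbook example), not the values used by the ACM Algorithm 659 code interfaced in the package. The coefficient $a_q$ is taken to be 1, as implied by the constant term of the polynomial.

```r
# q-term recurrence on the integers M_i for one dimension.
# a      : coefficients a_1, ..., a_{q-1} of the primitive polynomial
#          x^q + a_1 x^(q-1) + ... + a_{q-1} x + 1 (a_q = 1 is implicit)
# m_init : the q initial values M_1, ..., M_q
# n      : number of values M_1, ..., M_n to return
sobol_m <- function(a, m_init, n) {
  q <- length(m_init)
  stopifnot(length(a) == q - 1, n >= q)
  m <- as.integer(c(m_init, rep(0, n - q)))
  if (n > q) for (i in (q + 1):n) {
    # last two terms of the recurrence: 2^q * M_{i-q} XOR M_{i-q}
    val <- bitwXor(as.integer(2^q * m[i - q]), m[i - q])
    for (k in seq_len(q - 1)) {
      # term 2^k * a_k * M_{i-k}, present only when a_k = 1
      if (a[k] == 1) val <- bitwXor(val, as.integer(2^k * m[i - k]))
    }
    m[i] <- val
  }
  m
}

# x^3 + x + 1, i.e. a_1 = 0 and a_2 = 1, with M_1 = 1, M_2 = 3, M_3 = 7
m <- sobol_m(a = c(0, 1), m_init = c(1, 3, 7), n = 6)
print(m)
```

The XOR is carried out with `bitwXor`, which works on 32-bit integers, so this sketch is only suitable for a moderate number of terms.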
where $(p_1, \dots, p_d)$ are prime numbers, generally the first d prime numbers. With the previous inequality, we can derive an estimate of the Torus algorithm discrepancy:
$$O\left(\frac{1 + \log n}{n}\right).$$

We have the following theorem for good lattice points. For every dimension $d \ge 2$ and integer $n \ge 2$, there exists a lattice point $g \in \mathbb{Z}^d$ whose coordinates are relatively prime to n such that the discrepancy $D_n$ of points $\{\frac{1}{n} g\}, \dots, \{\frac{n}{n} g\}$ satisfies
$$D_n < \frac{d}{n} + \frac{1}{2n} \left(\frac{7}{5} + 2 \log m\right)^d.$$
> setSeed(1)
> congruRand(10)

[1] 7.826369e-06 1.315378e-01
[3] 7.556053e-01 4.586501e-01
[5] 5.327672e-01 2.189592e-01
[7] 4.704462e-02 6.788647e-01
[9] 6.792964e-01 9.346929e-01

One can follow the evolution of the nth integer generated with the option echo=TRUE.

1 see Wichmann & Hill (1982), Marsaglia (1994) and Knuth (2002) for details.

We can also check around the 10000th term. From the site http://www.firstpr.com.au/dsp/rand31/, we know that the 9998th to 10002th terms of the Park-Miller sequence are 925166085, 1484786315, 1043618065, 1589873406, 2010798668. The congruRand generates

> setSeed(1614852353)
> congruRand(5, echo=TRUE)

1 th integer generated : 1614852353
2 th integer generated : 925166085
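The check above can be reproduced without the package by a short base R sketch of the Park-Miller recurrence. It assumes the usual Park-Miller constants, multiplier 16807 and modulus $2^{31} - 1$, which are consistent with the first printed value 7.826369e-06 = 16807 / (2^31 - 1). The function name `park_miller` is illustrative and is not the package's congruRand.

```r
# Park-Miller congruential generator:
# x_i = 16807 * x_{i-1} mod (2^31 - 1), u_i = x_i / (2^31 - 1).
# Doubles represent integers exactly up to 2^53, so the product
# 16807 * x (below 2^46) is computed without rounding.
park_miller <- function(n, seed = 1) {
  a <- 16807
  m <- 2^31 - 1
  x <- numeric(n)
  state <- seed
  for (i in seq_len(n)) {
    state <- (a * state) %% m    # transition function
    x[i] <- state
  }
  list(integers = x, uniforms = x / m)  # output function: divide by m
}

pm1 <- park_miller(10, seed = 1)
print(pm1$uniforms)

# Starting from the integer 1614852353, the next integer should be 925166085
pm2 <- park_miller(5, seed = 1614852353)
print(pm2$integers)
```

Unlike the echo=TRUE output, this sketch does not report the seed itself as the first integer; `pm2$integers[1]` is the state after one step.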
4 DESCRIPTION OF THE RANDOM GENERATION FUNCTIONS 14
It is easier to see the impact of scrambling by plotting a two-dimensional sequence in the unit square. Below we plot the default Sobol sequence and the Sobol sequence scrambled by the Owen algorithm, see figure 1.

> par(mfrow = c(2,1))
> plot(sobol(1000, 2))

[Figure 1: Sobol (two sampling types); scatter plots in the unit square, axes u and v]

an implementation of Faure sequences. For the moment, there is no function faure.

4.2.4 Torus algorithm (or Kronecker sequence)

The function torus implements the Torus algorithm.

> torus(10)
[1] 0.41421356 0.82842712 0.24264069
[4] 0.65685425 0.07106781 0.48528137
[7] 0.89949494 0.31370850 0.72792206
[10] 0.14213562

These numbers are the fractional parts of $\sqrt{2}, 2\sqrt{2}, 3\sqrt{2}, \dots$, see sub-section 2.2.1 for details.

> torus(5, useTime = TRUE)
[1] 0.18921183 0.60342539 0.01763896
[4] 0.43185252 0.84606608

The optional argument useTime specifies whether or not the machine time is used to initiate the seed. If we do not use the machine time, two calls of torus obviously produce the same output.
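The deterministic case (no machine time) is easy to sketch in base R: the i-th term is the fractional part of $i \sqrt{p}$ for each prime p. The function name `torus_seq` is a hypothetical stand-in used for illustration only; it ignores the time-based seeding and the other options of the package's torus function.

```r
# Torus (Kronecker) sequence: u_i = fractional part of i * sqrt(p),
# one column per prime in `primes`.
torus_seq <- function(n, primes = 2) {
  i <- seq_len(n)
  u <- outer(i, sqrt(primes))    # n x d matrix of i * sqrt(p_j)
  u <- u - floor(u)              # keep the fractional part
  if (length(primes) == 1) drop(u) else u
}

t10 <- torus_seq(10)
print(t10)

# Three-dimensional version with the first three primes
print(torus_seq(5, c(2, 3, 5)))
```

Since nothing here depends on the clock, two calls with the same arguments return identical values.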
[1] 0.6457513 0.2915026 0.9372539
[4] 0.5830052 0.2287566

The dim argument is exactly the same as for congruRand or SFMT. By default, we use the first prime numbers, e.g. 2, 3 and 5 for a call like torus(10, 3). But the user can specify a set of prime numbers, e.g. torus(10, 3, c(7, 11, 13)). The dimension argument is limited to 100 000.

deal with serial dependence is to mix the Torus algorithm with a pseudo random generator. The torus function offers this operation thanks to argument mixed (the Torus algorithm is mixed with SFMT).

> torus(5, mixed =TRUE)
[1] 0.7495332 0.9489193 0.4007344
[4] 0.8258934 0.8760030

In order to see the difference between the two, we can plot the empirical autocorrelation function (acf in R), see figure 2.

[Figure 2: Auto-correlograms; ACF against Lag (0 to 50), one panel titled "Series torus(10^5, mix = TRUE)"]

First we compare SFMT algorithm with Torus algorithm on figure 3.
[Figure 3: scatter plots in the unit square, axes u and v]
4.4 Applications of QMC methods

4.4.1 d dimensional integration

Now we will show how to use low-discrepancy sequences to compute a d-dimensional integral defined on the unit hypercube. We want to compute
$$I_{cos}(d) = \int_{\mathbb{R}^d} \cos(\|x\|) e^{-\|x\|^2} dx \approx \frac{\pi^{d/2}}{n} \sum_{i=1}^{n} \cos\left(\sqrt{\sum_{j=1}^{d} \frac{(\Phi^{-1})^2(t_{ij})}{2}}\right).$$
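The estimator above can be sketched in base R, with `qnorm` playing the role of the standard normal quantile function. This is only an illustration under stated assumptions: the points $t_{ij}$ come from a small hand-written Halton generator (`radical_inverse`, `halton` and `icos_qmc` are illustrative names, not package functions), indices start at 1 so that no coordinate equals 0, and the result is checked in dimension 1 only, where the closed form $\sqrt{\pi}\, e^{-1/4}$ is known.

```r
# Radical inverse of n in base p (Van der Corput term)
radical_inverse <- function(n, p) {
  phi <- 0
  weight <- 1 / p
  while (n > 0) {
    phi <- phi + (n %% p) * weight
    n <- n %/% p
    weight <- weight / p
  }
  phi
}

# First n Halton points (indices 1..n), one column per prime
halton <- function(n, primes) {
  pts <- sapply(primes, function(p)
    vapply(seq_len(n), radical_inverse, numeric(1), p = p))
  matrix(pts, nrow = n)
}

# QMC estimate of I_cos(d) with d = length(primes)
icos_qmc <- function(n, primes) {
  d <- length(primes)
  z <- qnorm(halton(n, primes))          # normal quantiles of the points
  pi^(d / 2) * mean(cos(sqrt(rowSums(z^2) / 2)))
}

est1 <- icos_qmc(4096, 2)                # dimension 1
exact1 <- sqrt(pi) * exp(-1 / 4)         # closed form in dimension 1
print(c(estimate = est1, exact = exact1))

# Same estimator in dimension 3 (no closed form is asserted here)
print(icos_qmc(4096, c(2, 3, 5)))
```

The factor $\pi^{d/2}$ and the division by 2 inside the square root both come from the change of variable $x = \Phi^{-1}(t) / \sqrt{2}$ applied coordinate by coordinate.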
Here $\Phi^{-1}$ denotes the quantile function of the standard normal distribution.

We simply use the following code to compute the $I_{cos}(25)$ integral, whose exact value is $-1356914$.

a vanilla European call in the framework of a geometric Brownian motion for the underlying asset. Those options are already implemented in the package fOptions of the Rmetrics bundle (see footnote 1).

The payoff of this classical option is
[Figure: error function; legend: SFMT, Torus, Park Miller, zero]

of boolean $D_i$.

Call (see footnote 1). These kind of options belongs to the path-

$[0, T]$.

options are already implemented in the package

The payoff of a DOC option is

1. start from point $s_{t_0}$,
2. for simulation $i = 1 \dots n$ and time index $j = 1 \dots d$
   simulate $s_{t_j, i}$,
   update deactivation boolean $D_i$

[Figure 6: Error function for Down Out Call; legend: SFMT, Torus, Park Miller, zero; horizontal axis from 0e+00 to 1e+05]

However, these results do not prove that the Torus algorithm is always better than traditional Monte Carlo. The results are sensitive to the barrier level H, the strike price X (being in or out of the money has a strong impact), the asset volatility and the time point number d.

1 DOC is deactivated when the underlying asset hits the barrier.
2 created by Wuertz et al. (2007a).
3 i.e. for $k \neq j$, $\forall i$, $(i\{\sqrt{p_j}\})_i$ and $(i\{\sqrt{p_k}\})_i$ are linearly independent over $\mathbb{Q}$.
5 RANDOM GENERATION TESTS 21
equiprobably. For example with d = 3, we should have an equal number of vectors $(u_i, u_{i+1}, u_{i+2})_i$ such that

$u_i < u_{i+1} < u_{i+2}$,
$u_i < u_{i+2} < u_{i+1}$,
$u_{i+1} < u_i < u_{i+2}$,
$u_{i+1} < u_{i+2} < u_i$,
$u_{i+2} < u_i < u_{i+1}$
and $u_{i+2} < u_{i+1} < u_i$.

For some d, we have d! possible orderings of coordinates, which all have the same probability $\frac{1}{d!}$ of appearing. The chi-squared statistic for the order test for a sequence $(u_i)_{1 \le i \le n}$ is just
$$S = \sum_{j=1}^{d!} \frac{(n_j - m \frac{1}{d!})^2}{m \frac{1}{d!}},$$
where the $n_j$'s are the counts for the different orders and $m = \frac{n}{d}$. Computing d! possible orderings has an exponential cost, so in practice d is small.

5.1.4 The frequency test

The frequency test works on a series of ordered contiguous integers ($J = [i_1, \dots, i_l] \cap \mathbb{Z}$). If we denote by $(n_i)_{1 \le i \le n}$ the sample of numbers from the set J, the expected number of integers equal to $j \in J$ is
$$\frac{1}{i_l - i_1 + 1} \times n,$$
which is independent of j. From this, we can compute a chi-squared statistic
$$S = \sum_{j=1}^{l} \frac{(Card(n_i = i_j) - m)^2}{m},$$
where $m = \frac{n}{i_l - i_1 + 1}$ is the expected number given above.

5.2 Tests based on multiple sequences

Under the i.i.d. hypothesis, a vector of output values $u_i, \dots, u_{i+t-1}$ is uniformly distributed over the unit hypercube $[0, 1]^t$. Tests based on multiple sequences partition the unit hypercube into cells and compare the number of points in each cell with the expected number.

5.2.1 The serial test

The most intuitive way is to split the unit hypercube $[0, 1]^t$ into $k = d^t$ subcubes. This is achieved by splitting each dimension into d > 1 pieces. The volume (i.e. a probability) of each cell is just $\frac{1}{k}$.

The associated chi-square statistic is defined as
$$S = \sum_{j=1}^{k} \frac{(N_j - \lambda)^2}{\lambda},$$
where $N_j$ denotes the counts and $\lambda = \frac{n}{k}$ their expectation.

5.2.2 The collision test

The philosophy is still the same: we want to detect some pathological behavior on the unit hypercube $[0, 1]^t$. A collision occurs when a point $v_i = (u_i, \dots, u_{i+t-1})$ falls in a cell where there are already points $v_j$. Let us denote by C the number of collisions.

The distribution of the collision number C is given by
$$P(C = c) = \prod_{i=0}^{n-c-1} \frac{k - i}{k} \; \frac{1}{k^c} \; {}_2S_n^{n-c},$$
where ${}_2S_n^k$ denotes the Stirling number of the second kind and $c = 0, \dots, n - 1$. These numbers are defined by ${}_2S_n^k = k \times {}_2S_{n-1}^k + {}_2S_{n-1}^{k-1}$ and ${}_2S_n^1 = {}_2S_n^n = 1$. For example go to wikipedia.

But we cannot use this formula for large n since the Stirling numbers need O(n log(n)) time to be computed. As in L'Ecuyer et al. (2002), we use a Gaussian approximation if $\lambda = \frac{n}{k} > \frac{1}{32}$ and $n \ge 2^8$, a Poisson approximation if $\lambda < \frac{1}{32}$, and the exact formula otherwise.
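For small n, the exact collision distribution can be sketched directly from the Stirling recurrence. The code below is an illustration only (`stirling2_row` and `collision_pmf` are hypothetical helper names, not randtoolbox functions). It relies on the identity that the product of the $(k-i)/k$ terms times $1/k^c$ equals $k(k-1)\cdots(k-n+c+1)/k^n$, and it will overflow for large n, which is in line with the remark that approximations are needed there.

```r
# Stirling numbers of the second kind S(n, j) for j = 0..n, built row by
# row with S(m, j) = j * S(m-1, j) + S(m-1, j-1) and S(0, 0) = 1.
stirling2_row <- function(n) {
  row <- 1                               # row for n = 0
  for (m in seq_len(n)) {
    j <- 0:m
    row <- j * c(row, 0) + c(0, row)     # S(m, j) for j = 0..m
  }
  row
}

# Exact distribution of the number of collisions C when n points fall
# independently and uniformly into k cells.
collision_pmf <- function(n, k) {
  S <- stirling2_row(n)                  # S[j + 1] = S(n, j)
  cvals <- 0:(n - 1)
  occupied <- n - cvals                  # number of non-empty cells
  # log of k (k-1) ... (k - occupied + 1) / k^n
  logfall <- lchoose(k, occupied) + lfactorial(occupied) - n * log(k)
  p <- exp(logfall) * S[occupied + 1]
  names(p) <- cvals
  p
}

pmf <- collision_pmf(n = 10, k = 20)
print(round(pmf, 4))
print(sum(pmf))                          # probabilities add up to 1
```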
6 DESCRIPTION OF RNG TEST FUNCTIONS 23
$$\sum_{j=0}^{k-1} f(N_j).$$

We retrieve the collision test with $f(x) = (x - 1)_+$ and the serial test with $f(x) = \frac{(x - \lambda)^2}{\lambda}$. Plenty of statistics can be derived, for example if we want to test the number of cells with exactly b points, $f(x) = 1_{(x = b)}$. For other statistics, see L'Ecuyer et al. (2002).

5.2.4 The poker test

The poker test is a test where cells of the unit cube $[0, 1]^t$ do not have the same volume. If we split the unit cube into $d^t$ cells, then by regrouping cells whose left hand corner has the same number of distinct coordinates we get the poker test. In a more intuitive way, let us consider a hand of k cards from k different cards. The probability of having exactly c different cards is
$$P(C = c) = \frac{1}{k^k} \frac{k!}{(k - c)!} \, {}_2S_k^c,$$

6.1 Test on one sequence of n numbers

Goodness of Fit tests are already implemented in R with the function ks.test for the Kolmogorov-Smirnov test and in package adk for the Anderson-Darling test. In the following, we will focus on the one-sequence tests implemented in randtoolbox.

6.1.1 The gap test

The function gap.test implements the gap test as described in sub-section 5.1.2. By default, lower and upper bound are l = 0 and u = 0.5, just as below.

> gap.test(runif(1000))

Gap test

chisq stat = 7.2, df = 10
, p-value = 0.7
11 0 0.12

[Figure: Histogram of SFMT(10^3); vertical axis: Frequency]

, p-value = 0.016

observed number 38 46 40 32 33 48
44 38 52 40 39 39
36 52 36 25 56 59
42 34 46 31 41 53

Defined in sub-section 5.2.1, the serial test focuses on the equidistribution of random numbers in the unit hypercube $[0, 1]^t$. We split each dimension of the unit cube in d equal pieces. Currently in function serial.test, we implement t = 2 and d fixed by the user.

> serial.test(runif(3000), 3)

Serial test

chisq stat = 7.4, df = 8
, p-value = 0.49

(sample size : 3000)

observed number 175 149 178 151
168 179 174 174 152

chisq stat = 6.7, df = 15
, p-value = 0.97

exact distribution
(sample number : 1000/sample size : 128
/ cell number : 1024)

collision    observed    expected
number       count       count
1            2           2.3
2            10          10
3            23          29
4            57          62
5            107         102
6            133         138
7            162         156
8            146         151
9            124         126
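As a rough illustration of how observed collision counts of this kind arise, the following base R sketch maps pairs of uniforms (t = 2) to one of $d^2$ cells and counts collisions, with 128 points per sample and 1024 cells as in the output above. The helper names `count_collisions` and `cell_index` are assumptions made for this example; the package's own collision test may organise the computation differently, and `runif` stands in for whichever generator is under test.

```r
# Number of collisions: points landing in an already occupied cell
count_collisions <- function(cells) length(cells) - length(unique(cells))

# Map t consecutive uniforms to one of d^t cells (0-based cell index)
cell_index <- function(u, d, t = 2) {
  n <- length(u) %/% t
  v <- matrix(u[seq_len(n * t)], ncol = t, byrow = TRUE)  # n points
  digits <- pmin(floor(v * d), d - 1)    # slice number per coordinate
  as.vector(digits %*% d^((t - 1):0))
}

set.seed(42)
# 1000 samples of 128 points (256 uniforms) in 32^2 = 1024 cells
collisions <- replicate(1000,
  count_collisions(cell_index(runif(256), d = 32)))
print(table(collisions))
```

The resulting table of counts is what a chi-square statistic would compare with the expected counts derived from the exact or approximate distribution of C.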
When the cell number is far greater than the sample length, we use the Poisson approximation (i.e. $\lambda < 1/32$). For example with congruRand generator we have

Collision test

> poker.test(SFMT(10000))

Poker test

(sample size : 10000)
of the currently used generators for simulations can be distinguished from truly random numbers using the arithmetic mod 2 applied to individual bits of the output numbers. This is true for Mersenne Twister, SFMT and also all WELL generators. The basis for tolerating this is based on two facts.

In this section, we briefly present what to do if you want to use this package in your package. This section is mainly taken from package expm available on R-forge.

Package authors can use facilities from randtoolbox in two ways:

call the R level functions (e.g. torus) in R code;

C, call the routine torus,. . .

Using R level functions in a package simply requires the following two import directives:

Imports: randtoolbox

See file randtoolbox.h to find headers of RNGs. Examples of C calls to other functions can be found in this package with the WELL RNG functions.

The definitive reference for these matters remains the Writing R Extensions manual, page 20 in sub-section "specifying imports exports" and page 64 in sub-section "registering native routines".
Black, F. & Scholes, M. (1973), The pricing of options and corporate liabilities, Journal of Political Economy 81(3).

Bratley, P. & Fox, B. (1988), Algorithm 659: Implementing Sobol's quasi-random sequence generators, ACM Transactions on Mathematical Software 14, 88-100.

Bratley, P., Fox, B. & Niederreiter, H. (1992), Implementation and tests of low discrepancy sequences, ACM Transactions on Modeling and Computer Simulation 2, 195-213.

Eddelbuettel, D. (2007), random: True random numbers using random.org. URL: http://www.random.org

Genz, A. (1982), A Lagrange extrapolation algorithm for sequences of approximations to multiple integrals, SIAM Journal on Scientific Computing 3, 160-172.

Hong, H. & Hickernell, F. (2001), Implementing scrambled digital sequences. Preprint.

Jackel, P. (2002), Monte Carlo methods in finance, John Wiley & Sons.

Joe, S. & Kuo, F. (1999), Remark on algorithm 659: Implementing Sobol's quasi-random sequence generator. Preprint.

Knuth, D. E. (2002), The Art of Computer Programming: seminumerical algorithms, Vol. 2, 3rd edn, Massachusetts: Addison-Wesley.

Marsaglia, G. (1994), Some portable very-long-period random number generators, Computers in Physics 8, 117-121.

Matsumoto, M. & Nishimura, T. (1998), Mersenne twister: A 623-dimensionally equidistributed uniform pseudorandom number generator, ACM Trans. on Modelling and Computer Simulation 8(1), 3-30.

Matsumoto, M. & Saito, M. (2008), SIMD-oriented Fast Mersenne Twister: a 128-bit pseudorandom number generator, Monte Carlo and Quasi-Monte Carlo Methods 2006, Springer.

McCullough, B. D. (2008), Microsoft Excel's 'not the Wichmann-Hill' random number generators, Computational Statistics and Data Analysis 52, 4587-4593.

Namee, J. M. & Stenger, F. (1967), Construction of fully symmetric numerical integration formulas, Numerische Mathematik 10, 327-344.

Niederreiter, H. (1978), Quasi-Monte Carlo methods and pseudo-random numbers, Bulletin of the American Mathematical Society 84(6).

Niederreiter, H. (1992), Random Number Generation and Quasi-Monte Carlo Methods, SIAM, Philadelphia.

Panneton, F., L'Ecuyer, P. & Matsumoto, M. (2006), Improved long-period generators based on linear recurrences modulo 2, ACM Trans. on Mathematical Software 32(1), 1-16.