
Random Number Generation: Dr. John Mellor-Crummey


Random Number Generation

Dr. John Mellor-Crummey


Department of Computer Science
Rice University
johnmc@cs.rice.edu

COMP 528 Lecture 21 5 April 2005

Topics for Today


Understand

Motivation
Desired properties of a good generator
Linear congruential generators
multiplicative and mixed

Tausworthe generators
Combined generators
Seed selection
Myths about random number generation
What's used today: MATLAB, R, Linux

Why Random Number Generation?

Simulation must generate random values for variables in a
specified random distribution
examples: normal, exponential, ...

How? Two steps


random number generation: generate a sequence of uniform FP
random numbers in [0,1]
random variate generation: transform a uniform random
sequence to produce a sequence with the desired distribution

How Random Number Generators Work

Most commonly use recurrence relation

$x_n = f(x_{n-1}, x_{n-2}, \ldots)$

recurrence is a function of the last one (or a few) numbers

Example:

$x_n = (5x_{n-1} + 1) \bmod 16$

For $x_0 = 5$, the first 32 numbers are 10, 3, 0, 1, 6, 15, 12, 13, 2, 11, 8, 9,
14, 7, 4, 5, 10, 3, 0, 1, 6, 15, 12, 13, 2, 11, 8, 9, 14, 7, 4, 5
the x's are integers in [0, 15]
dividing by 16 gives random numbers in the interval [0, 1)
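
The example recurrence is small enough to check directly. A minimal Python sketch (the function name `lcg` is an illustrative choice, not something from the lecture):

```python
def lcg(a, b, m, seed, count):
    """Return the first `count` values of x_n = (a*x_{n-1} + b) mod m."""
    values = []
    x = seed
    for _ in range(count):
        x = (a * x + b) % m
        values.append(x)
    return values

xs = lcg(a=5, b=1, m=16, seed=5, count=32)
print(xs[:16])              # one full cycle of 16 values
print(xs[16:] == xs[:16])   # the second 16 values repeat the first 16
us = [x / 16 for x in xs]   # scale to values in [0, 1)
```

Because the state is a single integer below 16, the sequence has to repeat within 16 steps; here it uses all 16 values before repeating.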

Properties of pseudo-random number sequences


from seed value, can determine entire sequence
they pass statistical tests for randomness
reproducibility (often desirable)

Random Number Sequences

Some generators do not repeat the initial part of a sequence

[Figure: a generated sequence with its tail, cycle length, and period labeled]

Desired Properties of a Good Generator

Efficiently computable
Period should be large
don't want random numbers in a simulation to recycle

Successive values should be


independent
uniformly distributed

Linear-Congruential Generators

1951: D.H. Lehmer found that residues of successive powers
of a number have good randomness

$x_n = a^n \bmod m$

after computing $x_{n-1}$: $x_n = ax_{n-1} \bmod m$
(a is the multiplier, m is the modulus)

Lehmer's generator: multiplicative LCG

Modern generalization: mixed LCG

$x_n = (ax_{n-1} + b) \bmod m$

$a, b, m > 0$

Result: $x_n$ are integers in [0, m-1]


Popular because
analyzed easily
certain guarantees can be made about their properties

Properties of LCGs

Choice of a, b, m affects
period
autocorrelation

Observations about LCGs


period can never be more than m, so modulus m should be large
$m = 2^k$ yields efficient implementation by truncation
if b is non-zero, obtain period of m iff

m and b are relatively prime
every prime that is a factor of m is also a factor of a - 1
if m is a multiple of 4, a - 1 must be too

all of these conditions are met by $x_n = (ax_{n-1} + b) \bmod m$ if
$m = 2^k$, for some integer k
$a = 4c + 1$, for some integer c
b is an odd integer

Full-period generator = one with period m

not all are equally good
lower autocorrelation between adjacent elements = better
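
The three full-period conditions can be checked mechanically and compared against a brute-force period count on a small modulus. The sketch below is illustrative only; the helper names `prime_factors`, `has_full_period`, and `measured_period` are assumptions, not part of the lecture.

```python
from math import gcd

def prime_factors(n):
    """Return the set of prime factors of n (trial division)."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def has_full_period(a, b, m):
    """Check the three full-period conditions for a mixed LCG."""
    if gcd(b, m) != 1:                       # m and b relatively prime
        return False
    if any((a - 1) % p != 0 for p in prime_factors(m)):
        return False                         # each prime factor of m divides a - 1
    if m % 4 == 0 and (a - 1) % 4 != 0:      # 4 | m implies 4 | (a - 1)
        return False
    return True

def measured_period(a, b, m, seed=0):
    """Steps until the seed recurs, or None if it does not recur within m steps."""
    x = seed
    for steps in range(1, m + 1):
        x = (a * x + b) % m
        if x == seed:
            return steps
    return None

print(has_full_period(5, 1, 16), measured_period(5, 1, 16))
print(has_full_period(3, 1, 16), measured_period(3, 1, 16))
```

With a = 3 the value a - 1 = 2 is not a multiple of 4, so the third condition fails and the measured period drops below m.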

Example: Two Candidate LCGs


Which is better?

$x_n = ((2^{34} + 1)x_{n-1} + 1) \bmod 2^{35}$

$x_n = ((2^{18} + 1)x_{n-1} + 1) \bmod 2^{35}$

Both must be full-period generators: for $x_n = (ax_{n-1} + b) \bmod m$,

$m = 2^k$, for some integer k
$a = 4c + 1$, for some integer c
b is an odd integer
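
As a quick check, both candidates can be tested against the three sufficient conditions. This is a small sketch; the function name `meets_power_of_two_conditions` is hypothetical.

```python
def meets_power_of_two_conditions(a, b, m):
    """Sufficient conditions from the slide: m = 2^k, a = 4c + 1, b odd."""
    m_is_power_of_two = m > 0 and (m & (m - 1)) == 0
    return m_is_power_of_two and a % 4 == 1 and b % 2 == 1

candidates = {
    "a = 2^34 + 1": (2**34 + 1, 1, 2**35),
    "a = 2^18 + 1": (2**18 + 1, 1, 2**35),
}
results = {name: meets_power_of_two_conditions(a, b, m)
           for name, (a, b, m) in candidates.items()}
print(results)
```

Both pass, so period alone does not separate them; the remaining criterion is the autocorrelation between adjacent elements.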

Multiplicative LCGs

More efficient than mixed LCGs: no addition


Two classes: $m = 2^k$, $m \ne 2^k$


Multiplicative LCG with $m = 2^k$

$x_n = ax_{n-1} \bmod 2^k$

Most efficient LCG: mod = truncation

Not full-period: maximum possible period for $m = 2^k$ is $2^{k-2}$
only possible if multiplier $a = 8i \pm 3$ and $x_0$ is odd

consider

$x_n = 5x_{n-1} \bmod 2^5$ (lcg_m2k_good)

$x_n = 7x_{n-1} \bmod 2^5$ (lcg_m2k_bad)

If a $2^{k-2}$ period suffices, may use multiplicative LCG for efficiency
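
The two example generators can be compared by counting steps until the seed recurs. A minimal sketch, with `mult_period` as an assumed helper name:

```python
def mult_period(a, m, seed):
    """Period of x_n = a*x_{n-1} mod m from `seed` (None if the seed never recurs)."""
    x = seed
    for steps in range(1, m + 1):
        x = (a * x) % m
        if x == seed:
            return steps
    return None

good = mult_period(5, 32, 1)        # a = 8*1 - 3, odd seed
bad = mult_period(7, 32, 1)         # a is not of the form 8i +/- 3
even_seed = mult_period(5, 32, 2)   # good multiplier, even seed
print(good, bad, even_seed)
```

Only the first case reaches the maximum $2^{k-2} = 8$ for k = 5; a poor multiplier or an even seed shortens the cycle.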


Multiplicative LCG with $m \ne 2^k$

$x_n = ax_{n-1} \bmod m,\ m \ne 2^k$

Avoid small period of LCG when $m = 2^k$: use prime modulus

Full-period generator (period m - 1) with proper choice of a
when a is a primitive root of m

i.e. $a^n \bmod m \ne 1$ for $n = 1, 2, \ldots, m-2$

Consider
$x_n = 3x_{n-1} \bmod 31$ (lcg_mprime_good)

$x_n = 5x_{n-1} \bmod 31$ (lcg_mprime_bad)

Note: $5^3 \bmod 31 = 125 \bmod 31 = 1$

Observations
unlike mixed LCG, $x_n$ can never be 0 when m is prime
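
The primitive-root condition translates directly into a loop over the powers of a. The sketch below assumes m is prime; `is_primitive_root` and `sequence` are illustrative names.

```python
def is_primitive_root(a, m):
    """True if a^n mod m != 1 for every n = 1, ..., m-2 (m assumed prime)."""
    if a % m == 0:
        return False
    power = 1
    for _ in range(1, m - 1):          # n = 1, ..., m-2
        power = (power * a) % m
        if power == 1:
            return False
    return True

def sequence(a, m, seed, count):
    """First `count` values of x_n = a*x_{n-1} mod m."""
    x, out = seed, []
    for _ in range(count):
        x = (a * x) % m
        out.append(x)
    return out

print(is_primitive_root(3, 31), len(set(sequence(3, 31, 1, 30))))
print(is_primitive_root(5, 31), sorted(set(sequence(5, 31, 1, 30))))
```

With a = 3 the generator visits all 30 non-zero residues; with a = 5 it cycles through only three values.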


Examining Bits of a Multiplicative LCG


testgenerator(@r1,1,20)

  n  decimal  binary
---  -------  -----------------
  1    25173  01100010 01010101
  2    12345  00110000 00111001
  3    54509  11010100 11101101
  4    27825  01101100 10110001
  5    55493  11011000 11000101
  6    25449  01100011 01101001
  7    13277  00110011 11011101
  8    53857  11010010 01100001
  9    64565  11111100 00110101
 10     1945  00000111 10011001
 11     6093  00010111 11001101
 12    24849  01100001 00010001
 13    48293  10111100 10100101
 14    52425  11001100 11001001
 15    61629  11110000 10111101
 16    18625  01001000 11000001
 17     2581  00001010 00010101
 18    25337  01100010 11111001
 19    11949  00101110 10101101
 20    47473  10111001 01110001

$x_n = 25{,}173\,x_{n-1} \bmod 2^{16}$

(bit 1 = least significant bit)
bit 1: always 1
bit 2: always 0
bit 3: cycle (10) of length 2
bit 4: cycle (0110) of length 4
In general:
kth bit follows cycle of length $2^{k-2}$, $k \ge 2$
Typical of multiplicative LCG with modulus $2^k$
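
The listing above can be reproduced with a few lines of Python. This is only a rough analogue of the lecture's `testgenerator` call, whose actual source is not shown; the names `show_generator` and `r1` below are assumptions.

```python
def show_generator(step, seed, count):
    """Print n, x_n in decimal, and x_n as 16 bits in two groups of 8."""
    rows, x = [], seed
    for n in range(1, count + 1):
        x = step(x)
        bits = format(x, "016b")
        row = (n, x, bits[:8] + " " + bits[8:])
        rows.append(row)
        print(f"{row[0]:3d}  {row[1]:7d}  {row[2]}")
    return rows

def r1(x):
    """Multiplicative LCG: x_n = 25173 * x_{n-1} mod 2^16."""
    return (25173 * x) % 2**16

rows = show_generator(r1, 1, 20)
```

Reading the binary column from the right makes the short cycles in the low-order bits easy to see.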

Examining Bits of a Mixed LCG


testgenerator(@r2,1,20)

  n  decimal  binary
---  -------  -----------------
  1    39022  10011000 01101110
  2    61087  11101110 10011111
  3    20196  01001110 11100100
  4    45005  10101111 11001101
  5     3882  00001111 00101010
  6    21259  01010011 00001011
  7    65216  11111110 11000000
  8    19417  01001011 11011001
  9    30502  01110111 00100110
 10    20919  01010001 10110111
 11    26076  01100101 11011100
 12    16421  01000000 00100101
 13    44130  10101100 01100010
 14    63139  11110110 10100011
 15    32824  10000000 00111000
 16    14513  00111000 10110001
 17    51934  11001010 11011110
 18    36303  10001101 11001111
 19    35284  10001001 11010100
 20     8573  00100001 01111101

$x_n = (25{,}173\,x_{n-1} + 13{,}849) \bmod 2^{16}$

(bit 1 = least significant bit)
bit 1: cycle (10) of length 2
bit 2: cycle (1100) of length 4
bit 3: cycle (11110000) of length 8

In general:
kth bit follows cycle of length $2^k$
Typical of mixed LCG with modulus $2^k$
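
The per-bit cycle lengths for both 16-bit generators can be measured rather than read off a table. The sketch below generates two full sweeps of the modulus and finds the smallest power-of-two shift under which each bit sequence matches itself; `bit_periods` is a hypothetical helper name.

```python
def bit_periods(step, seed, width=16):
    """For each bit k = 1..width (1 = least significant), return the smallest
    power-of-two period of that bit over two sweeps of the modulus."""
    n = 2 ** (width + 1)
    xs, x = [], seed
    for _ in range(n):
        x = step(x)
        xs.append(x)
    periods = []
    for k in range(1, width + 1):
        bits = [(v >> (k - 1)) & 1 for v in xs]
        p = 1
        while any(bits[i] != bits[i + p] for i in range(n - p)):
            p *= 2
        periods.append(p)
    return periods

def multiplicative(x):
    return (25173 * x) % 2**16

def mixed(x):
    return (25173 * x + 13849) % 2**16

mult_periods = bit_periods(multiplicative, 1)
mixed_periods = bit_periods(mixed, 1)
print(mult_periods)
print(mixed_periods)
```

Restricting the search to powers of two is an assumption that holds here because each bit's period must divide the generator's period, which is itself a power of two.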

LCG Cautions

Properties guaranteed only if


computations are exact: no roundoff
use integer arithmetic without overflow

Low-order bits not very random, high-order bits better


if one wants k bits and k < machine word length,
better to choose the high-order k bits than the low-order k bits


Tausworthe Generators

Significant interest in huge random numbers


cryptographic applications want many-bit random numbers
produce k-bit numbers by
produce random sequence of bits
chunk bit stream into k-bit quantities

1965: Tausworthe generator

$b_n = c_{q-1}b_{n-1} \oplus c_{q-2}b_{n-2} \oplus c_{q-3}b_{n-3} \oplus \cdots \oplus c_0 b_{n-q}$

$c_i$ and $b_i$ are binary variables
$\oplus$ is the xor operation (mod 2 addition)
uses last q bits of bit stream to compute next bit
autoregressive, order q: AR(q)

AR(q) generator maximum period = $2^q - 1$
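
A direct transcription of the recurrence, as a sketch. The coefficient list, the seed bits, and the small q = 4 example are all illustrative assumptions, not taken from the lecture.

```python
def tausworthe_bits(c, seed_bits, count):
    """Generate bits b_n = c[q-1]*b_{n-1} XOR ... XOR c[0]*b_{n-q}.

    c[i] is the coefficient c_i; seed_bits holds b_0, ..., b_{q-1}.
    """
    q = len(c)
    bits = list(seed_bits)
    while len(bits) < count:
        n = len(bits)
        new_bit = 0
        for j in range(1, q + 1):            # term c_{q-j} * b_{n-j}
            new_bit ^= c[q - j] & bits[n - j]
        bits.append(new_bit)
    return bits[:count]

def first_return(bits, q):
    """Index at which the initial q-bit state first reappears."""
    start = bits[:q]
    for n in range(1, len(bits) - q + 1):
        if bits[n:n + q] == start:
            return n
    return None

# Hypothetical example: q = 4 with c_1 = c_0 = 1, i.e. b_n = b_{n-3} XOR b_{n-4}
bits = tausworthe_bits(c=[1, 1, 0, 0], seed_bits=[1, 0, 0, 0], count=40)
period = first_return(bits, 4)
print(period)
```

For this choice of coefficients the period reaches the AR(4) maximum of $2^4 - 1 = 15$.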



Tausworthe Generator Notation

Characteristic polynomial notation


characteristic polynomial
$x^7 + x^3 + 1$
$b_{n+7} \oplus b_{n+3} \oplus b_n = 0,\ n = 0, 1, 2, \ldots$
$b_{n+7} = b_{n+3} \oplus b_n,\ n = 0, 1, 2, \ldots$
$b_n = b_{n-4} \oplus b_{n-7},\ n = 7, 8, 9, \ldots$

Most polynomials for Tausworthe generators are trinomials

Period depends on characteristic polynomial
if period = $2^q - 1$, characteristic polynomial is a primitive polynomial


Implementing Tausworthe Generators

Linear feedback shift registers

$x^7 + x^3 + 1$
$b_{n+7} \oplus b_{n+3} \oplus b_n = 0,\ n = 0, 1, 2, \ldots$
$b_{n+7} = b_{n+3} \oplus b_n,\ n = 0, 1, 2, \ldots$
$b_n = b_{n-4} \oplus b_{n-7},\ n = 7, 8, 9, \ldots$

[Figure: shift register with cells $b_n, b_{n-1}, \ldots, b_{n-7}$ and an output labeled "out"]

Disadvantage of Tausworthe generators

while the sequence is good overall, local behavior may not be
known to perform poorly on the runs up and down test

first-order serial correlation almost 0

suspected that some polynomials may give poor high-order correlation
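
The shift-register form of $b_n = b_{n-4} \oplus b_{n-7}$ fits in a 7-bit integer. The bit layout chosen below (bit 0 holds the newest bit) is one possible convention, and the function names are illustrative.

```python
def lfsr_x7_x3_1(state, count):
    """Shift-register form of b_n = b_{n-4} XOR b_{n-7}.

    `state` is a 7-bit integer: bit 0 holds b_{n-1}, ..., bit 6 holds b_{n-7}.
    Returns the generated bits and the final state.
    """
    out = []
    for _ in range(count):
        new_bit = ((state >> 3) ^ (state >> 6)) & 1   # b_{n-4} XOR b_{n-7}
        state = ((state << 1) | new_bit) & 0x7F       # shift in b_n, drop b_{n-7}
        out.append(new_bit)
    return out, state

def lfsr_period(seed_state):
    """Number of steps until the register returns to its seed state."""
    _, state = lfsr_x7_x3_1(seed_state, 1)
    steps = 1
    while state != seed_state:
        _, state = lfsr_x7_x3_1(state, 1)
        steps += 1
    return steps

print(lfsr_period(0x7F))   # all-ones seed
print(lfsr_period(0x01))   # any other non-zero seed lies on the same cycle
```

An all-zero register never leaves zero, which is why a zero seed has to be avoided.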


Generating k-bit Random Numbers


k-bit random numbers $x_n$ from binary sequence $b_n$
Generalized feedback shift register method (Lewis & Payne, 1973)

$x_n = 0.b_n b_{n+s} b_{n+2s} \ldots b_{n+(k-1)s}$

s is a carefully selected delay

$s \ge k$: $x_n$ and $x_j$ have no bits in common for $n \ne j$
s relatively prime to $2^q - 1$: guarantees full period for $x_n$

Advantage
$x_n$ can be generated very efficiently with wide-word shift and
exclusive-or operations

Requires
storing an array of seed numbers
careful initialization of seed array
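
The efficiency claim follows from linearity: every bit column of $x_n$ is a delayed copy of the same bit stream, so whole words satisfy the same XOR recurrence as single bits. The sketch below checks this for the recurrence $b_n = b_{n-4} \oplus b_{n-7}$; the word width k = 8 and delay s = 11 are arbitrary illustrative choices.

```python
def tausworthe_x7_x3_1(count, seed_bits=(1, 1, 1, 1, 1, 1, 1)):
    """Bits from b_n = b_{n-4} XOR b_{n-7}, given b_0..b_6."""
    bits = list(seed_bits)
    for n in range(7, count):
        bits.append(bits[n - 4] ^ bits[n - 7])
    return bits

def gfsr_words(bits, k, s, count):
    """Word n is built from b_n, b_{n+s}, ..., b_{n+(k-1)s}, most significant first."""
    words = []
    for n in range(count):
        w = 0
        for i in range(k):
            w = (w << 1) | bits[n + i * s]
        words.append(w)
    return words

k, s, count = 8, 11, 300                      # hypothetical width and delay
bits = tausworthe_x7_x3_1(count + (k - 1) * s)
words = gfsr_words(bits, k, s, count)         # slow: assemble each word bit by bit

fast = words[:7]                              # the seed array
for n in range(7, count):
    fast.append(fast[n - 4] ^ fast[n - 7])    # fast: one word-wide XOR per number

print(fast == words)
us = [w / 2**k for w in words]                # x_n as a binary fraction in [0, 1)
```

The first seven words play the role of the seed array that must be stored and initialized with care.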

Extended Fibonacci Generators

Fibonacci sequence: $x_n = x_{n-1} + x_{n-2}$
Fibonacci RNG: $x_n = (x_{n-1} + x_{n-2}) \bmod m$
Properties

not very good randomness
high serial correlation

Extended Fibonacci generator (Marsaglia 1983)

$x_n = (x_{n-5} + x_{n-17}) \bmod 2^k$

state: ring buffer B with 17 values

initialization

store 17 integers in the buffer (not all of them even)
initialize cursors j=16, k=4 into the buffer

generate

x = B[j] + B[k], reduced modulo the $2^k$ of the recurrence (not the cursor k)
B[j] = x
j = (j - 1) mod 17; k = (k - 1) mod 17
return x

Properties
passes most statistical tests
period = $2^k(2^{17} - 1)$ (much longer than LCGs)
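
A runnable sketch of the ring-buffer scheme. The class name, the `bits` parameter (standing in for the exponent of the modulus), and the seed ordering are assumptions made for illustration.

```python
class ExtendedFibonacci:
    """Ring-buffer sketch of x_n = (x_{n-5} + x_{n-17}) mod 2^bits."""

    def __init__(self, seeds, bits=32):
        if len(seeds) != 17 or all(s % 2 == 0 for s in seeds):
            raise ValueError("need 17 seed integers, not all even")
        self.mask = (1 << bits) - 1
        # seeds are x_0..x_16; store the newest at index 0 so that
        # buf[16] holds x_{n-17} and buf[4] holds x_{n-5}
        self.buf = [s & self.mask for s in reversed(seeds)]
        self.j, self.k = 16, 4

    def next(self):
        x = (self.buf[self.j] + self.buf[self.k]) & self.mask
        self.buf[self.j] = x                  # overwrite the oldest value
        self.j = (self.j - 1) % 17
        self.k = (self.k - 1) % 17
        return x

gen = ExtendedFibonacci(list(range(1, 18)), bits=8)
first = [gen.next() for _ in range(5)]
print(first)
```

Masking with `2^bits - 1` is equivalent to the mod operation because the modulus is a power of two.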


Some Combined Generators


Can combine 2 or more generators to produce a better one

Adding random numbers from 2 or more generators


if $x_n$ and $y_n$ are random sequences in [0, m-1], then

$w_n = (x_n + y_n) \bmod m$

can be used as a random number

why do this?

can increase period and randomness if two generators have different periods

Exclusive-or random numbers from 2 or more generators


Santha & Vazirani (1984)

xor of 2 random n-bit streams generates a more random sequence

Shuffle
use sequence a to pick which recent element in sequence b to return
Marsaglia & Bray (1964)

keep 100 items of sequence b


use sequence a to select which to return next and replace

claim: better k-distributivity than LFSR methods


problem: not easy to skip long sequence for multi-stream simulations
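
The shuffle idea can be sketched with two small LCGs standing in for sequences a and b. The component generators and their parameters below are hypothetical, chosen only to keep the example self-contained.

```python
from itertools import islice

def make_lcg(a, b, m, seed):
    """Return a function producing successive values of a mixed LCG."""
    state = [seed]
    def step():
        state[0] = (a * state[0] + b) % m
        return state[0]
    return step

def shuffled(gen_a, m_a, gen_b, table_size=100):
    """Yield values of sequence b in an order chosen by sequence a."""
    table = [gen_b() for _ in range(table_size)]   # keep 100 items of b
    while True:
        i = gen_a() * table_size // m_a            # a picks a table slot
        yield table[i]
        table[i] = gen_b()                         # refill the slot from b

gen_a = make_lcg(25173, 13849, 2**16, seed=1)      # illustrative parameters
gen_b = make_lcg(5, 1, 2**12, seed=7)              # illustrative parameters
out = list(islice(shuffled(gen_a, 2**16, gen_b), 50))
print(out[:10])
```

Because the output order depends on the table contents, there is no cheap way to jump far ahead in the shuffled stream, which is the multi-stream problem noted above.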

Seed Selection Issues

Wrong combination of seed and RNG can hurt


especially if RNG is flawed
e.g. seed might be RNG fixed point

Cases
one stream needed
if RNG has full period, then any seed as good as another

multiple streams needed


e.g. queue simulation requires
interarrival time stream
service time stream
requires special care!


Seed Selection Guidelines I

Don't use 0
multiplicative LCGs and Tausworthe generators would stick at 0

Avoid even values

seed should be odd for multiplicative LCG with $m = 2^k$
for full-period generators, all non-zero values equally good

Don't subdivide one stream

don't use a single stream for all random variables

might be a strong correlation between items in same stream

Use non-overlapping streams

each stream requires separate seed
don't use same seed for 2 or more streams!

if seeds are bad, streams will overlap and not be independent

right way: select seeds so streams don't overlap at all
example: need 3 streams of 20,000 numbers
pick $u_0$ as seed for first stream
pick $u_{20,000}$ as seed for second stream
pick $u_{40,000}$ as seed for third stream
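
Seeds spaced a fixed distance apart on one sequence can be found by stepping the generator, or by jumping ahead with modular exponentiation. The sketch assumes a multiplicative LCG with a = 16807 and m = 2^31 - 1; the lecture does not specify which generator to use here.

```python
A, M = 16807, 2**31 - 1     # assumed generator, not specified in the slides

def step(x):
    return (A * x) % M

def stream_seeds(u0, n_streams, stream_length):
    """Seeds u_0, u_L, u_2L, ... spaced stream_length apart on one sequence."""
    seeds, x = [], u0
    for _ in range(n_streams):
        seeds.append(x)
        for _ in range(stream_length):
            x = step(x)
    return seeds

def jump(x, n):
    """Jump n steps ahead at once: x_n = a^n * x_0 mod m for a multiplicative LCG."""
    return (pow(A, n, M) * x) % M

seeds = stream_seeds(u0=1, n_streams=3, stream_length=20_000)
print(seeds)
print(seeds[1] == jump(1, 20_000), seeds[2] == jump(1, 40_000))
```

The jump-ahead form matters when streams are long, since it costs a logarithmic number of multiplications instead of one per skipped value.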


Seed Selection Guidelines II

Reuse seeds in successive replications


if simulation experiment is replicated several times
can use seeds from end of previous replication in next one

Don't use random seeds

simulation can't be reproduced
impossible to guarantee multiple streams won't overlap


Myths I

A complex set of operations leads to random results


complicated code does not guarantee a random sequence of numbers that will pass
tests of uniformity and independence

A single test of goodness suffices


sequence 0, 1, ..., m-1
not random but passes chi-square test
will fail run test

use as many tests as possible
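
The point about a single test can be made concrete with simplified versions of the two statistics. The binning and the run count below are illustrative reductions, not full hypothesis tests.

```python
def chi_square_uniform(xs, m, bins=10):
    """Chi-square statistic for integers in [0, m) against equal-width bins."""
    counts = [0] * bins
    for x in xs:
        counts[x * bins // m] += 1
    expected = len(xs) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

def count_runs_up_down(xs):
    """Number of maximal ascending or descending runs in the sequence."""
    runs, prev_dir = 0, 0
    for a, b in zip(xs, xs[1:]):
        direction = 1 if b > a else -1
        if direction != prev_dir:
            runs += 1
            prev_dir = direction
    return runs

m = 1000
xs = list(range(m))                    # the sequence 0, 1, ..., m-1
chi2 = chi_square_uniform(xs, m)       # bin counts are perfectly even
runs = count_runs_up_down(xs)          # a single ascending run
print(chi2, runs)
```

A random sequence of length n is expected to contain roughly (2n - 1)/3 runs up and down, so a single run in 1000 values is wildly implausible even though the bin counts look ideal.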

Pseudo-random numbers are unpredictable


e.g. can identify LCG parameters with a few numbers and predict
LCG unsuitable for cryptographic applications where
unpredictability is desired
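
As an illustration of how little output is needed, the sketch below recovers a and b from three consecutive values. It assumes the modulus is known and prime, and that the difference of the first two outputs is invertible; the "secret" parameters are made up for the demonstration.

```python
def recover_lcg(x0, x1, x2, m):
    """Recover (a, b) of x_n = (a*x_{n-1} + b) mod m from three consecutive outputs.

    Assumes m is known and (x1 - x0) is invertible mod m.
    Requires Python 3.8+ for pow(value, -1, m).
    """
    a = ((x2 - x1) * pow((x1 - x0) % m, -1, m)) % m
    b = (x1 - a * x0) % m
    return a, b

# Hypothetical "secret" generator used only to produce sample outputs
m, secret_a, secret_b = 2**31 - 1, 16807, 12345
xs = [42]
for _ in range(4):
    xs.append((secret_a * xs[-1] + secret_b) % m)

a, b = recover_lcg(xs[0], xs[1], xs[2], m)
predicted = (a * xs[2] + b) % m
print((a, b), predicted == xs[3])
```

The recovery works because subtracting consecutive outputs cancels b, leaving a single linear equation in a.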

Some seeds are better than others


e.g. odd vs. even, avoid particular seeds, etc.
$x_n = (9806x_{n-1} + 1) \bmod (2^{17} - 1)$
37,911 is a fixed point!
may be true for some generators, but these should be avoided!
any non-zero seed should produce equally valid results
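
The fixed point is easy to confirm, and a brute-force scan over all residues shows whether there are others. The names `step` and `fixed_points` are illustrative.

```python
M = 2**17 - 1

def step(x):
    """x_n = (9806 * x_{n-1} + 1) mod (2^17 - 1)."""
    return (9806 * x + 1) % M

print(step(37911))                                     # maps 37,911 to itself
fixed_points = [x for x in range(M) if step(x) == x]   # scan every possible seed
print(fixed_points)
```

Seeding this generator with 37,911 would return the same number forever.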

Myths II

Accurate implementation is not important


period and randomness are guaranteed only if formula is
implemented without overflow or truncation

overflows and truncations can


change the path of a generator
reduce the period

Bits of successive words are equally-randomly distributed


if an algorithm produces a k-bit wide number, randomness is
only guaranteed when all k bits are used
unless specified otherwise, assume any particular bit position
(or sequence thereof) will not be equally random


What's Used Today: MATLAB

rand function
lagged Fibonacci generator
seed
cache of 32 floating point numbers
combined with a shift register random integer generator
core: j ^= (j<<13); j ^= (j>>17); j ^= (j<<5)

properties:
period: > $2^{1492}$
fairly sure all FP numbers in [e/2, 1-e/2] are generated
$e = 2^{-52}$
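
The shift-register core above is written for a fixed-width machine integer. In Python, which has unbounded integers, the left shifts need an explicit mask; the sketch below assumes j is a 32-bit unsigned value, which the slide does not state.

```python
MASK32 = 0xFFFFFFFF

def xorshift_step(j):
    """One step of the core: j ^= j<<13; j ^= j>>17; j ^= j<<5, kept to 32 bits."""
    j ^= (j << 13) & MASK32
    j ^= j >> 17
    j ^= (j << 5) & MASK32
    return j

print(xorshift_step(1))
print(xorshift_step(0))   # zero is a fixed point, so the state must start non-zero
```

This covers only the integer core, not the lagged Fibonacci part or how MATLAB combines the two.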


What's Used Today: R

Mersenne-Twister (Matsumoto and Nishimura,1998) [default]


twisted GFSR based on Mersenne primes
seed: 624-dimensional set of 32-bit integers + a cursor
period: $2^{19937} - 1$
equi-distribution in 623 consecutive dimensions (whole period)
[note: variant of MT for independent parallel streams exists too]

Knuth-TAOCP (Knuth, 1997)


GFSR using lagged Fibonacci sequences with subtraction

$X[j] = (X[j-100] - X[j-37]) \bmod 2^{30}$

seed: the set of the 100 last numbers + cyclic shift of buffer
period: about 2^129.

Knuth-TAOCP-2002
initialization of GFSR from seed was altered

What's Used Today: R (continued)

Wichmann-Hill
seed: integer vector of length 3
seed[i] is in 1:(p[i] - 1)
p is the length 3 vector of primes, p = (30269, 30307, 30323)

cycle length: 6.9536e12 = prod(p-1)/4


reference: Applied Statistics (1984) 33, 123
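
A sketch of the Wichmann-Hill scheme: three small multiplicative LCGs whose scaled outputs are added modulo 1. The primes come from the slide; the multipliers 171, 172, and 170 are those of the published AS 183 algorithm and do not appear on the slide.

```python
P = (30269, 30307, 30323)   # primes from the slide
A = (171, 172, 170)         # AS 183 multipliers (not given on the slide)

def wichmann_hill(seed):
    """Advance the 3-component seed; return (uniform value, new seed)."""
    new_seed = tuple((a * s) % p for a, s, p in zip(A, seed, P))
    u = sum(s / p for s, p in zip(new_seed, P)) % 1.0
    return u, new_seed

cycle_length = (P[0] - 1) * (P[1] - 1) * (P[2] - 1) // 4
print(cycle_length)          # about 6.9536e12, matching prod(p-1)/4

u, seed = wichmann_hill((1, 2, 3))
print(u, seed)
```

Each component must stay in 1:(p[i] - 1); a zero component would remain zero forever.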

Marsaglia-Multicarry multiply-with-carry RNG (Marsaglia)


seed: two integers, all values allowed
period: > $2^{60}$
has passed all tests (according to Marsaglia)

Super-Duper (Marsaglia)
doesn't pass the MTUPLE test of the Diehard battery
period: about 4.6*10^18 for most initial seeds
seed: 2 integers (first: all values allowed; second: odd value).
default seeds are the Tausworthe and congruence long integers


What's Used Today: Linux

random function
non-linear additive feedback-based generator
state: 8, 32, 64, 128, or 256 bytes
all bits considered random

rand function
bottom 12 bits go through cyclic pattern
higher-order bits more random
