Simulation: Programming in R For Data Science Anders Stockmarr, Kasper Kristensen, Anders Nielsen
Some applications:
Encryption
Bootstrap methods
Design of experiments
Performance testing
Fast
Simple
Reproducible
Numbers must be uncorrelated, evenly distributed, and have a long period.
Also used as the default in, among others, Python, Maple, MATLAB, Julia, Stata and
Microsoft Visual C++.
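The requirements above can be checked informally in R. The sketch below assumes the passage refers to R's default generator, which the R documentation names "Mersenne-Twister"; the variable names (kind, u, bin.share, lag1.cor) are illustrative, and the checks are rough sanity checks, not formal tests of randomness.

```r
# Which generator is active? The first element names the uniform generator.
kind <- RNGkind()
print(kind[1])

set.seed(1)
n <- 100000
u <- runif(n)

# Evenly distributed: about 10% of the draws should land in each tenth of (0, 1).
bin.share <- table(cut(u, breaks = seq(0, 1, by = 0.1))) / n
print(round(bin.share, 3))

# Uncorrelated: the correlation between consecutive draws should be near 0.
lag1.cor <- cor(u[-1], u[-n])
print(lag1.cor)
```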
> set.seed(123)
> rnorm(3)
[1] -0.5604756 -0.2301775  1.5587083
> set.seed(456)
> rnorm(3)
[1] -1.3435214  0.6217756  0.8008747
> set.seed(123)
> rnorm(3)
[1] -0.5604756 -0.2301775  1.5587083
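The same behaviour can be verified programmatically rather than by comparing printed digits. This is a minimal sketch; the names first, other and again are illustrative only.

```r
set.seed(123)
first <- rnorm(3)

set.seed(456)
other <- rnorm(3)   # different seed, different numbers

set.seed(123)
again <- rnorm(3)   # same seed as 'first', same numbers

print(identical(first, again))  # the seed fully determines the sequence
print(identical(first, other))
```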
comparisons;
code control;
reproducibility.
If none of the above applies: DON'T.
A few examples
Simulation from a uniform distribution
> runif(5,min=1,max=2)
[1] 1.528105 1.892419 1.551435 1.456615 1.956833
Goal: estimate $\int_0^{2\pi} f(x)\,dx$, where $f(x) = \exp(2\cos(x - \pi))$.
Practical Implementation
> x<-runif(1000000,0,2*pi)
> y<-runif(1000000,0,8)
> phat.under<-mean(y<exp(2*cos(x-pi)))
> phat.under*16*pi
[1] 14.29143
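The computation above is a hit-or-miss Monte Carlo estimate. Points are drawn uniformly in the box [0, 2*pi] x [0, 8], which has area 16*pi, and the share of points falling under the curve is scaled by that area. The sketch below repeats the calculation, adds a binomial standard error, and compares with base R's integrate() as a numerical reference. The names f, box.area, se and reference are illustrative, and the seed is arbitrary.

```r
set.seed(1)

f <- function(x) exp(2 * cos(x - pi))   # the curve; its maximum exp(2) is below 8

n <- 1000000
x <- runif(n, 0, 2 * pi)
y <- runif(n, 0, 8)

phat.under <- mean(y < f(x))            # share of points under the curve
box.area <- 2 * pi * 8                  # = 16 * pi
estimate <- phat.under * box.area

# Standard error from the binomial variance of phat.under
se <- box.area * sqrt(phat.under * (1 - phat.under) / n)

# Deterministic numerical integration, for comparison
reference <- integrate(f, 0, 2 * pi)$value

print(c(estimate = estimate, se = se, reference = reference))
```

The standard error shrinks like 1/sqrt(n), so each extra digit of precision costs roughly 100 times more draws.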
[Figure: Histogram of p.sim; x-axis: p.sim, y-axis: Frequency]
> x<-rnorm(1000)
> plot(density(x),xlim=c(-8,16))
[Figure: density.default(x = x); y-axis: Density]
> y<-rnorm(1000,mean=8)
> lines(density(y),col="blue")
[Figure: density.default(x = x), with the density of y added in blue; y-axis: Density]
> lines(density(rnorm(1000,sd=2)),col="red")
> lines(density(rnorm(1000,mean=8,sd=2)),col="green")
[Figure: density.default(x = x), with the sd=2 densities added in red and green; y-axis: Density]
> lines(density(rnorm(1000,sd=4)),col="purple")
> lines(density(rnorm(1000,mean=8,sd=4)),col="cyan")
[Figure: density.default(x = x), with the sd=4 densities added in purple and cyan; y-axis: Density]
The numbers are distributed around 0 and 8, with even greater variation.
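The effect of the mean and sd arguments can also be read off numerically instead of from the plots. The sketch below draws 1000 values for each combination used above and tabulates the sample mean and standard deviation; the data frame params and the seed are illustrative choices.

```r
set.seed(42)

# All combinations of the means and standard deviations used in the plots
params <- expand.grid(mean = c(0, 8), sd = c(1, 2, 4))
params$sample.mean <- NA_real_
params$sample.sd <- NA_real_

for (i in seq_len(nrow(params))) {
  z <- rnorm(1000, mean = params$mean[i], sd = params$sd[i])
  params$sample.mean[i] <- mean(z)
  params$sample.sd[i] <- sd(z)
}

# Sample values should be close to the requested mean and sd
print(params)
```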
[Figure: the anthill (site 99) and food sites 1 to 8]
How many time units does it take to collect the food in a day?
9 objects to keep track of. We assign initial values, and write them to a text
file for later use:
> my.text<-"time.ants<-0
+ at.anthill<-rep(TRUE,3)
+ done.all<-0
+ done<-rep(0,3)
+ food<-c(rpois(8,lambda=10),0,0)
+ site<-c(10,10,10)
+ carry<-c(0,0,0)
+ visited.sites<-list(numeric(0),numeric(0),numeric(0))
+ total.visited.sites<-rep(0,3)"
> writeLines(my.text,con="Data/initialize.txt")
All 9 variables are updated. The loop stops when all ants are done, i.e. when
done.all=1.
We simulate the system by sourcing the initialization and loop code:
> total.time<-numeric(100)
> for(k in 1:100){
+ source("Data/initialize.txt")
+ source("Data/loop.txt")
+ total.time[k]<-time.ants
+ }
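The loop above re-reads two files on every iteration. An alternative pattern, sketched here under stated assumptions, wraps one simulated day in a function and repeats it with replicate(). The function simulate.day() and its toy rule are hypothetical stand-ins; they do not reproduce the ant model in Data/loop.txt, which is not shown here.

```r
# Hypothetical stand-in for one simulated day; NOT the ant model from loop.txt.
simulate.day <- function() {
  # Food at the 8 sites, drawn as in the initialization code
  food <- rpois(8, lambda = 10)
  # Toy rule (assumption): one time unit per food item, shared by 3 ants
  ceiling(sum(food) / 3)
}

set.seed(2024)
# replicate() calls the function 100 times and collects the results in a vector
total.time <- replicate(100, simulate.day())

print(summary(total.time))
```

Keeping the state inside a function avoids leaving variables such as time.ants in the global workspace between runs.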
[Figure: Histogram of total.time; x-axis: total.time, y-axis: Frequency]
The mathematical exposition is modest, and while examples are from the
medical world, the methods apply universally.
You can also look within your own field of expertise for a statistics textbook
with relevant examples.