Statistical Computing
Coursework 1 Submission
1(a). Since f(x) is continuous, the cumulative distribution function F(x) is defined as
\[
F(x) = \int_{-\infty}^{x} f(t)\, dt,
\]
where
\[
f(x) = \frac{1}{2}\exp(-|x|),
\]
so we have
\[
F(x) = \int_{-\infty}^{x} f(t)\, dt = \int_{-\infty}^{x} \frac{1}{2}\exp(-|t|)\, dt.
\]
Then,
\[
F(x) =
\begin{cases}
\int_{-\infty}^{0} \frac{1}{2}\exp(t)\, dt + \int_{0}^{x} \frac{1}{2}\exp(-t)\, dt, & \text{if } x \ge 0,\\
\int_{-\infty}^{x} \frac{1}{2}\exp(t)\, dt, & \text{if } x < 0.
\end{cases}
\]
Thus,
\[
F(x) =
\begin{cases}
1 - \frac{1}{2}\exp(-x), & \text{if } x \ge 0,\\
\frac{1}{2}\exp(x), & \text{if } x < 0.
\end{cases}
\]
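As a quick check (added for completeness, not in the original submission), F is a valid CDF: the two branches agree at x = 0 and the tail limits are correct,
\[
\lim_{x \to 0^-} \tfrac{1}{2}\exp(x) = \tfrac{1}{2} = 1 - \tfrac{1}{2}\exp(0), \qquad
\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to \infty} F(x) = 1.
\]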
Step(1): Simulate \(u_1, \dots, u_n\) independently from \(U(0, 1)\).
Step(2): Compute \(X_i = F^{-1}(U_i)\), where
\[
U = F(X) =
\begin{cases}
1 - \frac{1}{2}\exp(-X), & \text{if } X \ge 0,\\
\frac{1}{2}\exp(X), & \text{if } X < 0,
\end{cases}
\]
which implies
\[
X =
\begin{cases}
-\log(2 - 2U), & \text{if } U \ge 1/2,\\
\log(2U), & \text{if } U < 1/2.
\end{cases}
\]
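For clarity (a short step added here), the first branch follows by solving \(u = 1 - \frac{1}{2}\exp(-x)\) for \(x\):
\[
\exp(-x) = 2 - 2u \;\Longrightarrow\; x = -\log(2 - 2u),
\]
and similarly \(u = \frac{1}{2}\exp(x)\) gives \(x = \log(2u)\); note that \(u \ge 1/2\) corresponds exactly to \(x \ge 0\).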
Then, by the inverse PIT theorem, the \(x_1, \dots, x_n\) generated from \(u_1, \dots, u_n\) by the above process form a random sample of size n from f(x).
1(c). We generated n random uniform samples between 0 and 1, then applied the inverse PIT theorem to transform them into samples from the target distribution. The result is a set of n samples that follow the target distribution.
InversePIT <- function(n) {
  u <- runif(n, 0, 1)                 # Step(1): n uniform draws on (0, 1)
  x <- numeric(n)
  for (i in 1:n) {                    # Step(2): apply the inverse CDF
    if (u[i] >= 1/2) {
      x[i] <- -log(2 - 2 * u[i])
    } else {
      x[i] <- log(2 * u[i])
    }
  }
  x
}
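Equivalently, the loop can be vectorised with ifelse; this short sketch (an optional alternative, not part of the submitted solution) applies the same branch formulas derived above:

InversePIT_vec <- function(n) {
  u <- runif(n)
  # Same inverse-CDF transform as above, applied to the whole vector at once.
  ifelse(u >= 1/2, -log(2 - 2 * u), log(2 * u))
}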
1(d). We ran the code to generate a random sample of size n = 5000 and plotted a histogram of the data superimposed with the probability density function f(x). The histogram and the pdf f(x) fit well: the shape of the histogram closely follows the pdf curve, which indicates that the generated data are a good fit for f(x).
simdata <- InversePIT(5000)
f <- function(x) (1/2) * exp(-abs(x))          # target density
hist(simdata, freq = FALSE, ylim = c(0, 0.5))  # density-scaled histogram
curve(f, add = TRUE, col = "RED")              # overlay the pdf
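As an additional sanity check (not part of the original submission), the sample could be compared with the closed-form CDF F(x) from 1(a) via a Kolmogorov-Smirnov test:

# Optional check: KS test against the CDF derived in 1(a).
F_target <- function(q) ifelse(q >= 0, 1 - 0.5 * exp(-q), 0.5 * exp(q))
ks.test(simdata, F_target)   # a large p-value is consistent with f(x)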
Figure 1: Comparison between the PDF f(x) and the histogram of samples obtained using the inverse PIT theorem.
Let
\[
h(x) = \frac{f(x)}{g(x)} = \sqrt{\frac{2}{\pi}}\, e^{-\frac{x^2}{2} + |x|},
\]
where \(f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\) is the standard normal pdf and \(g(x) = \frac{1}{2} e^{-|x|}\). Thus, we have
\[
h(x) =
\begin{cases}
\sqrt{\frac{2}{\pi}}\, e^{-\frac{x^2}{2} + x}, & \text{if } x \ge 0,\\
\sqrt{\frac{2}{\pi}}\, e^{-\frac{x^2}{2} - x}, & \text{if } x < 0.
\end{cases}
\]
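The monotonicity of h used below follows by differentiating the exponent in each branch (a short verification added for completeness):
\[
\frac{d}{dx}\left(-\frac{x^2}{2} + x\right) = 1 - x \quad (x \ge 0), \qquad
\frac{d}{dx}\left(-\frac{x^2}{2} - x\right) = -1 - x \quad (x < 0).
\]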
Figure 2: Comparison between the function f(x) (blue) and the function K g(x) (red), where \(K = \sqrt{2e/\pi}\).
We can conclude that h(x) is increasing on the intervals (−∞, −1) and (0, 1), and decreasing on the intervals (−1, 0) and (1, ∞). Therefore, we identify two local maxima, at x = −1 and x = 1.
Lastly, we evaluate h at these two points; by the symmetry h(−x) = h(x),
\[
h(-1) = h(1) = \sqrt{\frac{2}{\pi}}\, e^{\frac{1}{2}}.
\]
Hence
\[
K = \sup_x \frac{f(x)}{g(x)} = \sqrt{\frac{2e}{\pi}}.
\]
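Numerically (a quick evaluation added for reference), \(K = \sqrt{2e/\pi} \approx 1.3155\), so the envelope K g(x) lies above f(x) everywhere, touching it at x = ±1.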
f <- function(x) (1/sqrt(2*pi)) * exp(-x^2/2)   # standard normal pdf
g <- function(x) (1/2) * exp(-abs(x))           # Laplace proposal pdf
K <- sqrt((2*exp(1))/pi)                        # K = sqrt(2e/pi)
Kg <- function(x) K * g(x)                      # scaled envelope
plot(f, col = "BLUE", xlim = c(-8, 8), ylim = c(0, 0.65))
curve(Kg, add = TRUE, col = "RED")
2(c). The rejection sampling algorithm consists of four steps.
Step(1): Simulate x from g(x). Using the method from 1(b) and 1(c), we can generate random variables \(x_i\) taking values in (−∞, ∞) with probability density function g(x).
Step(2): Simulate \(y_i\) from \(U(0, K g(x_i))\), where \(K = \sqrt{2e/\pi}\).
Step(3): Accept \(x_i\) if \(y_i < f(x_i)\); otherwise reject it.
Step(4): Repeat Steps (1)-(3) until n values have been accepted.
2(d). We can measure how efficient the rejection algorithm is by the 'acceptance rate', defined as the ratio of the number of accepted data points to the total number of data points processed by the algorithm.
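For this proposal, the theoretical acceptance rate is 1/K, a standard property of rejection sampling; a quick computation (added for reference):

# Theoretical acceptance rate of the rejection sampler is 1/K.
1 / sqrt(2 * exp(1) / pi)   # ~= 0.7602, in line with the ~76% observed in 2(f)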
2(e). In this code, the returned 'res' is a random sample from the standard normal distribution of the size specified by the user, and the 'acceptance rate' is an empirical estimate of the algorithm's efficiency.
simnorm <- function(n) {
  n_total <- 0
  n_accepted <- 0
  res <- numeric(n)                  # accepted draws
  yz <- numeric(n)                   # uniform heights of accepted draws
  while (n_accepted < n) {
    n_total <- n_total + 1
    x <- InversePIT(1)               # Step(1): propose x from g
    y <- runif(1, 0, Kg(x))          # Step(2): y ~ U(0, K g(x))
    if (y < f(x)) {                  # Step(3): accept if y falls below f(x)
      n_accepted <- n_accepted + 1
      res[n_accepted] <- x
      yz[n_accepted] <- y
    }
  }                                  # Step(4): loop until n acceptances
  return(list(res = res, yz = yz, n_accepted = n_accepted,
              n_total = n_total, acceptance_rate = n_accepted / n_total))
}
result <- simnorm(5000)
result$acceptance_rate   # report the empirical acceptance rate
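The loop above proposes one point at a time; a vectorised variant (a sketch of an optional alternative, not part of the submitted solution) would propose points in batches:

# Batched rejection sampling: propose many points at once, keep the accepted.
simnorm_vec <- function(n) {
  res <- numeric(0)
  n_total <- 0
  while (length(res) < n) {
    m <- ceiling((n - length(res)) / 0.76)  # batch size; 0.76 ~ expected acceptance rate
    x <- InversePIT(m)                      # proposals from g
    y <- runif(m, 0, Kg(x))                 # uniform heights under the envelope
    n_total <- n_total + m
    res <- c(res, x[y < f(x)])              # keep accepted proposals
  }
  list(res = res[1:n], acceptance_rate = length(res) / n_total)
}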
2(f). We run the function to obtain a sample of size n = 5000. The figure indicates that the data generated by the rejection algorithm approximate a sample from the standard normal distribution well, with a 76% acceptance rate. A significant portion of the proposed samples were accepted, showing that the algorithm captured the characteristics of the target distribution without excessive rejection. So we can conclude that the data are consistent with being sampled from a standard normal distribution.
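As an extra numerical check (optional, not part of the original submission), a Kolmogorov-Smirnov test against the standard normal CDF could be run:

# Optional: KS test of the accepted sample against N(0, 1).
ks.test(result$res, "pnorm")   # a large p-value supports normality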
Figure 3: Histogram of the generated data with its density curve (red) compared to the standard normal density function (blue).
hist(result$res, freq = FALSE)                # histogram of the accepted sample
curve(f, add = TRUE, col = "BLUE", lwd = 2)   # standard normal pdf
density_est <- density(result$res)            # kernel density estimate
lines(density_est, col = "RED", lwd = 2)      # overlay the estimate
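Optionally (a presentational addition, not in the original), a legend matching the caption could be added:

legend("topright", legend = c("N(0,1) pdf", "kernel density estimate"),
       col = c("BLUE", "RED"), lwd = 2)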