MATH48091 - Statistical Computing

Coursework 1 Submission
1(a). Since f(x) is continuous, the cumulative distribution function F(x) is
defined as
$$F(x) = \int_{-\infty}^{x} f(t)\,dt,$$
where
$$f(x) = \frac{1}{2}\exp(-|x|),$$
we have
$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{x} \frac{1}{2}\exp(-|t|)\,dt.$$
Then,
$$F(x) = \begin{cases} \displaystyle\int_{-\infty}^{0} \frac{1}{2}\exp(t)\,dt + \int_{0}^{x} \frac{1}{2}\exp(-t)\,dt, & \text{if } x \geq 0,\\[6pt] \displaystyle\int_{-\infty}^{x} \frac{1}{2}\exp(t)\,dt, & \text{if } x < 0. \end{cases}$$
Thus,
$$F(x) = \begin{cases} 1 - \frac{1}{2}\exp(-x), & \text{if } x \geq 0,\\[4pt] \frac{1}{2}\exp(x), & \text{if } x < 0. \end{cases}$$
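As a quick numerical sanity check (a sketch, not part of the derivation; F_laplace and F_num are illustrative names), this closed form can be compared against direct numerical integration of f in R:

# Closed-form CDF from 1(a)
F_laplace <- function(x) ifelse(x >= 0, 1 - 0.5 * exp(-x), 0.5 * exp(x))
# Numerical CDF: integrate f(t) = 0.5 * exp(-|t|) from -Inf to x
F_num <- function(x) integrate(function(t) 0.5 * exp(-abs(t)), -Inf, x)$value
# Differences should be near zero (up to integration tolerance)
sapply(c(-2, -0.5, 0, 0.5, 2), function(x) F_laplace(x) - F_num(x))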

1(b). We know that if $F$ is a continuous cumulative distribution function, $U$ is a random variable with the $U(0,1)$ distribution, and the random variable $X$ is defined by $X = F^{-1}(U)$, then $X$ has cumulative distribution function $F$.
Since $F(x)$ is continuous on $(-\infty, 0)$ and $(0, \infty)$, it remains to check continuity at $x = 0$. We have
$$1 - \frac{1}{2}\exp(-0) = \frac{1}{2} \quad \text{and} \quad \frac{1}{2}\exp(0) = \frac{1}{2},$$
so
$$\lim_{x \to 0^{-}} F(x) = \lim_{x \to 0^{+}} F(x) = F(0).$$
So $F(x)$ is continuous everywhere. We then follow the procedure below:
Step(1): We generate random samples $u_1, \ldots, u_n$ from the $U(0,1)$ distribution.
Step(2): Compute $x_i = F^{-1}(u_i)$. Setting
$$u = F(x) = \begin{cases} 1 - \frac{1}{2}\exp(-x), & \text{if } x \geq 0,\\[4pt] \frac{1}{2}\exp(x), & \text{if } x < 0, \end{cases}$$
and solving for $x$ (noting that $x \geq 0$ corresponds to $u \geq 1/2$) gives
$$x = F^{-1}(u) = \begin{cases} -\log(2 - 2u), & \text{if } u \geq 1/2,\\[4pt] \log(2u), & \text{if } u < 1/2. \end{cases}$$
Then, by the inverse PIT theorem, the values $x_1, \ldots, x_n$ generated from $u_1, \ldots, u_n$ by the above process form a random sample of size $n$ from $f(x)$.
1(c). We generated n random uniform samples between 0 and 1, then applied
the inverse PIT theorem to generate samples from the target distribution.
The result is a set of n samples that follow the target distribution.

Following is the R code.

InversePIT <- function(n) {
  # Generate n uniforms on (0, 1)
  u <- runif(n, 0, 1)
  x <- numeric(n)
  # Apply the inverse CDF derived in 1(b) to each uniform draw
  for (i in 1:n) {
    if (u[i] >= 1/2) {
      x[i] <- -log(2 - 2 * u[i])
    } else {
      x[i] <- log(2 * u[i])
    }
  }
  x
}
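An equivalent vectorised version (a sketch; InversePIT_vec is an illustrative name) avoids the explicit loop by using ifelse:

InversePIT_vec <- function(n) {
  u <- runif(n)
  # u >= 1/2 inverts the right branch of F, u < 1/2 the left branch
  ifelse(u >= 1/2, -log(2 - 2 * u), log(2 * u))
}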

1(d). We ran the code to generate a random sample of size n = 5000 and plotted
a histogram of the data superimposed with the probability density
function f(x). The histogram and the pdf f(x) fit well: the shape of the
histogram closely follows the shape of the pdf curve. This indicates that
the generated data are a good fit for the pdf f(x).

simdata <- InversePIT(5000)
f <- function(x) (1/2) * exp(-abs(x))          # target density
hist(simdata, freq = FALSE, ylim = c(0, 0.5))  # density-scale histogram
curve(f, add = TRUE, col = "RED")              # overlay f(x)
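Beyond the visual comparison, a Kolmogorov-Smirnov test against the CDF derived in 1(a) gives a more formal check; a minimal sketch (F_laplace is an illustrative name):

# KS test of the simulated data against the target CDF from 1(a)
F_laplace <- function(x) ifelse(x >= 0, 1 - 0.5 * exp(-x), 0.5 * exp(x))
ks.test(simdata, F_laplace)  # a large p-value is consistent with f(x)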

[Figure 1 about here]

Figure 1: Comparison between the PDF f (x) and the histogram of samples
obtained using the inverse PIT theorem

2(a). In this case, we have
$$f(x) = \frac{1}{\sqrt{2\pi}} \cdot e^{-\frac{x^2}{2}}$$
and
$$g(x) = \frac{1}{2} \cdot e^{-|x|}, \quad \text{for } x \in (-\infty, \infty).$$
Then, we can express the supremum as follows:
$$K = \sup\left(\frac{f(x)}{g(x)}\right) = \sup\left(\sqrt{\frac{2}{\pi}} \cdot e^{-\frac{x^2}{2} + |x|}\right).$$
Let
$$h(x) = \sqrt{\frac{2}{\pi}} \cdot e^{-\frac{x^2}{2} + |x|}.$$
Thus, we have
$$h(x) = \begin{cases} \sqrt{\frac{2}{\pi}} \cdot e^{-\frac{x^2}{2} + x}, & \text{if } x \geq 0,\\[6pt] \sqrt{\frac{2}{\pi}} \cdot e^{-\frac{x^2}{2} - x}, & \text{if } x < 0. \end{cases}$$
Next, we find the first derivative of h(x) to analyze the monotonicity of
this function:
$$h'(x) = \begin{cases} \sqrt{\frac{2}{\pi}} \, (1 - x) \, e^{-\frac{x^2}{2} + x}, & \text{if } x \geq 0,\\[6pt] -\sqrt{\frac{2}{\pi}} \, (x + 1) \, e^{-\frac{x^2}{2} - x}, & \text{if } x < 0. \end{cases}$$

[Figure 2 about here]

Figure 2: Comparison between the function f(x) (blue) and the function $K \cdot g(x)$ (red), where $K = \sqrt{2e/\pi}$

We can conclude that h(x) is increasing on the intervals (−∞, −1) and
(0, 1), while it is decreasing on the intervals (−1, 0) and (1, ∞). Therefore,
we identify two local maxima, at x = −1 and x = 1.
Lastly, we evaluate h at these two points. Since h is an even function, the
two maxima are equal:
$$h(-1) = h(1) = \sqrt{\frac{2}{\pi}} \cdot e^{\frac{1}{2}}.$$
Hence
$$K = \sup\left(\frac{f(x)}{g(x)}\right) = \sqrt{\frac{2e}{\pi}}.$$
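The analytic value of K can also be verified numerically by maximising h over $x \geq 0$ (h is even, so this branch suffices); a brief sketch:

# Numerically maximise h(x) = sqrt(2/pi) * exp(-x^2/2 + x) on x >= 0
h <- function(x) sqrt(2 / pi) * exp(-x^2 / 2 + x)
opt <- optimize(h, interval = c(0, 5), maximum = TRUE)
opt$maximum              # should be close to 1
opt$objective            # numerical supremum of f/g
sqrt(2 * exp(1) / pi)    # analytic K = sqrt(2e/pi); the two should agree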

2(b). As shown in Figure 2, $f(x) \leq K \cdot g(x)$ for all x.

f <- function(x) (1/sqrt(2*pi)) * exp(-x^2/2)  # standard normal density
g <- function(x) (1/2) * exp(-abs(x))          # Laplace proposal density
K <- sqrt((2 * exp(1)) / pi)                   # envelope constant from 2(a)
Kg <- function(x) K * g(x)
plot(f, col = "BLUE", xlim = c(-8, 8), ylim = c(0, 0.65))
curve(Kg, add = TRUE, col = "RED")

2(c). There are four steps in the rejection sampling algorithm; an equivalent
form of the acceptance test is sketched after these steps.
Step(1): Simulate $x_i$ from $g(x)$. Following what we did in 1(b)
and 1(c), we can generate random variables $x_i \in (-\infty, \infty)$ with probability
density function $g(x)$.
Step(2): Simulate $y_i$ from $U(0, K g(x_i))$, where $K = \sqrt{2e/\pi}$.
Step(3): Accept $x_i$ if $y_i \leq f(x_i)$, where $f(x)$ is the pdf of the standard
Normal distribution.
Step(4): Continue the above process until a sample of the required size
$n$ is obtained.
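Steps (2) and (3) together are equivalent to the more familiar test "draw $u \sim U(0,1)$ and accept $x$ if $u \leq f(x)/(K g(x))$", since $y = u \cdot K g(x)$ is uniform on $(0, K g(x))$. A minimal sketch of a single proposal under this equivalent form, reusing f, g, K from 2(b) and InversePIT from 1(c):

# One proposal of the equivalent acceptance test
x <- InversePIT(1)                  # draw a proposal from g via 1(c)
u <- runif(1)                       # uniform on (0, 1)
accept <- (u <= f(x) / (K * g(x)))  # same event as y <= f(x), y ~ U(0, K g(x))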

2(d). We can measure how efficient the rejection algorithm is by the 'acceptance
rate', which is defined as the ratio of the number of accepted data points
to the total number of data points processed by the algorithm. Since each
proposal is accepted with probability $1/K$, the theoretical acceptance rate
here is $1/K = \sqrt{\pi/(2e)} \approx 0.76$.
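A one-line numerical check of this theoretical rate (a sketch):

# Theoretical acceptance rate of rejection sampling is 1/K
1 / sqrt(2 * exp(1) / pi)  # approximately 0.760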
2(e). In the code below, the returned 'res' is a random sample from the standard
normal distribution of the size specified by the user, and 'acceptance_rate'
is an empirical estimate of the algorithm's efficiency.

simnorm <- function(n) {
  n_total <- 0       # total number of proposals made
  n_accepted <- 0    # number of proposals accepted so far
  res <- numeric(n)  # accepted sample values
  yz <- numeric(n)   # uniform heights of the accepted draws
  while (n_accepted < n) {
    n_total <- n_total + 1
    x <- InversePIT(1)        # Step (1): propose x from g
    y <- runif(1, 0, Kg(x))   # Step (2): y ~ U(0, K g(x))
    fx <- f(x)
    if (y < fx) {             # Step (3): accept if y < f(x)
      n_accepted <- n_accepted + 1
      res[n_accepted] <- x
      yz[n_accepted] <- y
    }
  }
  return(list(res = res, yz = yz, n_accepted = n_accepted,
              n_total = n_total, acceptance_rate = n_accepted / n_total))
}
result <- simnorm(5000)
result

2(f). We ran the function to obtain a sample of size n = 5000. The figure
indicates that the data generated by the rejection algorithm approximate
a sample from the standard Normal distribution well, with a 76% acceptance
rate, close to the theoretical rate of $1/K \approx 0.76$. A significant portion
of the proposed samples were accepted, showing that the algorithm captured
the characteristics of the target distribution without excessive rejection.
So we can conclude that the data are consistent with being sampled from
a standard Normal distribution.

[Figure 3 about here]

Figure 3: Histogram of the generated data with its density curve (red) compared
to the standard normal distribution density function (blue)

Below is the R code.

result <- simnorm(5000)
result
hist(result$res, freq = FALSE)               # histogram of the accepted sample
curve(f, add = TRUE, col = "BLUE", lwd = 2)  # standard normal density
density_est <- density(result$res)           # kernel density estimate
lines(density_est, col = "RED", lwd = 2)
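As an additional diagnostic not included above, a normal Q-Q plot of the accepted sample offers a further visual check; a minimal sketch:

# Q-Q plot: points near the reference line support normality of result$res
qqnorm(result$res)
qqline(result$res, col = "RED", lwd = 2)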
