CS1B Actuarial Statistics Solutions
INDICATIVE SOLUTION
Introduction
The indicative solution has been written by the Examiners with the aim of helping candidates. The
solutions given are only indicative. It is realized that other points could also be valid answers, and
the examiners have given credit for any alternative approach or interpretation which they consider to be
reasonable.
IAI CS1B-1222
Solution 1:
i) > data<-read.csv("weights.csv") (1)
> summary(data) (1)
    Gender             Weights           Day
 Length:216         Min.   : 5.00   Min.   : 1.00
 Class :character   1st Qu.: 7.00   1st Qu.: 9.00
 Mode  :character   Median :10.00   Median :17.00
                    Mean   :10.22   Mean   :17.16
                    3rd Qu.:13.00   3rd Qu.:25.25
                    Max.   :16.00   Max.   :34.00
[2]
ii) > weight_M <- subset(data$Weights, data$Gender == "M", select = c(data$Weights), drop = FALSE) (1)
Alternate:
M_subset<-data[data$Gender=="M",]
weight_M <- M_subset$Weights
Marks given for other valid alternate solutions.
Alternate:
One Sample t-test
iv) > weight_F <- subset(data$Weights, data$Gender == "F", select = c(data$Weights), drop = FALSE) (1)
Alternate: (1)
F_subset<-data[data$Gender=="F",]
weight_F <- F_subset$Weights
t.test(weight_F,mu=7,alternative = "two.sided") (1)
(2)
One Sample t-test
data: weight_F
t = 3.5175, df = 107, p-value = 0.0006407
alternative hypothesis: true mean is not equal to 7
95 percent confidence interval:
7.307108 8.100300
sample estimates:
mean of x
7.703704
The required p value is 0.0006407. (1)
Since the p-value < 5%, there is sufficient evidence to reject the null hypothesis that the true mean is 7.
[Max 4]
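The printed output can be cross-checked from its own summary figures. The sketch below is an illustration only, not part of the marking scheme: it takes the t statistic, degrees of freedom and sample mean as printed and recovers the p-value and the 95% confidence interval. The variable names are chosen here for illustration.

```r
# Values copied from the printed One Sample t-test output
t_stat <- 3.5175      # t statistic
df     <- 107         # degrees of freedom (n - 1)
xbar   <- 7.703704    # sample mean of weight_F
mu0    <- 7           # mean under the null hypothesis

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# Standard error implied by t = (xbar - mu0) / se
se <- (xbar - mu0) / t_stat

# 95% confidence interval for the true mean
ci <- xbar + c(-1, 1) * qt(0.975, df) * se

print(p_value)
print(ci)
```

Because the inputs are rounded as printed, the recovered values agree with the t.test() output only up to rounding.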
v) > set.seed(2022)
> male<-c(rnorm(10,mum,sm)) (1)
> female<-c(rnorm(10,muf,sf)) (0.5)
> mean(male)
[1] 10.69983
> mean(female)
[1] 7.015142
> t.test(male,female,paired=TRUE,alternative="less",mu=5) (2)
Paired t-test
[16 Marks]
Solution 2:
i)
> abc <- c(-6,-10,3,18,-10,1,3,-13,-14,13)
> xyz <- c(8,0,-4,10,20,8,0,10,5,-19)
> pqr <- c(-20,19,3,13,20,7,-3,13,20,1)
> lmn <- c(14,4,5,15,19,9,6,10,7,16)
>
> abc_mean = mean(abc)
> xyz_mean = mean(xyz)
> pqr_mean = mean(pqr)
> lmn_mean = mean(lmn)
>
> print(abc_mean)
[1] -1.5
> print(pqr_mean)
[1] 7.3
> print(xyz_mean)
[1] 3.8
> print(lmn_mean)
[1] 10.5
>
> abc_sd = sd(abc)
> xyz_sd = sd(xyz)
> pqr_sd = sd(pqr)
> lmn_sd = sd(lmn)
>
> print(abc_sd)
[1] 11.00757
> print(pqr_sd)
[1] 12.62317
> print(xyz_sd)
[1] 10.46476
> print(lmn_sd)
[1] 5.190804 [Max 5]
ii)
> r_abc_xyz = cor(abc,xyz)
> r_abc_pqr = cor(abc,pqr)
> r_abc_lmn = cor(abc,lmn)
> r_xyz_pqr = cor(xyz,pqr)
> r_xyz_lmn = cor(xyz,lmn)
> r_pqr_lmn = cor(pqr,lmn)
>
> print(r_abc_xyz)
[1] -0.4456345
> print(r_abc_pqr)
[1] -0.2842739
> print(r_abc_lmn)
[1] 0.2634939
> print(r_xyz_pqr)
[1] 0.2923746
> print(r_xyz_lmn)
[1] 0.2577296
> print(r_pqr_lmn)
[1] -0.08393819
[3]
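The six pairwise correlations can also be obtained in one call by binding the four return series into a matrix. This is an alternative sketch rather than the examiners' method; the names returns and cor_matrix are illustrative.

```r
# Return series as defined in part (i)
abc <- c(-6, -10, 3, 18, -10, 1, 3, -13, -14, 13)
xyz <- c(8, 0, -4, 10, 20, 8, 0, 10, 5, -19)
pqr <- c(-20, 19, 3, 13, 20, 7, -3, 13, 20, 1)
lmn <- c(14, 4, 5, 15, 19, 9, 6, 10, 7, 16)

# One column per stock; cor() then returns the full 4 x 4 correlation matrix
returns <- cbind(abc, xyz, pqr, lmn)
cor_matrix <- cor(returns)

print(round(cor_matrix, 4))
```

Each off-diagonal entry corresponds to one of the pairwise cor() calls above, for example cor_matrix["abc", "xyz"].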
iii)
> cor.test(abc,xyz,method="pearson",alternative="two.sided",conf.level = 0.95)
(1)
The 95% confidence interval for the correlation coefficient between the returns of ABC Oil and XYZ Airways is
(-0.84, 0.26). Since it does not contain -1, we can rule out a perfect negative correlation
at the 5% level of significance.
(1)
The 95% confidence interval for the correlation coefficient between the returns of PQR Realty and LMN Bank
is (-0.68, 0.58). Since it does not contain -1, we can rule out a perfect negative correlation
at the 5% level of significance.
[4]
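The intervals reported by cor.test() for the Pearson coefficient come from the Fisher z-transformation. A minimal sketch of that calculation follows; the helper name fisher_ci is hypothetical and is not part of the solution.

```r
# Return series as defined in part (i)
abc <- c(-6, -10, 3, 18, -10, 1, 3, -13, -14, 13)
xyz <- c(8, 0, -4, 10, 20, 8, 0, 10, 5, -19)
pqr <- c(-20, 19, 3, 13, 20, 7, -3, 13, 20, 1)
lmn <- c(14, 4, 5, 15, 19, 9, 6, 10, 7, 16)

# Hypothetical helper: confidence interval for a Pearson correlation
fisher_ci <- function(x, y, conf.level = 0.95) {
  r     <- cor(x, y)
  n     <- length(x)
  z     <- atanh(r)                         # Fisher z-transform of r
  se    <- 1 / sqrt(n - 3)                  # approximate standard error of z
  zcrit <- qnorm(1 - (1 - conf.level) / 2)  # two-sided normal quantile
  tanh(z + c(-1, 1) * zcrit * se)           # back-transform to the r scale
}

ci_abc_xyz <- fisher_ci(abc, xyz)
ci_pqr_lmn <- fisher_ci(pqr, lmn)

print(round(ci_abc_xyz, 2))
print(round(ci_pqr_lmn, 2))
```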
iv)
v) H0: Variance of Returns from Strategy A = Variance of Returns from Strategy B (1)
H1: Variance of Returns from Strategy A ≠ Variance of Returns from Strategy B
Since the p-value 0.1426 > 10%, there is insufficient evidence to reject the null hypothesis at
the 10% level of significance. Hence, the investor’s assumption of the two strategies being
equally risky seems reasonable.
(1)
[4]
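A test of these hypotheses can be sketched with var.test(). The definitions of strat_A and strat_B are not reproduced in this text, so the portfolio weights used here (50/50 for Strategy A, 75/25 for Strategy B) are assumptions, chosen because they are consistent with the standard deviations and risk-adjusted returns printed in parts (vi) and (vii).

```r
# Return series as defined in part (i)
abc <- c(-6, -10, 3, 18, -10, 1, 3, -13, -14, 13)
xyz <- c(8, 0, -4, 10, 20, 8, 0, 10, 5, -19)
pqr <- c(-20, 19, 3, 13, 20, 7, -3, 13, 20, 1)
lmn <- c(14, 4, 5, 15, 19, 9, 6, 10, 7, 16)

# ASSUMED weights (not stated in this text)
strat_A <- 0.5 * abc + 0.5 * xyz
strat_B <- 0.75 * pqr + 0.25 * lmn

# Two-sided F test of H0: var(strat_A) = var(strat_B)
f_test <- var.test(strat_A, strat_B, alternative = "two.sided")

print(f_test$statistic)  # ratio of the two sample variances
print(f_test$p.value)    # compare with the p-value quoted in the solution
```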
vi)
> sd_A1 = (pABC^2 * abc_sd^2 + 2 * pABC * pXYZ * abc_sd * xyz_sd * -0.83 + pXYZ^2 * xyz_sd^2)^(1/2)
> sd_A2 = (pABC^2 * abc_sd^2 + 2 * pABC * pXYZ * abc_sd * xyz_sd * 0.26 + pXYZ^2 * xyz_sd^2)^(1/2)
>
> print(sd_A1)
[1] 3.140851
> print(sd_A2)
[1] 8.523165
> sd_B1 = (pPQR^2 * pqr_sd^2 + 2 * pPQR * pLMN * pqr_sd * lmn_sd * -0.68 + pLMN^2 * lmn_sd^2)^(1/2)
> sd_B2 = (pPQR^2 * pqr_sd^2 + 2 * pPQR * pLMN * pqr_sd * lmn_sd * 0.58 + pLMN^2 * lmn_sd^2)^(1/2)
>
> print(sd_B1)
[1] 8.637509
> print(sd_B2)
[1] 10.27457
The limits for the standard deviation of Strategy A are (3.14, 8.52) and the limits for the standard
deviation of Strategy B are (8.64, 10.27).
Based on Metric 2, since Strategy A has lower range of standard deviation, Strategy A would
be selected.
[Max 4]
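The four calculations above apply the same two-asset formula, sd = sqrt(w1^2 s1^2 + 2 w1 w2 s1 s2 rho + w2^2 s2^2), at the lower and upper correlation limits. A compact sketch follows; the helper portfolio_sd is hypothetical, and the weights (0.5/0.5 for A, 0.75/0.25 for B) are assumptions inferred from the printed results, since pABC, pXYZ, pPQR and pLMN are not defined in this text.

```r
# Return series as defined in part (i)
abc <- c(-6, -10, 3, 18, -10, 1, 3, -13, -14, 13)
xyz <- c(8, 0, -4, 10, 20, 8, 0, 10, 5, -19)
pqr <- c(-20, 19, 3, 13, 20, 7, -3, 13, 20, 1)
lmn <- c(14, 4, 5, 15, 19, 9, 6, 10, 7, 16)

# Hypothetical helper: standard deviation of a two-asset portfolio
portfolio_sd <- function(w1, s1, w2, s2, rho) {
  sqrt(w1^2 * s1^2 + 2 * w1 * w2 * s1 * s2 * rho + w2^2 * s2^2)
}

# Evaluate at the correlation limits used in the solution (ASSUMED weights)
sd_A <- sapply(c(-0.83, 0.26), function(rho)
  portfolio_sd(0.5, sd(abc), 0.5, sd(xyz), rho))
sd_B <- sapply(c(-0.68, 0.58), function(rho)
  portfolio_sd(0.75, sd(pqr), 0.25, sd(lmn), rho))

print(sd_A)
print(sd_B)
```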
vii) We develop limits for the risk-adjusted returns under Strategies A and B using the following
code:
> RAR_A1 = mean(strat_A)/sd_A1
> RAR_A2 = mean(strat_A)/sd_A2
> RAR_B1 = mean(strat_B)/sd_B1
> RAR_B2 = mean(strat_B)/sd_B2
>
> print(RAR_A1)
[1] 0.3661428
> print(RAR_A2)
[1] 0.1349264
> print(RAR_B1)
[1] 0.9377704
> print(RAR_B2)
[1] 0.788354 (1)
Based on the above the limits for Risk-Adjusted Return for Strategy A are (0.13,0.37) and limits
for Strategy B are (0.79,0.94). Since Strategy B gives higher risk-adjusted returns, we should go
for Strategy B using Metric 3.
Strategy B gives higher returns and has higher risk, but it eventually ends up giving higher
risk-adjusted returns (i.e. more returns per unit of risk) as compared to Strategy A. (1)
[2]
[24 Marks]
Solution 3:
i) > #i.
> PA<-read.csv("PA_Data.csv") (1)
> model1<-glm(Claim~Gender*Health+Age,family=poisson(lin="log"),data=PA) (2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.17452 0.95762 0.182 0.8554
GenderM 0.06235 0.46853 0.133 0.8941
HealthNonDiabetic -1.31680 0.64556 -2.040 0.0414 *
Age 0.02248 0.02052 1.095 0.2734
GenderM:HealthNonDiabetic -0.10401 0.82008 -0.127 0.8991
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
ii) >
> #linear predictor for Model 1 is
> # a+b1X1+b2X2+b3X3+b4X1X2 where (1)
Alt: 0.17452 + .06235X1 -1.31680X2 + .02248X3 -.10401X1X2
> # X1 = 0 for Female Gender and 1 for Male Gender (1)
> # X2 = 0 for Diabetic and 1 for Non Diabetic (1)
> # X3 is Age (0.5)
> # X1X2 indicates interaction term between Gender and Health Condition (0.5)
> [Max 3]
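To illustrate how the linear predictor is used, the sketch below evaluates it with the fitted Model 1 coefficients and applies the inverse of the log link. The policyholder (male, non-diabetic, aged 40) and the helper expected_claims are hypothetical examples, not taken from the question.

```r
# Fitted coefficients copied from the Model 1 output
coefs <- c(intercept    = 0.17452,
           gender_m     = 0.06235,
           non_diabetic = -1.31680,
           age          = 0.02248,
           interaction  = -0.10401)

# Hypothetical helper: expected number of claims under the log link
# x1 = 1 for male, x2 = 1 for non-diabetic, x3 = age
expected_claims <- function(x1, x2, x3) {
  eta <- coefs["intercept"] + coefs["gender_m"] * x1 +
    coefs["non_diabetic"] * x2 + coefs["age"] * x3 +
    coefs["interaction"] * x1 * x2
  unname(exp(eta))  # inverse of the log link
}

# Hypothetical policyholder: male, non-diabetic, aged 40
mu_example <- expected_claims(x1 = 1, x2 = 1, x3 = 40)
print(mu_example)
```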
(0.5)
> #Scaled deviance = 13.179
> #AIC = - 2LogL(Model) + 2*Parameters
> #LogL(Model) = Parameters - AIC/2 (2)
>
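> # Model 1 has 5 estimated parameters (intercept, Gender, Health, Age, interaction)
> L <- length(coef(model1)) - model1$aic/2
> L
[1] -26.07302
>
> #Log Likelihood of Model1 is -26.07302 [Max 4]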
>
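The relationship between AIC and log-likelihood can be checked against R's own logLik(). The sketch below uses simulated data, since PA_Data.csv is not available here; the data frame sim and its values are entirely hypothetical, and only the identity LogL = Parameters - AIC/2 is being illustrated, with Parameters counting every estimated coefficient including the intercept.

```r
set.seed(1)

# Hypothetical balanced design, so every Gender/Health combination is present
sim <- expand.grid(Gender = c("F", "M"),
                   Health = c("Diabetic", "NonDiabetic"),
                   Age    = seq(25, 60, by = 5))

# Simulated claim counts (illustrative only)
sim$Claim <- rpois(nrow(sim), lambda = ifelse(sim$Health == "Diabetic", 3, 1))

fit <- glm(Claim ~ Gender * Health + Age,
           family = poisson(link = "log"), data = sim)

k <- length(coef(fit))                 # number of estimated parameters
loglik_from_aic <- k - fit$aic / 2     # rearranged AIC formula
loglik_direct   <- as.numeric(logLik(fit))

print(c(k = k, from_aic = loglik_from_aic, direct = loglik_direct))
```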
iv)
> #iv.
> model2<-glm(Claim~Health+Age,family=poisson(lin="log"),data=PA) (1)
>
> model2$aic < model1$aic
[1] TRUE
> # Model 2 AIC is lower than Model 1 showing Model2 outperforms Model1 (1.5)
[Max 2]
v) (1)
> #v.
> summary(model2)
Call:
glm(formula = Claim ~ Health + Age, family = poisson(lin = "log"),
data = PA)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.38527 -0.76449 0.06081 0.36914 1.40103
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.21958 0.87208 0.252 0.801208
HealthNonDiabetic -1.38710 0.39059 -3.551 0.000383 ***
Age 0.02252 0.02037 1.106 0.268841
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Full marks to be given if the same conclusion is reached using alternate methods.
(1.5)
> model3<-glm(Claim~Health,family=poisson(lin="log"),data=PA)
> summary(model3)
Call:
glm(formula = Claim ~ Health, family = poisson(lin = "log"),
data = PA)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.22474 -0.81754 -0.07119 0.27453 1.44149
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.1394 0.2000 5.697 1.22e-08 ***
HealthNonDiabetic -1.4271 0.3887 -3.671 0.000241 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
b) > #vii. b.
> Student1_mean
1
3.125
> #For Student1, expected claims are 3.125. Since more claims are expected
> # for Student1, the reduction in payment leads to more saving than the (1)
> # extra payment for the 1st claim and thus a lower price.
>
> Actuary2_mean
1
0.75
> #Whereas for Actuary2, expected claims are 0.75, which is close to 1. (1)
> #Thus, more payment is expected for Actuary2, resulting in a higher price. [Max 2]
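The two printed means can be cross-checked against the Model 3 coefficients. The sketch below assumes Student1 is diabetic and Actuary2 is non-diabetic; that mapping is an inference from the numbers, not something stated in this text.

```r
# Coefficients copied from the Model 3 summary
b0 <- 1.1394    # (Intercept): diabetic baseline on the log scale
b1 <- -1.4271   # HealthNonDiabetic

# Expected claims under the log link
mean_diabetic     <- exp(b0)        # assumed to correspond to Student1
mean_non_diabetic <- exp(b0 + b1)   # assumed to correspond to Actuary2

print(round(c(mean_diabetic, mean_non_diabetic), 3))
```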
[33 Marks]
Solution 4:
i) a) We define two data sets w_draw and r_draw corresponding to the white balls and red balls
to be matched for cases A to I.
> r_draw <- c(1,0,1,0,1,0,1,1,1) # red balls to be matched corresponding to cases A to I (1)
[2]
b) Formula for the probability mass function of Hyper-Geometric Distribution in terms of the
arguments specified in the question is given below:
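Assuming the question's arguments follow the convention of R's dhyper(x, m, n, k), with m balls of the type being matched, n other balls and k balls drawn (an assumption, as the question is not reproduced here), the probability mass function can be written as:

```latex
P(X = x) = \frac{\binom{m}{x}\,\binom{n}{k-x}}{\binom{m+n}{k}},
\qquad \max(0,\, k-n) \le x \le \min(k,\, m)
```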
Where –
x = 0, 1, 2, 3, ………………
0<p<1
(p = m/(m+n)) [2]
e) We define another data set prize which defines the prize amounts for Cases A to I.
>
> prize <- c(20000000,1000000,50000,100,100,7,7,4,4)
> print(prize)
[1] 2e+07 1e+06 5e+04 1e+02 1e+02 7e+00 7e+00 4e+00 4e+00 [1]
f) We then define amt by multiplying prob_draw with prize to arrive at the expected amount one
can win from the lottery / jackpot.
g) Since the expected prize amount is INR 0.39 and the price of the lottery ticket is INR 0.50,
there is a profit of INR 0.11 implicit in the ticket price. [1]
ii)a) Formula for the probability mass function of Binomial Distribution in terms of the arguments
specified in the question is given below:
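Assuming k independent draws with success probability p = m/(m+n), matching the arguments of R's dbinom(x, size = k, prob = p) (an assumption, as the question is not reproduced here), the probability mass function can be written as:

```latex
P(X = x) = \binom{k}{x}\, p^{x}\, (1-p)^{k-x},
\qquad x = 0, 1, \dots, k, \quad p = \frac{m}{m+n}
```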
Where –
x = 0, 1, 2, 3, ………………
0<p<1
(p = m/(m+n)) [2]
b) The joint probabilities and the expected prize money pay-out is re-determined using binomial
distribution for determining matching probabilities:
iv) We use the formulae from the tables to determine the mean and variance for X using both
binomial and hyper-geometric distributions.
[1] 0.3360639
v) In reality, for lotteries, draws are done without replacement and hence the hyper-geometric
distribution would be more suitable for modelling matching probabilities. (1)
However, as the size of the population (m+n) increases, the binomial distribution
provides a good approximation to hyper-geometric probabilities. (1)
[2]
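The approximation can be illustrated numerically. In the sketch below the population sizes (5 balls to be matched out of 69, scaled up by factors of 10 and 100) and the number of draws are hypothetical values chosen for illustration, not the figures from the question.

```r
k      <- 5    # number of balls drawn (hypothetical)
base_m <- 5    # balls that count as a match (hypothetical)
base_n <- 64   # remaining balls (hypothetical)
scales <- c(1, 10, 100)

# Largest absolute gap between the two pmfs as the population grows,
# keeping p = m / (m + n) fixed
max_diff <- sapply(scales, function(s) {
  m <- base_m * s
  n <- base_n * s
  x <- 0:k
  max(abs(dhyper(x, m, n, k) - dbinom(x, k, m / (m + n))))
})

print(data.frame(population = (base_m + base_n) * scales, max_diff))
```

The gap shrinks as m+n grows while p stays fixed, which is the sense in which the binomial approximates the hyper-geometric.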
[27 Marks]
*************************