Assignment1
By
Section D
On
24-1-2020
1. Based on the sample customer data of 1999, what can Green conclude about
average customer profitability for Pilgrim Bank's entire customer population?
Answer:
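From the descriptive statistics below, the sample's average profit is 111.50 per
customer with a standard error of 1.53, so the 95% confidence interval for the
population mean is roughly 111.50 ± 3.01 (about 108.50 to 114.51). Green can
therefore conclude that the average Pilgrim Bank customer is profitable. The average
hides wide variation, however: the standard deviation is 272.84, profit ranges from
-221 to 2071, and the median is only 9. The histogram shows that a large share of
customers fall below 0, which is unprofitable; 46.8 per cent of customers have
negative profit. In the correlation matrix, profit is most closely related to tenure
(0.19), income (0.15) and age (0.15), although all three correlations are weak. Its
correlations with online use (0.007) and district (0.003) are close to zero.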
…………………………………………………………………………………………
9Profit
Mean                        111.502687
Standard Error              1.534016451
Median                      9
Mode                        -2
Standard Deviation          272.8393915
Sample Variance             74441.33355
Range                       2292
Minimum                     -221
Maximum                     2071
Sum                         3527276
Count                       31634
Confidence Level(95.0%)     3.006732042
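
The step from the sample mean to a statement about the whole customer population can be sketched as follows. This is a minimal check that uses only the mean, standard deviation and count reported for 9Profit. It assumes the normal critical value is an acceptable stand-in for the t value at this sample size, so the half-width may differ from the spreadsheet figure in the later decimals.

```python
import math
from statistics import NormalDist

# Summary statistics reported for 9Profit in the sample.
mean = 111.502687
std_dev = 272.8393915
n = 31634

# Standard error of the mean: s / sqrt(n).
std_error = std_dev / math.sqrt(n)

# 95% half-width using the normal critical value (about 1.96).
# Assumption: with n this large, the normal and t critical values are
# nearly identical, so this only approximates the spreadsheet result.
z = NormalDist().inv_cdf(0.975)
half_width = z * std_error

ci_low = mean - half_width
ci_high = mean + half_width

print(f"standard error = {std_error:.4f}")
print(f"95% CI for mean profit = {ci_low:.2f} to {ci_high:.2f}")
```

Because the whole interval lies above zero, the sample supports the claim that the average customer is profitable, even though many individual customers are not.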
             9Profit     9Online     9Age        9Inc        9Tenure     9District
9Profit      1
9Online      0.00705     1
9Age         0.145541    -0.16573    1
9Inc         0.147466    0.081154    -0.07004    1
9Tenure      0.191133    -0.06648    0.423583    0.045395    1
9District    0.003095    0.003469    -0.03083    0.026151    -0.01031    1
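
To judge how weak these correlations are, one option is to square them (the share of profit variance each variable accounts for on its own) and to test one of them against zero. The sketch below does this for the profit row of the matrix. The sample size of 31634 for the 9Online pair is an assumption based on the overall count, and the p-value uses a normal approximation to the t distribution.

```python
import math
from statistics import NormalDist

# Correlations of 9Profit with the other variables, from the matrix
# (rounded to five decimals here).
profit_corr = {
    "9Online": 0.00705,
    "9Age": 0.14554,
    "9Inc": 0.14747,
    "9Tenure": 0.19113,
    "9District": 0.00310,
}

# r squared: share of profit variance each variable explains by itself.
r_squared = {name: r ** 2 for name, r in profit_corr.items()}
for name, share in r_squared.items():
    print(f"{name}: r^2 = {share:.4%}")

# Significance test for the 9Online correlation.
# Assumption: all 31634 sampled customers have both values recorded.
n = 31634
r = profit_corr["9Online"]
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))  # normal approximation

print(f"9Online: t = {t_stat:.3f}, p = {p_value:.3f}")
```

With two groups, the square of this t statistic is the same quantity as the F statistic of a single-factor ANOVA on online versus offline profit, so the two analyses should agree.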
…………………………………………………………………………………………
Answer:
Here we separated the profits of online and offline customers and ran a single-factor
ANOVA to test whether the mean profit of the two groups is significantly different.
The p-value (0.21) is greater than 0.05, and F (1.57) is below F crit (3.84), so at
the 5 percent significance level we cannot reject the hypothesis that the two group
means are equal. Average profit is slightly higher for the 3854 online customers
(116.67) than for the 27780 offline customers (110.79), but this gap is within what
chance alone could produce. Hence we can say that there is no significant difference
in profitability between online and offline customers.
ANOVA: Single Factor

SUMMARY
Groups      Count    Sum        Average     Variance
Column 1    27780    3077642    110.7862    73604.22
Column 2    3854     449634     116.6668    80465.63

ANOVA
Source of Variation    SS          df       MS          F           P-value     F crit
Between Groups         117039.3    1        117039.3    1.572264    0.209888    3.841753
Within Groups          2.35E+09    31632    74439.99
Total                  2.35E+09    31633
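
The F statistic in this table can be rebuilt from the group counts, averages and variances alone, which is a useful check that the test compares means rather than variances. The sketch below follows the standard one-way ANOVA decomposition. The p-value relies on the fact that F with one numerator degree of freedom is a squared t, and it uses a normal approximation that is only reasonable because the sample is large.

```python
import math
from statistics import NormalDist

# (count, average, variance) per group, taken from the SUMMARY table.
groups = {
    "Column 1": (27780, 110.7862, 73604.22),
    "Column 2": (3854, 116.6668, 80465.63),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between-group sum of squares: how far each group mean is from the grand mean.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
# Within-group sum of squares: recovered from each sample variance.
ss_within = sum((n - 1) * v for n, _, v in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)

# Two groups only: F = t^2, and t is close to normal for large df.
p_value = 2 * (1 - NormalDist().cdf(math.sqrt(f_stat)))

print(f"F = {f_stat:.4f} on ({df_between}, {df_within}) df, p = {p_value:.4f}")
```

The variances enter only through the within-group term, so a large p-value here says nothing about whether the two variances are equal.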
…………………………………………………………………………………………
Answer:
Here we split the data by whether age and income are recorded, since these two
demographic variables contain missing values. For each variable, we compared the
profits of customers with and without a recorded value using a separate single-factor
ANOVA. In both cases the p-value is far below 0.05 (7.7E-51 and 9.22E-55), so we
reject the hypothesis that mean profit is equal in the two groups. Customers with
missing data are on average less profitable (about 72 versus about 125), so an
analysis restricted to customers with complete data would overstate average
profitability.
…………………………………………………………………………………………
SUMMARY
Groups          Count    Sum      Average      Variance
With data       23345    3E+06    125.18698    79197.96
Without data    8289     6E+05    72.96248     59039.79

ANOVA
Source of Variation    SS          df       MS           F           P-value    F crit
Between Groups         16683626    1        16683626     225.7098    7.7E-51    3.841753
Within Groups          2.34E+09    31632    73916.258
…………………………………………………………………………………………
SUMMARY
Groups          Count    Sum        Average      Variance
With data       23373    2937730    125.68904    79463
Without data    8261     589546     71.364968    58060

ANOVA
Source of Variation    SS          df       MS           F         P-value     F crit
Between Groups         18012652    1        18012652     243.83    9.22E-55    3.841753
Within Groups          2.34E+09    31632    73874.243
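
The two groups in each table have visibly different variances, so as a cross-check one could compare the means with Welch's unequal-variance t statistic instead of ANOVA. This is a swapped-in technique, not the method used in this assignment, and the sketch works only from the summary figures above. The tables are labelled by their order in the document because the output does not say which one is age and which is income.

```python
import math
from statistics import NormalDist

# (count, average, variance) from the two SUMMARY tables, in document order.
comparisons = {
    "first table": {
        "with": (23345, 125.18698, 79197.96),
        "without": (8289, 72.96248, 59039.79),
    },
    "second table": {
        "with": (23373, 125.68904, 79463.0),
        "without": (8261, 71.364968, 58060.0),
    },
}

z = NormalDist().inv_cdf(0.975)  # normal critical value; sample is large
results = {}
for name, grp in comparisons.items():
    n1, m1, v1 = grp["with"]
    n2, m2, v2 = grp["without"]
    diff = m1 - m2
    # Welch standard error: each group keeps its own variance.
    se = math.sqrt(v1 / n1 + v2 / n2)
    results[name] = {
        "diff": diff,
        "t": diff / se,
        "ci": (diff - z * se, diff + z * se),
    }
    low, high = results[name]["ci"]
    print(f"{name}: gap = {diff:.2f}, t = {diff / se:.2f}, "
          f"95% CI = {low:.2f} to {high:.2f}")
```

Both intervals stay well above zero, which agrees with the ANOVA conclusion that customers with missing data are less profitable on average.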
…………………………………………………………………………………………
Answer:
Here we used R, with dummy variables for the age, income and district brackets.
The model's R^2 is 6.45%, which means the model explains only a small part of the
variation in profit; it is therefore advisable to collect more variables and to fill
in missing data points such as age and income. The variables age (brackets 2 to 7),
income (brackets 5 to 9), district (1200) and tenure have p-values below 0.05, which
indicates a statistically significant relationship with profit relative to the
baseline category. Income bracket 5 is borderline (p = 0.04997). On the other hand,
the p-values of income (brackets 2, 3 and 4) and district (1300) exceed 0.05, which
indicates that profit in these categories is not significantly different from the
baseline.
…………………………………………………………………………………………
Call:
lm(formula = X9Profit ~ as.factor(X9Age) + as.factor(X9Inc) +
X9Tenure + as.factor(X9District), data = f)
Residuals:
    Min      1Q  Median      3Q     Max
-524.20 -155.40  -71.04   67.55 1956.64

Coefficients:
                           Estimate  Std. Error t value  Pr(>|t|)
(Intercept)                -53.9294     13.1744  -4.093  4.26e-05 ***
as.factor(X9Age)2           30.1445     11.9343   2.526   0.01155 *
as.factor(X9Age)3           68.8419     11.7074   5.880  4.16e-09 ***
as.factor(X9Age)4           73.2837     11.7885   6.217  5.17e-10 ***
as.factor(X9Age)5           77.4957     12.2367   6.333  2.45e-10 ***
as.factor(X9Age)6           97.5431     12.6813   7.692  1.51e-14 ***
as.factor(X9Age)7          133.0971     12.5012  10.647   < 2e-16 ***
as.factor(X9Inc)2            1.2393     11.6532   0.106   0.91530
as.factor(X9Inc)3           11.3702      8.3952   1.354   0.17563
as.factor(X9Inc)4           11.4107      8.5523   1.334   0.18214
as.factor(X9Inc)5           16.7295      8.5341   1.960   0.04997 *
as.factor(X9Inc)6           40.4157      7.4686   5.411  6.32e-08 ***
as.factor(X9Inc)7           61.7383      8.1552   7.570  3.86e-14 ***
as.factor(X9Inc)8           79.6597      9.3112   8.555   < 2e-16 ***
as.factor(X9Inc)9          148.3020      8.3545  17.751   < 2e-16 ***
X9Tenure                     4.0763      0.2354  17.314   < 2e-16 ***
as.factor(X9District)1200   19.1753      6.3776   3.007   0.00264 **
as.factor(X9District)1300    7.1871      7.7592   0.926   0.35432
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
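
Each dummy coefficient in this output is a shift in predicted profit relative to the omitted baseline bracket, so a fitted value is the intercept plus one shift per factor plus the tenure effect. The sketch below copies the estimates from the output above. The helper `predict_profit` and the example customer are hypothetical, added only to show how the coefficients combine.

```python
# Estimates copied from the lm() output. Baseline brackets (the ones
# R omitted) contribute 0 and are therefore not listed.
COEF = {
    "intercept": -53.9294,
    "age": {2: 30.1445, 3: 68.8419, 4: 73.2837,
            5: 77.4957, 6: 97.5431, 7: 133.0971},
    "inc": {2: 1.2393, 3: 11.3702, 4: 11.4107, 5: 16.7295,
            6: 40.4157, 7: 61.7383, 8: 79.6597, 9: 148.3020},
    "tenure": 4.0763,
    "district": {1200: 19.1753, 1300: 7.1871},
}


def predict_profit(age, inc, tenure, district=None):
    """Fitted profit for one customer (hypothetical helper).

    Any bracket not in COEF, including the baseline, adds nothing.
    """
    return (
        COEF["intercept"]
        + COEF["age"].get(age, 0.0)
        + COEF["inc"].get(inc, 0.0)
        + COEF["tenure"] * tenure
        + COEF["district"].get(district, 0.0)
    )


# Hypothetical customer: oldest age bracket, top income bracket,
# tenure of 10, district 1200.
example = predict_profit(age=7, inc=9, tenure=10, district=1200)
baseline = predict_profit(age=1, inc=1, tenure=0)

print(f"example customer: {example:.2f}")
print(f"baseline customer with zero tenure: {baseline:.2f}")
```

A fitted value of this kind is only a rough guide here, because the model explains about 6.45% of the variation in profit.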
…………………………………………………………………………………………
5. Using the Pilgrim Bank data that we used during class today, what can Alan Green
recommend about the customers who use Online Bill Payment?
Answer:
Here we selected the online customers' profit and bill-payment data for 1999 and
2000 and regressed profit on bill payment for each year. R^2 is 1.2 percent for 1999
and 0.67 percent for 2000, so bill payment explains very little of the variation in
profit; it is therefore recommended that more data be collected or that missing data
points such as age and income be filled in. The bill-payment coefficient is positive
(91.24 in 1999 and 77.20 in 2000) with a p-value below 0.05 in both years, which
means use of online bill payment has a statistically significant positive association
with profit among online customers.
…………………………………………………………………………………………
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.110613
R Square             0.012235
Adjusted R Square    0.011979
Standard Error       281.9606
Observations         3854

ANOVA
              df      SS          MS          F           Significance F
Regression    1       3793308     3793308     47.71352    5.75E-12
Residual      3852    3.06E+08    79501.75
Total         3853    3.1E+08

             Coefficients    Standard Error    t Stat      P-value
Intercept    104.1669        4.889081          21.30602    2.58E-95
9Billpay     91.24033        13.20888          6.907497    5.75E-12
…………………………………………………………………………………………
Regression Statistics
Multiple R           0.081591
R Square             0.006657
Adjusted R Square    0.006322
Standard Error       343.9376
Observations         2965

ANOVA
              df      SS          MS          F           Significance F
Regression    1       2348937     2348937     19.85692    8.66E-06
Residual      2963    3.51E+08    118293.1
Total         2964    3.53E+08

             Coefficients    Standard Error    t Stat      P-value
Intercept    155.2127        6.882884          22.55053    4.7E-104
0Billpay     77.19974        17.32447          4.456111    8.66E-06
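
When the only predictor is a yes/no indicator, the intercept is the average profit of customers who do not use the service and the intercept plus the coefficient is the average for those who do. The sketch below reads both regressions this way. It assumes the bill-payment variable is coded 0/1, which the output does not state, and that the 9 and 0 prefixes mark the 1999 and 2000 data.

```python
# Coefficients and standard errors copied from the two regression outputs.
# Assumption: Billpay is a 0/1 indicator; prefixes 9 and 0 mean 1999, 2000.
fits = {
    "1999": {"intercept": 104.1669, "slope": 91.24033, "slope_se": 13.20888},
    "2000": {"intercept": 155.2127, "slope": 77.19974, "slope_se": 17.32447},
}

results = {}
for year, fit in fits.items():
    non_users_mean = fit["intercept"]               # Billpay = 0
    users_mean = fit["intercept"] + fit["slope"]    # Billpay = 1
    t_stat = fit["slope"] / fit["slope_se"]
    results[year] = {
        "non_users_mean": non_users_mean,
        "users_mean": users_mean,
        "t_stat": t_stat,
        "f_stat": t_stat ** 2,  # one predictor: F equals t squared
    }
    print(f"{year}: non-users {non_users_mean:.2f}, users {users_mean:.2f}, "
          f"t = {t_stat:.3f}, F = {t_stat ** 2:.2f}")
```

The gap between the two groups is large relative to its standard error in both years, yet the low R^2 values show that knowing whether a customer uses bill payment says little about that individual customer's profit.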
…………………………………………………………………………………………