14. Based on the dataset, explore the relationship between credit card balance (Y) and (a) Income (b) A
variables? Estimate a multiple linear regression model and report the statistical significan
Answer : If the p-value is less than the significance level, the independent variable is statistically significant for the model. Looking at all the predictor variables, we notice only Income and Rating have P values less than 0.05, meaning Age, Education and Limit do not seem to be statistically significant for our regression model.
Since the p-value = 4.59E-177 < .05 = α, we conclude that the regression model is a significantly good fit, which means our regression parameters are jointly statistically significant.
An adjusted R Square of 0.74 means our regression model can explain around 74% of the variation of the dependent variable Y (Balance) around the average value of the observations (the mean of our sample).
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.93670257816683
R Square 0.877411719944386
Adjusted R Square 0.875856031111193
Standard Error 161.991764698626
Observations 400
ANOVA
df SS MS
Regression 5 74000827.16891 14800165
Residual 394 10339084.74109 26241.33
Total 399 84339911.91
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.863679799521448
R Square 0.745942796101409
Adjusted R Square 0.74466291094323
Standard Error 232.320254708623
Observations 400
ANOVA
df SS MS
Regression 2 62912749.71309 31456375
Residual 397 21427162.19691 53972.7
Total 399 84339911.91
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.935477390266294
R Square 0.875117947699436
Adjusted R Square 0.874488818972481
Standard Error 162.881339341611
Observations 400
ANOVA
df SS MS F Significance F
Regression 2 73807370.61983 36903685 1390.9998227788 4.5212176E-180
Residual 397 10532541.29017 26530.33
Total 399 84339911.91
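The Adjusted R Square rows in the summary outputs above can be checked by hand. The following is a minimal sketch, assuming the usual definition of adjusted R² for a model with k predictors fitted on n observations; the function name is illustrative and not part of the original workbook.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# (R Square, number of predictors) as reported in the three summary outputs.
models = [
    (0.877411719944386, 5),
    (0.745942796101409, 2),
    (0.875117947699436, 2),
]
for r2, k in models:
    print(k, adjusted_r_squared(r2, n=400, k=k))
```

Adding predictors always raises R Square, so the adjusted value is the fairer basis for comparing the five-predictor model with the two-predictor ones.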
[Figures: line fit plots (Balance and Predicted Balance) for Education, Limit and Age, plus further plots against Income and Rating]
13. Based on the equation derived in question 12, what is the estimated balance for a person with an income of USD 100k per year?
Answer :
X = Income, Y = Balance
Equation derived in question 12 : Y = 6.0484x + 246.51
X = 100 (as the unit of Income is thousand US dollars per year)
Y = 6.0484*100 + 246.51
Y = 851.35
Balance = 851.35
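The substitution above can be reproduced with a few lines of code. This is a minimal sketch that hard-codes the rounded coefficients reported in question 12; the function name is illustrative.

```python
# Fitted line from question 12 (rounded coefficients as reported in the answer).
SLOPE = 6.0484       # change in Balance per one unit (USD 1,000) of Income
INTERCEPT = 246.51   # estimated Balance when Income is 0

def predict_balance(income_thousands: float) -> float:
    """Estimated Balance for an Income given in thousands of USD per year."""
    return SLOPE * income_thousands + INTERCEPT

# An income of USD 100k per year corresponds to X = 100.
print(round(predict_balance(100), 2))
```

Note that Income must be entered in thousands of dollars; passing 100000 instead of 100 would overstate the estimate by a factor of roughly 1,000.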
12. Run a simple linear regression equation with Income as X and Balance as Y. Report the
coefficient of Income significantly different from zero? What does this say about the effect of in
Answer : For every unit increase in X (Income), there will be a 6.048 unit increase in Y (Balance).
Income Coefficient = 6.048
R² = 0.215
The Income coefficient is around 10.44 standard errors away from 0, which is very significant.
As the P value (1.0308858E-22) is less than 5%, the coefficient is significantly different from zero, although R² = 0.215 indicates only a weak linear relationship between Income and Balance.
Trendline: f(x) = 6.04836340853156 x + 246.514750591403, R² = 0.214977310132406
[Figure: scatter plot of Balance against Income with fitted trendline]
            MS              F              Significance F
Regression  18131167.3992   108.99171519   1.0308858E-22
Residual    166353.629424
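The F statistic, the t statistic of the Income coefficient, and R² are all tied together by the two mean squares in the ANOVA output. The sketch below recomputes them; the residual degrees of freedom (398 = 400 - 2) are an assumption based on the 400 observations reported elsewhere in the workbook.

```python
import math

# Mean squares read from the ANOVA rows of the Income regression.
MS_REGRESSION = 18131167.3992   # regression has 1 degree of freedom, so MS equals SS
MS_RESIDUAL = 166353.629424
DF_RESIDUAL = 398               # assumed: n - 2 with n = 400 observations

def f_statistic(ms_reg: float, ms_res: float) -> float:
    """F statistic as the ratio of regression to residual mean square."""
    return ms_reg / ms_res

def r_squared(ss_reg: float, ss_res: float) -> float:
    """Share of total variation explained by the regression."""
    return ss_reg / (ss_reg + ss_res)

f_value = f_statistic(MS_REGRESSION, MS_RESIDUAL)
t_value = math.sqrt(f_value)    # with one predictor, |t| of the slope equals sqrt(F)
r2 = r_squared(MS_REGRESSION, MS_RESIDUAL * DF_RESIDUAL)
print(f_value, t_value, r2)
```

This is where the "10.44 standard errors away from 0" figure comes from: it is the square root of the F value.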
Answer :
Coefficients
Limit : 0.171
Cards : 26
Correlation coefficient (Multiple R) : 0.865
Effect on Balance when Credit Limit is increased but cards remain the same : If we increase the credit limit by one unit and hold the number of cards fixed, balance would increase by 0.171 units.
Effect on Balance when the number of cards is increased but credit limit is not changed : If we increase the number of cards by one and hold the limit fixed, balance would increase by 26.0 units, and as the P value (0.0022) is also less than .05, the effect is highly significant.
Regression Statistics
Multiple R          0.8651882952
R Square            0.7485507862
Adjusted R Square   0.7472840395
Standard Error      231.12475254
Observations        400
ANOVA
            df   SS              MS         F             Significance F
Regression  2    63132707.3688   31566354   590.92382442  9.758453E-120
Residual    397  21207204.5412   53418.65
Total       399  84339911.91
            Coefficients    Standard Error   t Stat     P-value       Lower 95%       Upper 95%
Intercept   -369.03595539   36.1641465665    -10.20447  7.226921E-22  -440.13312796   -297.93878282
Limit       0.1714790369    0.00501313643    34.20594   2.00225E-120  0.1616234241    0.1813346497
Cards       26.033754269    8.43836350937    3.085166   0.0021768189  9.44429084767   42.62321769
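Each t Stat in the coefficient table above is simply the coefficient divided by its standard error. A minimal sketch, using the values from the Limit and Cards regression output (the dictionary layout and function name are illustrative):

```python
# (coefficient, standard error) pairs from the Limit + Cards regression output.
COEFFICIENTS = {
    "Intercept": (-369.03595539, 36.1641465665),
    "Limit": (0.1714790369, 0.00501313643),
    "Cards": (26.033754269, 8.43836350937),
}

def t_statistic(coefficient: float, standard_error: float) -> float:
    """Number of standard errors the coefficient lies away from zero."""
    return coefficient / standard_error

t_stats = {name: t_statistic(b, se) for name, (b, se) in COEFFICIENTS.items()}
for name, t in t_stats.items():
    print(name, round(t, 5))
```

The Cards coefficient is much larger than the Limit coefficient in absolute terms, but its t Stat is far smaller, so it is estimated with much less precision.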
10. Consider your findings in questions 8-9. Discuss business mechanisms to increase or decrease the balance on credit cards. Try to quantify your answers.
Answer :
If we increase Rating by 1, balance would increase by 2.57; the P value is also less than .05, which makes the relationship significant.
The correlation between Balance and Rating is about .86, which is reasonably large and close to 1, indicating a strong positive relationship.
So each extra point of Rating contributes around a 2.57 unit increase in Balance, which is significant.
Increasing Credit Rating would result in a significant increase in Balance on credit cards.
If we increase Limit by 1, balance would increase by 0.171; the P value is also less than .05, which makes the relationship significant.
The correlation between Balance and Limit is about .86, which is reasonably large and close to 1, indicating a strong positive relationship.
So each extra unit of Limit contributes around a .171 unit increase in Balance, which is significant.
Increasing Credit Limit would result in an increase in Balance on credit cards.
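To quantify a business lever, the slope from each simple regression can be multiplied by the size of the proposed change. The sketch below uses the Rating and Limit slopes from the regression outputs; the change sizes (10 rating points, 1,000 units of limit) are hypothetical examples, not figures from the source.

```python
# Slopes from the two simple regressions (Balance on Rating, Balance on Limit).
RATING_SLOPE = 2.56624
LIMIT_SLOPE = 0.171637

def expected_balance_change(slope: float, change_in_x: float) -> float:
    """Expected change in Balance for a given change in the predictor."""
    return slope * change_in_x

# Hypothetical scenarios chosen only for illustration.
rating_effect = expected_balance_change(RATING_SLOPE, 10)     # +10 rating points
limit_effect = expected_balance_change(LIMIT_SLOPE, 1000)     # +1,000 limit units
print(round(rating_effect, 2), round(limit_effect, 2))
```

Because Rating and Limit are almost perfectly correlated with each other, the two effects should not be added together as if they were independent levers.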
Rating regression (X = Rating, Y = Balance)
            Coefficients   Standard Error   t Stat
Intercept   -390.8463      29.06851         -13.44569
Rating      2.56624        0.075089         34.17594
F = 1167.99458071, Significance F = 1.8988991E-120

Limit regression (X = Limit, Y = Balance)
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.861697
R Square            0.742522
Adjusted R Square   0.741875
Standard Error      233.585
Observations        400
ANOVA
            df   SS         MS         F              Significance F
Regression  1    62624255   62624255   1147.76421368  2.5305807E-119
Residual    398  21715657   54561.95
Total       399  84339912
            Coefficients   Standard Error   t Stat
Intercept   -292.7905      26.68341         -10.97275
Limit       0.171637       0.005066         33.87867
Answer : For every unit increase in X, there will be a 2.57 unit increase in Y, which means that for every unit increase in Credit Rating, we expect around a 2.57 unit increase in Balance.
Rating Coefficient = 2.57
R² = 0.7458
[Figure: scatter plot of Balance against Rating]
            MS              F           Significance F
Regression  62904789.8773   1167.9946   1.9E-120
Residual    53857.0905345
            t Stat             P-value    Lower 95%   Upper 95%
Intercept   -13.4456936239     3.073E-34  -447.9934   -333.69931859
Rating      34.1759356962      1.9E-120   2.4186195   2.71386117137
Answer : For every unit increase in X, there will be a .171 unit increase in Y, which means that for every unit increase in Credit Limit, we expect around a .171 unit increase in Balance.
Limit Coefficient = .171
R² = 0.7425
Limit Balance
3606 333
6645 903
7075 580
9504 964
4897 331
8047 1151
3388 203
7114 872
3300 279
6819 1350
8117 1407
1311 0
5308 204
6922 1081
3291 148
2525 0
3714 0
4378 368
6384 891
6626 1048
2860 89
6378 968
2631 0
5179 411
1757 0
4323 671
3625 654
4534 467
13414 1809
5611 915
5666 863
2733 0
7838 526
1829 0
2646 0
2558 419
6457 762
6481 1093
3899 531
3461 344
3327 50
7659 1155
4763 385
6257 976
6375 1120
7569 997
5043 1241
4431 797
2252 0
4569 902
5183 654
3969 211
5441 607
5466 957
1499 0
1786 0
4742 379
4779 133
3480 333
5294 531
5198 631
3089 108
1671 0
2998 133
2937 0
4160 602
9704 1388
5099 889
5619 822
6819 1084
3954 357
7402 1103
4923 663
4523 601
5390 945
3180 29
3293 532
3254 145
6662 391
2101 0
3449 162
4263 99
4433 503
1433 0
2906 0
12066 1779
6340 815
2271 0
4307 579
7518 1176
5767 1023
6040 812
2832 0
5435 937
3075 0
855 0
5382 1380
3388 155
2963 375
8494 1311
3736 298
2433 431
7582 1587
9540 1050
4768 745
3182 210
1337 0
3189 0
6033 227
3261 297
3271 47
2959 0
6637 1046
6386 768
3326 271
4828 510
2117 0
9113 1341
2161 0
1410 0
1402 0
8157 454
7056 904
1300 0
2529 0
2531 0
5533 1404
3411 0
8376 1259
3461 255
3821 868
1568 0
5443 912
5829 1018
5835 835
3500 8
4116 75
3613 187
2073 0
10384 1597
6045 1425
6754 605
7416 669
4896 710
2748 68
4673 642
5110 805
1501 0
2420 0
886 0
5728 581
4831 534
2120 156
4612 0
3155 0
1362 0
4284 429
5521 1020
5550 653
3000 0
4865 836
1705 0
7530 1086
2330 0
5977 548
4527 570
2880 0
2327 0
2820 0
6179 1099
2021 0
4270 283
4697 108
4745 724
10673 1573
2168 0
2607 0
3965 384
4391 453
7499 1237
3584 423
5180 516
6420 789
4090 0
11589 1448
4442 450
3806 188
2179 0
7667 930
4411 126
5352 538
9560 1687
3933 336
10088 1426
2120 0
5384 802
7398 749
3977 69
2000 0
4159 571
5343 829
7333 1048
1448 0
6784 1411
5310 456
3878 638
2450 0
4391 1216
4327 230
9156 732
3206 95
5309 799
4351 308
5245 637
5289 681
4229 246
2762 52
5395 955
1647 195
5222 653
5765 1246
8760 1230
6207 1549
4613 573
7818 701
5673 1075
6906 1032
5614 482
4668 156
7555 1058
5137 661
4776 657
4788 689
2278 0
8244 1329
2923 191
4986 489
5149 443
2910 52
3557 163
3351 148
906 0
1233 16
6617 856
1787 0
2001 0
3211 199
2220 0
905 0
1551 98
2430 0
3202 132
8603 1355
5182 218
6386 1048
4221 118
1774 0
2493 0
2561 0
6196 1092
5184 345
9310 1050
4049 465
3536 133
5107 651
5013 549
4952 15
5833 942
1349 0
5565 772
3085 136
4866 436
3690 728
4706 1255
5869 967
8732 529
3476 209
5000 531
6982 250
3063 269
5319 541
1852 0
8100 1298
6396 890
2047 0
1626 0
1552 0
3098 0
5274 863
3907 485
3235 159
3665 309
5096 481
11200 1677
2532 0
1389 0
5140 293
4381 188
2672 0
5051 711
4632 580
3526 172
4964 295
4970 414
7506 905
1924 0
3762 70
3874 0
4640 681
7010 885
4891 1036
5429 844
5227 823
7685 843
9272 1140
3907 463
7306 1142
4712 136
1485 0
2586 0
1160 5
3096 81
3484 265
13913 1999
2863 415
5072 732
10230 1361
6662 984
3673 121
7576 846
4543 1054
3976 474
5228 380
3402 182
4756 594
3409 194
5884 926
855 0
5303 606
10278 1107
3807 320
3922 426
2955 204
3746 410
5199 633
1511 0
5380 907
10748 1192
1134 0
5145 503
1561 0
5140 302
7140 583
4716 425
3873 413
11966 1405
6090 962
2539 0
4336 347
4471 611
5891 712
4943 382
5101 710
6127 578
9824 1243
6442 790
7871 1264
3615 216
5759 345
8028 1208
6135 992
2150 0
3782 840
5354 1003
4840 588
5673 1000
7167 767
1567 0
4941 717
2860 0
7760 661
8029 849
5495 1352
3274 382
1870 0
5640 905
3683 371
1357 0
6827 1129
7100 806
10578 1393
6555 721
2308 0
1335 0
5758 734
4100 560
3838 480
4171 138
2525 0
5524 966
it limit. (Here credit limit is the X and the balance is the Y). Report the coefficients and the R-
squared. Show a scatter plot.
Regression Statistics
Multiple R          0.861697267015396
R Square            0.742522179981802
Adjusted R Square   0.741875250785776
Standard Error      233.58499824434
Observations        400
ANOVA
            df   SS                 MS               F              Significance F
Regression  1    62624255.2508864   62624255.2509    1147.76421368  2.5305807E-119
Residual    398  21715656.6591137   54561.9514048
Total       399  84339911.9100001
[Figure: scatter plot of Balance against Limit]
Answer : The correlation coefficient between Credit Rating and Credit Limit of 0.997 signifies that these two variables have a near-perfect positive linear relationship; when one variable moves higher or lower, the other moves in the same direction by a nearly proportional amount. This is enough to conclude that the principle of "assign a higher credit limit to people with a higher credit rating" is being followed.
Correlation (Rating, Limit) = 0.996879737001682
Answer :
Null : There is no statistically significant relation between ethnicity and balance.
Alternative : There is a statistically significant relation between ethnicity and balance.
As the P value (0.9156) is greater than 0.05, we can conclude there is no statistically significant relationship between ethnicity and balance.
Also, as the F-value is smaller than the F-critical value for the alpha level selected (0.05), we cannot reject the null hypothesis; the data are consistent with all three samples having the same mean.
SUMMARY
Groups             Count   Sum     Average        Variance
African American   99      52569   531            235839.163265
Asian              99      49897   504.01010101   226080.112142
Caucasian          99      50635   511.46464646   192363.394146
ANOVA
Source of Variation   SS              df    MS             F              P-value
Between Groups        38466.6127946   2     19233.306397   0.0881880598   0.91561288566
Within Groups         64119701.6162   294   218094.22318
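The ANOVA table can be rebuilt from the SUMMARY rows alone, which is a useful check on the F value. The following is a minimal sketch of a one-way ANOVA computed from group counts, sums and sample variances; the function name and tuple layout are illustrative.

```python
def one_way_anova(groups):
    """One-way ANOVA from summary rows of (count, sum, sample variance).

    Returns (ss_between, ss_within, f_value).
    """
    n_total = sum(count for count, _, _ in groups)
    grand_mean = sum(total for _, total, _ in groups) / n_total
    # Between-group variation: group means around the grand mean.
    ss_between = sum(count * (total / count - grand_mean) ** 2
                     for count, total, _ in groups)
    # Within-group variation: recovered from each sample variance.
    ss_within = sum((count - 1) * variance for count, _, variance in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_value

# Count, Sum and Variance for the three ethnicity groups in the SUMMARY table.
ethnicity = [
    (99, 52569, 235839.163265),
    (99, 49897, 226080.112142),
    (99, 50635, 192363.394146),
]
ss_b, ss_w, f_val = one_way_anova(ethnicity)
print(ss_b, ss_w, f_val)
```

An F value this far below 1 means the group means differ by less than the within-group scatter alone would lead one to expect.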
Answer :
5a) Correlation between Balance and Age : Though the correlation coefficient of 0.001835 is positive, it is much closer to 0 than to 1. There is a positive relationship between Age and Balance, but it is far too weak to make predictions.
5b) Correlation between Balance and Years of Education : The correlation coefficient of -0.00806 is negative but also very weak. As per the data, there is a negative relation between Balance and Years of Education, but it is far too weak to make any predictions.
Balance vs. marital status
Answer :
                               Married             Unmarried
Mean                           517.942857142857    523.290323
Variance                       205696.726229508    221735.039
Observations                   245                 155
Hypothesized Mean Difference   0
df                             319
t Stat                         -0.112233601335984
P(T<=t) one-tail               0.455354388554626
t Critical one-tail            1.64964431932712
P(T<=t) two-tail               0.910708777109253
t Critical two-tail            1.96742838690237
[Figures: scatter plots of Age and Years of Education against Balance]
4. It is generally assumed that if there are more credit cards then the balance on the cards will be more. Based on this dataset, do you think this is true? Calculate a correlation coefficient and show a scatter plot to support your answer.
Answer : Though the correlation coefficient is positive, it is much closer to 0 than to 1. There is a positive relationship between number of credit cards and balance, but it is not strong enough to make predictions.
Covariance 54.3706375
Correlation Coefficient 0.086456347418619
Cards Balance
2 333
3 903
4 580
3 964
2 331
4 1151
2 203
2 872
5 279
3 1350
4 1407
3 0
1 204
1 1081
2 148
3 0
3 0
3 368
1 891
2 1048
4 89
1 968
3 0
5 411
3 0
5 671
6 654
2 467
2 1809
4 915
4 863
5 0
2 526
4 0
2 0
3 419
2 762
2 1093
4 531
4 344
3 50
2 1155
2 385
1 976
3 1120
3 997
2 1241
2 797
6 0
4 902
3 654
2 211
1 607
4 957
2 0
2 0
7 379
4 133
2 333
4 531
2 631
3 108
2 0
4 133
2 0
4 602
4 1388
4 889
2 822
4 1084
4 357
2 1103
1 663
4 601
4 945
2 29
1 532
1 145
3 391
2 0
3 162
1 99
3 503
3 0
4 0
4 1779
1 815
4 0
4 579
3 1176
4 1023
3 812
4 0
2 937
3 0
5 0
1 1380
4 155
2 375
5 1311
1 298
3 431
2 1587
6 1050
4 745
2 210
2 0
3 0
4 227
5 297
3 47
2 0
4 1046
4 768
4 271
5 510
3 0
1 1341
3 0
3 0
2 0
2 454
1 904
3 0
1 0
1 0
5 1404
3 0
2 1259
3 255
4 868
5 0
4 912
4 1018
3 835
3 8
2 75
4 187
2 0
3 1597
3 1425
2 605
3 669
3 710
3 68
2 642
3 805
3 0
2 0
5 0
3 581
3 534
4 156
3 0
2 0
4 0
5 429
2 1020
2 653
3 0
5 836
3 0
1 1086
5 0
4 548
4 570
2 0
1 0
1 0
4 1099
3 0
1 283
4 108
3 724
3 1573
3 0
4 0
2 384
2 453
5 1237
5 423
3 516
2 789
3 0
1 1448
1 450
2 188
2 0
2 930
2 126
4 538
3 1687
4 336
7 1426
2 0
2 802
1 749
2 69
4 0
3 571
2 829
6 1048
2 0
5 1411
2 456
8 638
2 0
5 1216
3 230
2 732
2 95
3 799
2 308
2 637
2 681
3 246
3 52
3 955
2 195
4 653
3 1246
5 1230
4 1549
5 573
4 701
4 1075
6 1032
3 482
2 156
3 1058
3 661
4 657
1 689
3 0
3 1329
3 191
2 489
5 443
6 52
1 163
5 148
2 0
3 16
1 856
4 0
5 0
4 199
3 0
1 0
3 98
2 0
5 132
3 1355
6 218
4 1048
3 118
2 0
1 0
2 0
6 1092
4 345
3 1050
1 465
2 133
1 651
3 549
4 15
3 942
4 0
4 772
1 136
1 436
2 728
6 1255
5 967
3 529
2 209
2 531
2 250
3 269
3 541
3 0
2 1298
3 890
2 0
2 0
2 0
4 0
3 863
2 485
5 159
4 309
2 481
7 1677
4 0
5 0
1 293
3 188
1 0
3 711
1 580
3 172
1 295
1 414
2 905
2 0
3 70
3 0
2 681
3 885
1 1036
3 844
6 823
2 843
2 1140
3 463
2 1142
2 136
3 0
5 0
3 5
2 81
6 265
4 1999
2 415
1 732
3 1361
3 984
3 121
2 846
2 1054
2 474
3 380
2 182
2 594
2 194
4 926
3 0
1 606
1 1107
4 320
2 426
5 204
2 410
7 633
2 0
5 907
2 1192
3 0
3 503
4 0
1 302
2 583
2 425
1 413
2 1405
3 962
1 0
1 347
3 611
4 712
2 382
3 710
1 578
3 1243
4 790
3 1264
2 216
3 345
3 1208
4 992
4 0
2 840
2 1003
3 588
5 1000
2 767
3 0
1 717
1 0
3 661
2 849
1 1352
9 382
3 0
3 905
4 371
3 0
2 1129
2 806
3 1393
2 721
2 0
2 0
4 734
3 560
5 480
5 138
1 0
5 966
[Figure: scatter plot of Balance against number of Cards]
3. Is there a difference between students and non-students as far as average balance is concerned? Use a two-sample t-test to draw your conclusion.
Answer :
Null : There is no difference between students and non-students as far as average balance is concerned.
Alternative : There is a difference between students and non-students as far as average balance is concerned.
As the t Stat value (4.9028) is greater than t Critical two-tail (2.0129), we can conclude that there is a difference in average balance between students and non-students and reject the null.
Also, as the P values are less than 0.05, we have enough evidence at the significance level of 0.05 to say that the population means of students and non-students are different.
                               Students        NonStudents
Mean                           876.825         480.369444444
Variance                       240101.94295    193085.136111
Observations                   40              360
Hypothesized Mean Difference   0
df                             46
t Stat                         4.9027786614
P(T<=t) one-tail               6.086193E-06
t Critical one-tail            1.6786604136
P(T<=t) two-tail               1.217239E-05
t Critical two-tail            2.0128955989
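The t Stat and df in the table above can be reproduced from the group means, variances and counts. The sketch below assumes the unequal-variances (Welch) form of the two-sample t-test, which is consistent with the reported df of 46 rather than 398; the function name is illustrative.

```python
import math

def welch_t_test(mean1, var1, n1, mean2, var2, n2):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    se1 = var1 / n1   # squared standard error of the first sample mean
    se2 = var2 / n2   # squared standard error of the second sample mean
    t_stat = (mean1 - mean2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t_stat, df

# Summary statistics for students and non-students from the t-test table.
t_stat, df = welch_t_test(876.825, 240101.94295, 40,
                          480.369444444, 193085.136111, 360)
print(t_stat, df)
```

The degrees of freedom are dominated by the much smaller student group, which is why they end up close to 40 instead of close to 400.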
2. Is there a difference between men and women as far as average balance is concerned? Use a two-sample t-test to draw your conclusion.
Answer :
Null : There is no difference between men and women as far as average balance is concerned.
Alternative : There is a difference between men and women as far as average balance is concerned.
As the absolute t Stat value (0.4284) is less than t Critical two-tail (1.9660), we cannot reject the null and conclude that there is no significant difference in average balance between men and women.
Also, as the P values (0.3343 one-tail, 0.6686 two-tail) are greater than 0.05, we do not have enough evidence at the significance level of 0.05 to say that the population means of men and women are different.
                               Male                  Female
Mean                           509.80310880829       529.536231884058
Variance                       213554.565198618      210187.104263402
Observations                   193                   207
Hypothesized Mean Difference   0
df                             396
t Stat                         -0.42838442994029
P(T<=t) one-tail               0.334302082712107
t Critical one-tail            1.64871060117934
P(T<=t) two-tail               0.668604165424214
t Critical two-tail            1.96597260846954
1. A company manager says that the average balance on their credit cards is $500. Do you think that this assertion is justified? Use a one-sample t-test to draw your conclusion.
Answer :
Null : The average balance on their credit cards is $500.
Alternative : The average balance on their credit cards is not equal to $500.
As the two-tail P value (0.3845) is more than the significance level of 0.05 (the one-tail P value of 0.1922 is also well above .05), we cannot reject the null and think that this statement is justified.
                      Balance
Mean                  520.015
Variance              211378.225338346
Observations          400
Hypothesized Mean     500
df                    399
t Stat                0.870673780728271
P(T<=t) one-tail      0.192227913652033
t Critical one-tail   1.64868153355544
P(T<=t) two-tail      0.384455827304066
t Critical two-tail   1.96592729592089
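The t Stat for this test follows directly from the sample mean, variance and size. A minimal sketch of the one-sample t statistic, using the values reported in the table (the function name is illustrative):

```python
import math

def one_sample_t(sample_mean, sample_variance, n, hypothesized_mean):
    """t statistic for testing a sample mean against a hypothesized value."""
    standard_error = math.sqrt(sample_variance / n)
    return (sample_mean - hypothesized_mean) / standard_error

# Balance summary statistics and the manager's claimed mean of $500.
t_balance = one_sample_t(520.015, 211378.225338346, 400, 500)
print(t_balance)
```

The sample mean is only about 0.87 standard errors above $500, well inside the two-tail critical value of roughly 1.97.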