Assignment 3.1
This data set consists of a sample of over eight hundred used cars in this country. The retail price of
these cars was calculated from the tables provided by the car manufacturers' association. You are
provided with a data set containing the following variables:
Price: suggested retail price of the used car in excellent condition. The condition of a car can
greatly affect price. All cars in this data set were less than one year old when priced and
considered to be in excellent condition.
Mileage: number of miles the car has been driven
Make: manufacturer of the car.
Model: specific models for each car manufacturer.
Trim (of car): specific type of car model such as SE Sedan 4D, Quad Coupe 2D
Type: body type such as sedan, coupe, etc.
Cylinder: number of cylinders in the engine
Liter: a more specific measure of engine size
Doors: number of doors
Cruise: indicator variable representing whether the car has cruise control (1 = cruise)
Sound: indicator variable representing whether the car has upgraded speakers (1 = upgraded)
Leather: indicator variable representing whether the car has leather seats (1 = leather)
1.
Use simple linear regression to explore the intuitive relationship between miles traveled
and retail price.
From the simple regression results, answer the following questions:
a. In general, what happens to price when there is one more mile on the car?
b. Does mileage help you predict price? What does the p-value tell you?
c. Does mileage help you predict price? What does the R-Sq value tell you?
Answers
Variables Entered/Removed
Model   Variables Entered   Variables Removed   Method
1       Mileage             .                   Enter

Coefficients
              Unstandardized Coefficients   Standardized Coefficients
Model 1       B            Std. Error       Beta                        t         Sig.
(Constant)    24764.559    904.363                                      27.383    .000
Mileage       -.173        .042             -.143                       -4.093    .000
a. On average, the price falls by about 0.173 price units (roughly 17 cents, if price is quoted in
dollars) for each additional mile on the car (B = -.173).
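b. Yes. The p-value for Mileage is reported as .000 (p < 0.001), so the relationship between
mileage and price is statistically significant. The coefficient is negative, so the relationship is
significant in the negative direction: higher mileage goes with a lower price.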
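The reasoning in answers (a) and (b) can be checked by hand from the Coefficients table. The sketch below is illustrative only: it recomputes the t statistic as B divided by its standard error and uses a normal approximation for the p-value, which is an assumption that seems reasonable with roughly 800 cars. The variable names are hypothetical and the result differs slightly from the SPSS value (-4.093) because the table entries are rounded.

```python
import math

# Values copied from the Coefficients table for Mileage.
b_mileage = -0.173   # unstandardized slope: price change per extra mile
se_mileage = 0.042   # standard error of the slope

# The t statistic is the slope divided by its standard error.
t_stat = b_mileage / se_mileage

# Two-sided p-value under a normal approximation (assumption: with about
# 800 observations the t distribution is very close to the normal).
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

# Expected price change over 1,000 additional miles.
change_per_1000_miles = b_mileage * 1000

print(round(t_stat, 2), p_value < 0.001, round(change_per_1000_miles, 1))
```

A p-value this small means a slope of this size would be very unlikely if mileage had no linear relationship with price.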
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .143   .020       .019                9789.288
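Question (c) asks what the R-Sq value says. As a rough sketch of the arithmetic, R Square is the square of R, and the adjusted value corrects it for the number of predictors. The sample size of 804 is an assumption taken from the N column of the Descriptive Statistics table later in this report; the variable names are hypothetical.

```python
# Values from the Model Summary of the simple regression.
r = 0.143   # correlation between observed and fitted price
n = 804     # assumption: sample size from the Descriptive Statistics table
k = 1       # one predictor (Mileage)

# R Square: share of the variance in price explained by mileage.
r_squared = r ** 2

# Adjusted R Square penalises the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(round(r_squared, 3), round(adj_r_squared, 3))
```

Read this way, mileage on its own explains only about 2% of the variation in price, even though its slope is statistically significant.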
2.
Taking price as the dependent variable, perform stepwise multiple regression on this
data set.
What is your final model? How many variables were dropped from the model? Explain why.
Variables Entered/Removed
Model   Variables Entered   Variables Removed   Method
1       Cylinder            .                   Stepwise (Criteria: Probability-of-F-...)
2       Cruise              .                   Stepwise (Criteria: Probability-of-F-...)
3       Leather             .                   Stepwise (Criteria: Probability-of-F-...)
4       Mileage             .                   Stepwise (Criteria: Probability-of-F-...)
5       Doors               .                   Stepwise (Criteria: Probability-of-F-...)
6       Sound               .                   Stepwise (Criteria: Probability-of-F-...)
Coefficients
                     Unstandardized Coefficients   Standardized Coefficients
Model                B            Std. Error       Beta                        t         Sig.
1   (Constant)       -17.057      1126.944                                     -.015     .988
    Cylinder         4054.203     206.852          .569                        19.600    .000
2   (Constant)       -1046.431    1082.655                                     -.967     .334
    Cylinder         3392.587     211.273          .476                        16.058    .000
    Cruise           6000.366     678.841          .262                        8.839     .000
3   (Constant)       -2978.398    1129.554                                     -2.637    .009
    Cylinder         3276.233     209.189          .460                        15.662    .000
    Cruise           6362.343     671.901          .278                        9.469     .000
    Leather          3139.484     608.259          .142                        5.161     .000
4   (Constant)       412.562      1296.815                                     .318      .750
    Cylinder         3232.656     206.188          .454                        15.678    .000
    Cruise           6492.035     662.181          .284                        9.804     .000
    Leather          3161.569     599.032          .143                        5.278     .000
    Mileage          -.165        .032             -.137                       -5.087    .000
5   (Constant)       5530.335     1709.446                                     3.235     .001
    Cylinder         3257.643     203.798          .457                        15.985    .000
    Cruise           6319.636     655.373          .276                        9.643     .000
    Leather          2978.887     593.246          .135                        5.021     .000
    Mileage          -.167        .032             -.139                       -5.214    .000
    Doors            -1402.112    310.015          -.121                       -4.523    .000
6   (Constant)       7323.164     1770.837                                     4.135     .000
    Cylinder         3200.125     202.983          .449                        15.765    .000
    Cruise           6205.511     651.463          .271                        9.525     .000
    Leather          3327.143     597.114          .151                        5.572     .000
    Mileage          -.171        .032             -.141                       -5.352    .000
    Doors            -1463.399    308.274          -.126                       -4.747    .000
    Sound            -2024.401    570.718          -.096                       -3.547    .000
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .569   .324       .323                8133.162
2       .620   .384       .382                7768.193
3       .635   .404       .402                7646.769
4       .650   .423       .420                7530.569
5       .661   .437       .433                7440.529
6       .668   .446       .442                7387.114                     .304
Excluded Variables
                                                                      Collinearity Statistics
Model            Beta In   t         Sig.    Partial Correlation    Tolerance
1   Mileage      -.126     -4.401    .000    -.154                  .999
    Liter        .158      1.563     .118    .055                   .082
    Doors        -.140     -4.890    .000    -.170                  1.000
    Cruise       .262      8.839     .000    .298                   .874
    Sound        -.074     -2.543    .011    -.090                  .992
    Leather      .115      3.981     .000    .139                   .994
2   Mileage      -.136     -4.966    .000    -.173                  .998
    Liter        .037      .383      .702    .014                   .081
    Doors        -.128     -4.655    .000    -.162                  .997
    Sound        -.058     -2.094    .037    -.074                  .988
    Leather      .142      5.161     .000    .180                   .983
3   Mileage      -.137     -5.087    .000    -.177                  .998
    Liter        .004      .037      .970    .001                   .080
    Doors        -.119     -4.377    .000    -.153                  .993
    Sound        -.084     -3.048    .002    -.107                  .960
4   Liter        .017      .180      .858    .006                   .080
    Doors        -.121     -4.523    .000    -.158                  .992
    Sound        -.088     -3.243    .001    -.114                  .959
5   Liter        -.108     -1.108    .268    -.039                  .074
    Sound        -.096     -3.547    .000    -.125                  .956
6   Liter        -.088     -.908     .364    -.032                  .074
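Liter is excluded because its p-value is high: in each of models 1 to 6 its p-value is greater than
the 0.05 significance level, so it never meets the entry criterion (p < 0.05).
Only one variable, Liter, is dropped. Its tolerance of about .08 also indicates that it is strongly
collinear with the predictors already in the model.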
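The question also asks for the final model. A minimal sketch, assuming the final model is step 6 of the stepwise output (Cylinder, Cruise, Leather, Mileage, Doors and Sound), is to apply the unstandardized coefficients from that step to a car's attributes. The function name and the example car are hypothetical and serve only to illustrate the arithmetic.

```python
# Unstandardized coefficients (B) from step 6 of the stepwise Coefficients table.
FINAL_MODEL = {
    "Cylinder": 3200.125,
    "Cruise": 6205.511,
    "Leather": 3327.143,
    "Mileage": -0.171,
    "Doors": -1463.399,
    "Sound": -2024.401,
}
INTERCEPT = 7323.164


def predict_price(car):
    """Return the fitted price for a dict of predictor values."""
    return INTERCEPT + sum(FINAL_MODEL[name] * car[name] for name in FINAL_MODEL)


# Hypothetical car: 6 cylinders, 4 doors, 20,000 miles, all three options fitted.
example_car = {"Cylinder": 6, "Cruise": 1, "Leather": 1,
               "Mileage": 20000, "Doors": 4, "Sound": 1}
predicted = predict_price(example_car)
print(round(predicted, 2))
```

Each coefficient is read holding the other predictors fixed; for example, one extra cylinder adds about 3,200 to the fitted price in this model.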
3.
Transform price to log price and take this new variable as your dependent variable.
Perform multiple regression by including variables in (2) as independent variables.
Discuss the results.
Variables Entered/Removed (a)
Model   Variables Entered                                   Variables Removed   Method
1       Leather, Mileage, Doors, Cylinder, Sound, Cruise (b)   .                   Enter
a. Dependent Variable: LgPrice
b. All requested variables entered.
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .695   .484       .480                .12847                       .376

Change Statistics
Model   R Square Change   F Change   df1   df2   Sig. F Change
1       .484              124.410    6     797   .000

With log price as the dependent variable, Liter is excluded from the model but Cylinder is
included.
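The F Change figure in the Model Summary can be approximately reproduced from R Square and the degrees of freedom. The sketch below is only a consistency check under the usual formula for the overall F test of a regression; the sample size of 804 is inferred from df2 = 797 with six predictors, and the small gap to the reported 124.410 comes from using the rounded R Square.

```python
# Values from the Model Summary for the log-price regression.
r_squared = 0.484
k = 6        # number of predictors (df1)
n = 804      # assumption: df2 = n - k - 1 = 797 implies n = 804

# Overall F statistic: explained variance per predictor divided by
# unexplained variance per residual degree of freedom.
df2 = n - k - 1
f_stat = (r_squared / k) / ((1 - r_squared) / df2)

print(df2, round(f_stat, 1))
```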
Coefficients
                 Unstandardized Coefficients   Standardized Coefficients                      95.0% Confidence Interval for B
Model 1          B            Std. Error       Beta                        t          Sig.    Lower Bound   Upper Bound
(Constant)       3.996        .031                                         129.744    .000    3.935         4.056
Mileage          -3.206E-6    .000             -.148                       -5.786     .000    .000          .000
Cylinder         .057         .004             .440                        16.018     .000    .050          .063
Doors            -.016        .005             -.077                       -3.007     .003    -.027         -.006
Cruise           .139         .011             .338                        12.298     .000    .117          .162
Sound            -.038        .010             -.099                       -3.816     .000    -.057         -.018
Leather          .053         .010             .132                        5.078      .000    .032          .073

                 Correlations                        Collinearity Statistics
Model 1          Zero-order   Partial   Part         Tolerance   VIF
Mileage          -.148        -.201     -.147        .997        1.003
Cylinder         .583         .493      .408         .857        1.167
Doors            -.092        -.106     -.077        .989        1.011
Cruise           .494         .399      .313         .859        1.165
Sound            -.139        -.134     -.097        .956        1.046
Leather          .130         .177      .129         .952        1.050

Excluded Variables
                                                                      Collinearity Statistics
Model            Beta In   t         Sig.    Partial Correlation    Tolerance
1   Mileage      -.137     -4.876    .000    -.170                  1.000
    Cylinder     .215      2.170     .030    .076                   .082
    Doors        -.045     -1.578    .115    -.056                  .994
    Cruise       .316      10.999    .000    .362                   .857
    Sound        -.101     -3.561    .000    -.125                  .996
    Leather      .079      2.777     .006    .098                   .992
2   Mileage      -.147     -5.646    .000    -.196                  .998
    Cylinder     .243      2.636     .009    .093                   .082
    Doors        -.039     -1.480    .139    -.052                  .993
    Sound        -.080     -3.017    .003    -.106                  .990
    Leather      .113      4.271     .000    .149                   .980
3   Cylinder     .223      2.463     .014    .087                   .082
    Doors        -.042     -1.610    .108    -.057                  .993
    Sound        -.084     -3.220    .001    -.113                  .990
    Leather      .114      4.393     .000    .154                   .980
4   Cylinder     .236      2.632     .009    .093                   .082
    Doors        -.036     -1.375    .169    -.049                  .990
    Sound        -.106     -4.060    .000    -.142                  .963
5   Cylinder     .204      2.284     .023    .081                   .081
    Doors        -.042     -1.642    .101    -.058                  .986
6   Doors        -.062     -2.346    .019    -.083                  .916
Doors is excluded. Its p-value is greater than 0.05 in models 1 to 5 (between .101 and .169), so
it does not meet the entry criterion (p < 0.05) at those steps; in the last model listed its p-value
is .019.
Only one variable, Doors, is dropped.
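The Tolerance and VIF columns in the collinearity statistics carry the same information, since VIF is the reciprocal of tolerance. The short sketch below illustrates this with two tolerance values that appear in the tables above; the function name is hypothetical.

```python
def vif_from_tolerance(tolerance):
    """Variance inflation factor as the reciprocal of tolerance."""
    if not 0 < tolerance <= 1:
        raise ValueError("tolerance must lie in the interval (0, 1]")
    return 1 / tolerance


# Tolerance of Mileage in the Enter model: almost no collinearity.
vif_mileage = vif_from_tolerance(0.997)

# Tolerance reported for Cylinder in the Excluded Variables table:
# a value this low signals strong overlap with a predictor already entered.
vif_cylinder = vif_from_tolerance(0.082)

print(round(vif_mileage, 3), round(vif_cylinder, 1))
```

A common rule of thumb treats a VIF above 10 (tolerance below .10) as a sign of problematic collinearity, which is the situation for the low-tolerance engine-size variables here.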
4.
Since Type (Sedan, Hatchback, Convertible or Coupe) and Make (A, B, C, D, E or F) are
also criteria considered by many car buyers, perform another regression by considering
these two variables. Discuss the results.
After Type and Make are recoded into dummy variables, the data are analysed.
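Dummy coding turns a categorical variable with several levels into 0/1 indicator columns, leaving one level out as the reference category. The sketch below is illustrative only: the helper name is hypothetical, and treating Make F as the reference level is an assumption based on the fact that only A to E appear among the predictors.

```python
def make_dummies(values, levels, reference):
    """Return one dict of 0/1 indicators per observation, omitting the reference level."""
    kept = [level for level in levels if level != reference]
    return [{level: int(value == level) for level in kept} for value in values]


# Hypothetical sample of car makes; F is assumed to be the reference category.
makes = ["A", "F", "C", "A"]
rows = make_dummies(makes, levels=["A", "B", "C", "D", "E", "F"], reference="F")

for make, row in zip(makes, rows):
    print(make, row)
```

A car of the reference make has zeros in every indicator column, so each dummy coefficient is read as the difference from that reference make.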
Descriptive Statistics
                               Mean        Std. Deviation   N
LgPrice                        4.2904      .17811           804
Mileage                        19831.93    8196.320         804
Cylinder                       5.27        1.388            804
Doors                          3.53        .850             804
Cruise                         .75         .432             804
Sound                          .68         .467             804
Leather                        .72         .447             804
(Make dummy, label missing)    .10         .300             804
(Make dummy, label missing)    .10         .300             804
(Make dummy, label missing)    .40         .490             804
(Make dummy, label missing)    .14         .349             804
(Make dummy, label missing)    .07         .263             804
Sedan                          .42         .494             804
Convertible                    .06         .242             804
Hatchback                      .05         .218             804
Coupe                          .17         .379             804
Variables Entered/Removed
Model   Variables Entered                                                   Variables Removed   Method
1       Coupe, Mileage, Cruise, Leather, Convertible, Sound, Hatchback,     .                   Enter
        E, A, B, D, C, Cylinder, Sedan
Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .960 (a)   .922       .921                .05008                       .274

Change Statistics
Model   R Square Change   F Change   df1   df2   Sig. F Change
1       .922              669.112    14    789   .000

a. Predictors: (Constant), Coupe, Mileage, Cruise, Leather, Convertible, Sound, Hatchback, E, A, B, D, C, Cylinder,
Sedan
b. Dependent Variable: LgPrice
Coefficients
                               Unstandardized Coefficients   Standardized Coefficients                     95.0% Confidence Interval for B
Model 1                        B         Std. Error          Beta                        t          Sig.   Lower Bound   Upper Bound
(Constant)                     3.903     .013                                            306.064    .000   3.877         3.928
Mileage                        (values missing)
Cylinder                       .072      .002                .560                        36.209     .000   .068          .076
Cruise                         .010      .005                .024                        1.963      .050   .000          .020
Sound                          .002      .004                .004                        .425       .671   -.006         .010
Leather                        .017      .004                .042                        3.927      .000   .008          .025
(Make dummy, label missing)    .067      .009                .113                        7.316      .000   .049          .085
(Make dummy, label missing)    .213      .010                .359                        21.339     .000   .194          .233
(Make dummy, label missing)    -.007     .006                -.019                       -1.167     .243   -.018         .005
(Make dummy, label missing)    .310      .008                .608                        38.839     .000   .294          .326
(Make dummy, label missing)    .008      .009                .012                        .896       .371   -.010         .026
Sedan                          -.032     .006                -.089                       -5.275     .000   -.044         -.020
Convertible                    .113      .009                .154                        13.072     .000   .096          .130
Hatchback                      -.083     .010                -.102                       -8.737     .000   -.102         -.065
Coupe                          .002      .006                .005                        .364       .716   -.010         .014

                               Correlations                        Collinearity Statistics
Model 1                        Zero-order   Partial   Part         Tolerance   VIF
Cylinder                       .583         .790      .359         .412        2.429
Cruise                         .494         .070      .019         .663        1.507
Sound                          -.139        .015      .004         .884        1.131
Leather                        .130         .138      .039         .845        1.183
(Make dummy, label missing)    .044         .252      .073         .414        2.416
(Make dummy, label missing)    .580         .605      .212         .348        2.870
(Make dummy, label missing)    -.467        -.042     -.012        .377        2.654
(Make dummy, label missing)    .402         .810      .385         .402        2.486
(Make dummy, label missing)    -.237        .032      .009         .561        1.781
Sedan                          .029         -.185     -.052        .349        2.862
Convertible                    .440         .422      .130         .712        1.404
Hatchback                      -.263        -.297     -.087        .727        1.375
Coupe                          -.178        .013      .004         .613        1.630
Coefficient Correlations
Model 1: correlations and covariances among the coefficient estimates for the 14 predictors (Coupe,
Mileage, Cruise, Leather, Convertible, Sound, Hatchback, E, A, B, D, C, Cylinder, Sedan). The
matrix entries are not legible in this copy.
After Type and Make are included as dummy variables, some of the other variables are excluded
from the model because their partial correlations were not significant. This suggests that keeping
them in the model would not significantly improve its ability to predict the retail price of the
car.
The prediction model contained 11 variables in total, 9 of them dummies. All 11 predictors were
entered in 11 steps, with 5 variables removed. The model was statistically significant and
accounted for 97.3% of the variance in retail price. Liter and mileage have the greatest
influence on the retail price of the car.
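Because the dependent variable is log price, each coefficient describes a proportional rather than an absolute change in price. The sketch below converts a coefficient into an approximate percentage change. It assumes the transformation was a base-10 logarithm, which is inferred from the mean LgPrice of 4.2904 (about 19,500 in original price units) and is not stated in the assignment; the function name is hypothetical.

```python
def percent_change(coefficient, base=10.0):
    """Percentage change in price for a one-unit increase in a predictor
    when the dependent variable is log(price) in the given base."""
    return (base ** coefficient - 1) * 100


# Coefficients taken from the Coefficients table of the model with dummies.
convertible_effect = percent_change(0.113)   # Convertible dummy
hatchback_effect = percent_change(-0.083)    # Hatchback dummy

print(round(convertible_effect, 1), round(hatchback_effect, 1))
```

Under that assumption, a convertible is priced roughly 30% above the reference body type and a hatchback roughly 17% below it, holding the other predictors fixed.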
ZCZA6043
MULTIVARIATE ANALYSIS
ASSIGNMENT 3
MULTIPLE REGRESSION
PREPARED FOR :
PROF. MADYA DR. RASIDAH MOHAMAD SAID
PREPARED BY:
AL AZMI BIN ABDUL RAHMAN
(ZP 02311)