Data Driven Decision Regression
006699414
About 95% of the data lie within 2 standard deviations of the mean, from
approximately 5.9 to 7.3. In this case, X helps predict Y.
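As a rough arithmetic check, the interval quoted above can be turned back into the mean and standard deviation it implies. This is only a sketch: it assumes the interval is exactly mean ± 2 standard deviations, and the variable names are illustrative.

```python
# Interval endpoints quoted in the text (approximate values).
lower, upper = 5.9, 7.3

# Assumption: the interval is exactly mean - 2*sd to mean + 2*sd.
mean = (lower + upper) / 2   # midpoint of the interval
sd = (upper - lower) / 4     # the interval spans 4 standard deviations

print(f"implied mean = {mean:.2f}, implied sd = {sd:.2f}")
```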
First, I used Multivariate SVD Imputation in JMP to fill in the missing values in Usefulness. Then, I recoded Major as a 0/1
code, set Salary as Y and the rest of the numeric columns as X, and predicted salary using Data Analysis in Excel with
this function:
Major     Gender  Major Code  Usefulness  Gender Code  GPA   Years  Salary  Pred. Salary
Business  Male    0           3           0            3.53  4.40   52125   53940.1546
Business  Female  0           1           1            2.58  4.18   52325   52884.8196
Business  Male    0           4           0            3.52  5.30   63042   59552.5079
A & S     Male    1           3           0            3     4.49   54928   49981.1702
Business  Male    0           4           0            3.22  4.06   50599   52762.8727
A & S     Female  1           2           1            3.06  3.87   42036   43934.1061
A & S     Female  1           3           1            2.35  4.64   46427   50580.9715
A & S     Male    1           3           0            3.22  5.08   51865   53009.9151
A & S     Female  1           3.022442    1            2.86  2.03   33263   33310.2843
Business  Female  0           5           1            3.6   5.76   58434   60228.2823
Business  Male    0           4           0            3     5.38   61551   61399.4086
A & S     Male    1           5           0            3.11  1.38   31235   30878.5899
Business  Male    0           2           0            3.43  4.83   58730   56738.0787
A & S     Female  1           4           1            3.31  2.91   35830   37596.9042
Business  Male    0           2.6231026   0            2.62  4.87   53267   59153.9646
Business  Male    0           5           0            3.28  5.54   65437   61734.7142
A & S     Female  1           4           1            2.68  3.76   47591   44433.8952
A & S     Female  1           4           1            3.16  3.27   42659   40187.4139
Business  Male    0           3           0            3.84  4.33   50996   52702.8739
A & S     Male    1           2           0            3.29  3.02   40185   40156.1614
A & S     Male    1           5           0            3.69  2.32   33155   35104.5884
Business  Female  0           1           1            2.54  3.38   52695   48103.323
SUMMARY OUTPUT

Regression Statistics
Multiple R         0.958187
R Square           0.918122
Adjusted R Square  0.892535
Standard Error     3257.747
Observations       22

[Figure: Salary vs. Years scatter plot; x-axis Years from 0.00 to 7.00, y-axis Salary from 0 to 80000]
ANOVA
            df  SS        MS        F         Significance F
Regression  5   1.9E+09   3.81E+08  35.88248  3.81E-08
Residual    16  1.7E+08   10612918
Total       21  2.07E+09
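The figures in the regression statistics and the ANOVA table are tied together by standard formulas, which gives a quick way to check the output. The sketch below uses the values as printed by Excel; the variable names are illustrative. Because the printed mean squares are rounded, the recomputed F only matches approximately.

```python
import math

# Values copied from the Excel output.
n = 22                 # Observations
k = 5                  # Regression df (number of predictors)
r_square = 0.918122    # R Square
ms_regression = 3.81e8 # MS for Regression (rounded in the output)
ms_residual = 10612918 # MS for Residual

# Adjusted R Square penalizes R Square for the number of predictors.
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

# The regression Standard Error is the square root of the residual mean square.
std_error = math.sqrt(ms_residual)

# F is the ratio of the two mean squares.
f_stat = ms_regression / ms_residual

print(f"Adjusted R Square = {adj_r_square:.6f}")
print(f"Standard Error    = {std_error:.3f}")
print(f"F (approximate)   = {f_stat:.2f}")
```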
             Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept    36019.53      7924.17         4.545528  0.000331  19221.04   52818.02   19221.04     52818.02
Major Code   -5893.08      1830.547        -3.2193   0.005356  -9773.67   -2012.5    -9773.67     -2012.5
Usefulness   89.50296      670.9087        0.133406  0.895536  -1332.76   1511.766   -1332.76     1511.766
Gender Code  -2014.2       1661.857        -1.21202  0.243101  -5537.18   1508.781   -5537.18     1508.781
GPA          -2612.12      2231.072        -1.17079  0.258825  -7341.78   2117.542   -7341.78     2117.542
Years        6107.477      757.6679        8.060889  5.03E-07  4501.293   7713.661   4501.293     7713.661
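The coefficients above define the prediction function: the intercept plus each coefficient times its variable. A minimal sketch of that calculation follows, using the rounded coefficients as printed, so results agree with the Pred. Salary column only to within rounding. The function name `predict_salary` is illustrative and not part of the Excel output.

```python
# Coefficients as printed in the Excel output (rounded).
INTERCEPT = 36019.53
COEF = {
    "major_code": -5893.08,
    "usefulness": 89.50296,
    "gender_code": -2014.2,
    "gpa": -2612.12,
    "years": 6107.477,
}

def predict_salary(major_code, usefulness, gender_code, gpa, years):
    """Return the fitted salary: intercept + sum of coefficient * value."""
    return (INTERCEPT
            + COEF["major_code"] * major_code
            + COEF["usefulness"] * usefulness
            + COEF["gender_code"] * gender_code
            + COEF["gpa"] * gpa
            + COEF["years"] * years)

# First row of the data: Business, Male, Usefulness 3, GPA 3.53, Years 4.40.
first_row = predict_salary(0, 3, 0, 3.53, 4.40)
print(round(first_row, 2))
```

The first data row comes out close to the 53940.1546 shown in the Pred. Salary column; the small difference comes from the rounded coefficients.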
2. Using the hmeq.jmp file, develop the best model you can to predict loan amount. Evaluate each model and use α =
0.05.
3. Add a new Numeric Continuous column, Reason Code, and set HomeImp to 1 and DebtCon to 0; all other rows are left as
missing values. I did this because I can't analyze Reason in Nominal form.
4. Add 3 new Numeric Continuous columns, JobCode1, JobCode2, and JobCode3, because there are 6 categories
under Job and I can't analyze them in Nominal form.
5. Set Sales to 001, ProfExe to 010, Other to 100, Office to 000, Mgr to 101, and Self to 111; all other rows are left as missing values.
6. Check for missing values in every column. I noticed that in Numeric Continuous columns, every missing value is shown
as ".". I assume that a 0 in these columns is not a missing value.
8. Put Loan as Y and the rest of the Numeric Continuous columns as X, fit the regression in Fit Model, then use
Show Prediction Expression and save the formula.
9. The predicted loan amount is in the last column, using the default alpha of 0.05.
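The recoding in steps 3 to 5 can be sketched outside JMP as follows. This is an illustration only: the helper names and the sample rows are hypothetical, the category labels are assumed to be spelled as in the data file, and the three digits of each job code are assumed to map left to right onto JobCode1, JobCode2, and JobCode3. Missing values are represented as None.

```python
# Reason coding from step 3 (labels assumed to match the data file spelling).
REASON_CODES = {"HomeImp": 1, "DebtCon": 0}

# Job coding from step 5, written as three-digit strings.
JOB_CODES = {
    "Sales": "001",
    "ProfExe": "010",
    "Other": "100",
    "Office": "000",
    "Mgr": "101",
    "Self": "111",
}

def encode_reason(reason):
    """Return 1 or 0 for a known reason, or None (missing) otherwise."""
    return REASON_CODES.get(reason)

def encode_job(job):
    """Return (JobCode1, JobCode2, JobCode3), or all None if job is unknown."""
    code = JOB_CODES.get(job)
    if code is None:
        return (None, None, None)
    return tuple(int(digit) for digit in code)

# Made-up sample rows, not taken from hmeq.jmp.
sample_rows = [("HomeImp", "Mgr"), ("DebtCon", "Sales"), ("", "")]
for reason, job in sample_rows:
    print(reason or "(blank)", job or "(blank)",
          encode_reason(reason), encode_job(job))
```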