Data Driven Decision Regression

Here are the key steps I would take to develop the best predictive model for loan amount using the hmeq.jmp file:

1. Recode variables for consistency and to address missing data issues.
2. Develop a regression model with loan amount as the target (Y) and all other numeric variables as predictors (X).
3. Evaluate the model fit using R-square, RMSE, and the ANOVA table. Check the significance of individual predictors.
4. Identify outliers/influential points and address them if needed.
5. Perform variable selection using stepwise regression to identify the most predictive subset of variables.
6. Evaluate the selected model fit and check assumptions.
7. Validate the model

Uploaded by

Jen Chang

Yi-Ting Chang

006699414

Data Driven Decision Making

• This simple linear regression has a positive slope.

• 100 sample points were used to find the sample linear fit.
• The sample linear fit is Response = 3.17 + 3.09 × Predictor.
• The slope of the sample data is 3.09.
• Response = Y, Predictor = X.
• In $y = mx + b$, $m$ is the slope.
• In $\hat{y} = b_0 + b_1 x$, $b_1$ is the slope.
• The method of least squares is used to find the regression line.
• My R Square is 0.92, which is very large: about 92% of the variation in Y is explained by X.
• My RMSE, the standard deviation of the residuals (prediction errors), is 2.08. RMSE is measured in the units of Y, so it has to be judged against the scale of Y; given the R Square of 0.92, the sample data points stay fairly close to the linear fit.
• The p-values for the F test and the t test are very small (< .0001), and the F ratio and t ratio are very large, so I reject $H_0$ (that the slope is zero).
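The quantities in the bullets above (slope, intercept, R Square, RMSE) can be computed by hand with the least squares formulas. The following is a minimal sketch, not the JMP output itself: the function name `fit_line` and the five data points are made up for illustration, since the original 100 sample points are not listed here.

```python
import math

def fit_line(xs, ys):
    """Least squares fit of y = b0 + b1*x; returns b0, b1, R Square, RMSE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                # slope
    b0 = y_bar - b1 * x_bar       # intercept
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    r_square = 1 - sse / sst      # share of the variation in Y explained by X
    rmse = math.sqrt(sse / (n - 2))  # standard deviation of the residuals
    return b0, b1, r_square, rmse

# Hypothetical data, only for illustration (not the data behind the fit above)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [6.0, 9.5, 12.0, 16.0, 18.5]
b0, b1, r_square, rmse = fit_line(xs, ys)
print(f"Response = {b0:.2f} + {b1:.2f} * Predictor")
print(f"R Square = {r_square:.3f}, RMSE = {rmse:.3f}")
```

RMSE here divides the squared error by n - 2, the error degrees of freedom for a simple linear regression.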

The distribution is very close to a normal distribution, a bell shape. There’s a line at the center of this distribution. It’s the $b_0$ of this linear regression equation, which is 3.19. As n gets larger, the distribution will be closer to normal.

The range of two margins of error around the mean runs from approximately 5.94 to 7.38. In this case X helps predict Y.

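I calculate the distance from the mean as below:

$$\bar{x} \pm t_{n-1,\,\alpha/2}\,\frac{s}{\sqrt{n}} = 6.6568 \pm t_{100-1,\,.025}\,\frac{1.81155}{\sqrt{100}} = 6.6568 \pm 1.984 \times .181155 = 6.6568 \pm .359411$$

Because $t_{99,\,.025} = 1.984$ is the 95% critical value, this range (6.2974 to 7.0162) is the 95% confidence interval for the mean, not a 68% range.

6.6568 ± .359411 × 2 = 5.9379 to 7.3756 (twice that margin of error, so wider than the 95% confidence interval for the mean)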
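The arithmetic above can be checked in a few lines. This is a small sketch that takes the mean, standard deviation, sample size, and the t value 1.984 directly from the calculation; the variable names are my own.

```python
import math

x_bar = 6.6568    # sample mean, from the calculation above
s = 1.81155       # sample standard deviation, from the calculation above
n = 100           # sample size
t_crit = 1.984    # t value with 99 degrees of freedom, as quoted above

std_err = s / math.sqrt(n)   # s / sqrt(n)
margin = t_crit * std_err    # distance from the mean

low, high = x_bar - margin, x_bar + margin
wide_low, wide_high = x_bar - 2 * margin, x_bar + 2 * margin

print(f"standard error = {std_err:.6f}")
print(f"margin         = {margin:.6f}")
print(f"mean +/- margin   : {low:.4f} to {high:.4f}")
print(f"mean +/- 2*margin : {wide_low:.4f} to {wide_high:.4f}")
```

Small differences in the last decimal place against the hand calculation come from rounding.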

1. Using the Countif.xls file, develop a regression model to predict Salary by using all the remaining variables. Use α =
0.05. Evaluate this model—perform all the tests. Run a stepwise model and evaluate it.

First, I used Multivariate SVD Imputation in JMP to fill in the missing values in Usefulness. Then I recoded Major as 0/1 (Business = 0, A & S = 1) and Gender as 0/1 (Male = 0, Female = 1), set Salary as Y and the rest of the numeric columns as X, and predicted salary with Data Analysis in Excel using this function:

Y = 36019.53 - 5893.08*Major Code + 89.50296*Usefulness - 2014.2*Gender code - 2612.12*GPA + 6107.477*Years

Major     Gender  Major Code  Usefulness  Gender code  GPA   Years  Salary  Pred. Salary
Business  Male    0           3           0            3.53  4.40   52125   53940.1546
Business  Female  0           1           1            2.58  4.18   52325   52884.8196
Business  Male    0           4           0            3.52  5.30   63042   59552.5079
A & S     Male    1           3           0            3     4.49   54928   49981.1702
Business  Male    0           4           0            3.22  4.06   50599   52762.8727
A & S     Female  1           2           1            3.06  3.87   42036   43934.1061
A & S     Female  1           3           1            2.35  4.64   46427   50580.9715
A & S     Male    1           3           0            3.22  5.08   51865   53009.9151
A & S     Female  1           3.022442    1            2.86  2.03   33263   33310.2843
Business  Female  0           5           1            3.6   5.76   58434   60228.2823
Business  Male    0           4           0            3     5.38   61551   61399.4086
A & S     Male    1           5           0            3.11  1.38   31235   30878.5899
Business  Male    0           2           0            3.43  4.83   58730   56738.0787
A & S     Female  1           4           1            3.31  2.91   35830   37596.9042
Business  Male    0           2.6231026   0            2.62  4.87   53267   59153.9646
Business  Male    0           5           0            3.28  5.54   65437   61734.7142
A & S     Female  1           4           1            2.68  3.76   47591   44433.8952
A & S     Female  1           4           1            3.16  3.27   42659   40187.4139
Business  Male    0           3           0            3.84  4.33   50996   52702.8739
A & S     Male    1           2           0            3.29  3.02   40185   40156.1614
A & S     Male    1           5           0            3.69  2.32   33155   35104.5884
Business  Female  0           1           1            2.54  3.38   52695   48103.323
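The Pred. Salary column can be reproduced from the fitted function. Below is a minimal sketch; the function name `predict_salary` is my own, the coefficients are the ones in the function above, and the inputs are the first row of the table.

```python
def predict_salary(major_code, usefulness, gender_code, gpa, years):
    """Predicted salary from the fitted regression function."""
    return (36019.53
            - 5893.08 * major_code
            + 89.50296 * usefulness
            - 2014.2 * gender_code
            - 2612.12 * gpa
            + 6107.477 * years)

# First row of the table: Business (0), Usefulness 3, Male (0), GPA 3.53, 4.40 years
first_pred = predict_salary(0, 3, 0, 3.53, 4.40)
first_residual = 52125 - first_pred   # actual salary minus predicted salary

print(f"predicted = {first_pred:.2f}, residual = {first_residual:.2f}")
```

The result agrees with the table value of 53940.1546 up to rounding of the coefficients.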

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.958187
R Square           0.918122
Adjusted R Square  0.892535
Standard Error     3257.747
Observations       22

[Figure: Salary vs. Years scatter plot; y-axis 0 to 80000, x-axis 0.00 to 7.00]

ANOVA
            df  SS        MS        F         Significance F
Regression  5   1.9E+09   3.81E+08  35.88248  3.81E-08
Residual    16  1.7E+08   10612918
Total       21  2.07E+09

             Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept    36019.53      7924.17         4.545528  0.000331  19221.04   52818.02   19221.04     52818.02
Major Code   -5893.08      1830.547        -3.2193   0.005356  -9773.67   -2012.5    -9773.67     -2012.5
Usefulness   89.50296      670.9087        0.133406  0.895536  -1332.76   1511.766   -1332.76     1511.766
Gender code  -2014.2       1661.857        -1.21202  0.243101  -5537.18   1508.781   -5537.18     1508.781
GPA          -2612.12      2231.072        -1.17079  0.258825  -7341.78   2117.542   -7341.78     2117.542
Years        6107.477      757.6679        8.060889  5.03E-07  4501.293   7713.661   4501.293     7713.661
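To evaluate this output at α = 0.05, each t Stat is the coefficient divided by its standard error, and Adjusted R Square and F follow from R Square, the number of observations, and the number of predictors. The sketch below only rechecks the numbers printed in the output; the variable names are my own, and it does not rerun the regression.

```python
ALPHA = 0.05

# (coefficient, standard error, p-value) copied from the coefficient table
rows = {
    "Intercept":   (36019.53, 7924.17, 0.000331),
    "Major Code":  (-5893.08, 1830.547, 0.005356),
    "Usefulness":  (89.50296, 670.9087, 0.895536),
    "Gender code": (-2014.2, 1661.857, 0.243101),
    "GPA":         (-2612.12, 2231.072, 0.258825),
    "Years":       (6107.477, 757.6679, 5.03e-07),
}

# t Stat = coefficient / standard error
t_stats = {name: coef / se for name, (coef, se, _) in rows.items()}

# Terms whose p-value is below alpha
significant = [name for name, (_, _, p) in rows.items() if p < ALPHA]

r_square, n, k = 0.918122, 22, 5   # from Regression Statistics; k = predictors
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
f_ratio = (r_square / k) / ((1 - r_square) / (n - k - 1))

for name, t in t_stats.items():
    print(f"{name:12s} t = {t:9.4f}")
print("significant at 0.05:", significant)
print(f"Adjusted R Square = {adj_r_square:.6f}, F = {f_ratio:.4f}")
```

In this output only the intercept, Major Code, and Years have p-values below 0.05, which is the kind of result a stepwise model is meant to act on.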

2. Using the hmeq.jmp file, develop the best model you can to predict loan amount. Evaluate each model and use α =
0.05.

1. Recode the headings and correct the misspellings.

2. Change the type of Default to Numeric Continuous.

3. Add a new Numeric Continuous Reason Code column, setting HomeImp to 1 and DebtCon to 0 and leaving the rest as missing values, because I can’t analyze them in Nominal form.

4. Add 3 new Numeric Continuous columns, JobCode1, JobCode2, and JobCode3, because there are 6 categories under Job and I can’t analyze them in Nominal form.

5. Set Sales to 001, Profexe to 010, Other to 100, Office to 000, Mgr to 101, and Self to 111, leaving the rest as missing values.

6. Check for missing values in every column. When the type is Numeric Continuous, every missing value is shown as a dot (.), so I take 0 in these columns to be a real value, not a missing value.

7. Use Multivariate SVD Imputation to fill in all missing values.

8. Put Loan as Y and the rest of the Numeric Continuous columns as X, fit the regression in Fit Model, then use Show Prediction Expression and save the formula.

9. The predicted loan amount is in the last column, using the default alpha of 0.05.
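The recoding in steps 3 to 5 can be sketched outside JMP as a simple lookup. This is only an illustration under assumptions: the function name `encode_row` is hypothetical, the category spellings are assumed to match the data file (matched case-insensitively here), and `None` stands in for a missing value that would later be imputed.

```python
# Reason -> Reason Code (step 3)
REASON_CODE = {"homeimp": 1, "debtcon": 0}

# Job -> (JobCode1, JobCode2, JobCode3), the three-digit codes from step 5
JOB_CODE = {
    "sales":   (0, 0, 1),
    "profexe": (0, 1, 0),
    "other":   (1, 0, 0),
    "office":  (0, 0, 0),
    "mgr":     (1, 0, 1),
    "self":    (1, 1, 1),
}

def encode_row(reason, job):
    """Return the numeric codes for one row; None marks a missing value."""
    reason_key = (reason or "").strip().lower()
    job_key = (job or "").strip().lower()
    job1, job2, job3 = JOB_CODE.get(job_key, (None, None, None))
    return {
        "ReasonCode": REASON_CODE.get(reason_key),
        "JobCode1": job1,
        "JobCode2": job2,
        "JobCode3": job3,
    }

print(encode_row("HomeImp", "Mgr"))
print(encode_row("", "Self"))   # blank Reason stays missing
```

Note that this three-column scheme follows the steps above and is not the same as standard indicator (dummy) coding, which would use five 0/1 columns for six Job categories.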
