Advanced - Linear Regression
Residual: the difference between the actual value and the value on the regression line
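As a minimal sketch of the residual definition (written in Python purely for illustration; the data points and the line's intercept and slope below are hypothetical, not taken from the slides), each residual is the actual value minus the value the line gives for the same x:

```python
# Hypothetical observations and a hypothetical fitted line y = intercept + slope * x.
xs = [1.0, 2.0, 3.0, 4.0]
actual = [3.0, 5.5, 6.5, 9.0]
intercept, slope = 1.0, 2.0


def residuals(xs, ys, intercept, slope):
    """Return (actual value - value on the line) for each observation."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


res = residuals(xs, actual, intercept, slope)
print(res)
```

A positive residual means the point lies above the line, a negative one means it lies below.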
rm and lstat
Randomly split the data into two parts:
a training dataset and a validation (test) dataset
(important for predictive analysis)
Approx. 70% to train,
30% to test
Install caTools for the split function,
which is used to split the dataset
Every record is flagged
TRUE or FALSE:
65% - TRUE
35% - FALSE
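The idea of flagging every record TRUE or FALSE can be sketched as follows. This is an illustrative Python stand-in, not the caTools call itself; the function name `split_flags`, the seed, and the 20-row dataset are assumptions made for the example:

```python
import random


def split_flags(n_records, train_ratio=0.7, seed=42):
    """Return one boolean per record: True = training row, False = test row."""
    n_train = round(n_records * train_ratio)
    flags = [True] * n_train + [False] * (n_records - n_train)
    random.Random(seed).shuffle(flags)  # random assignment, reproducible via seed
    return flags


records = list(range(20))  # hypothetical stand-in for the dataset rows
flags = split_flags(len(records), train_ratio=0.7)

train = [r for r, f in zip(records, flags) if f]      # rows flagged True
test = [r for r, f in zip(records, flags) if not f]   # rows flagged False
print(len(train), len(test))
```

Changing `train_ratio` to 0.65 gives the 65% / 35% proportion instead of 70% / 30%.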
Train and test the dataset
Use == TRUE
Use . for all variables
Linear Equation = 36.459488 + (-0.10)*crim + 0.04*zn + .....
Compare the predicted output with the original one
Predict command
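One common way to compare predicted values with the original ones is the root mean squared error (RMSE). The slides do not name a specific metric, so RMSE is an assumption here, and the values below are hypothetical. A minimal Python sketch:

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical test-set values and model predictions.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]

error = rmse(actual, predicted)
print(error)
```

A smaller RMSE on the test set indicates predictions that lie closer to the original values.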
Case Study - 2
• A company is facing a high churn_rate this year
and is trying to find out the reasons
behind it. With Salary_Hike being the major reason,
let us consider the company’s data and try
to find the relationship between these
two variables.
• Linear Regression is a powerful technique used for
predicting the unknown value of a variable
(dependent variable) from the known values of
other variables (independent variables).
– A dependent variable is the variable to be predicted
or explained in a regression model. This variable is
assumed to be functionally related to
the independent variable.
– An independent variable is the variable related to the
dependent variable in a regression equation. The
independent variable is used in a Regression Model to
estimate the value of the dependent variable.
• In the Case Study: Dependent Variable – Churn_data
Independent Variable – Salary_Hike
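To make the dependent/independent relationship concrete, here is a minimal least-squares sketch in Python. The numbers are hypothetical and are not the case-study data; the variable names `salary_hike` and `churn` simply mirror the two variables named above:

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: independent variable (salary hike) and dependent variable (churn).
salary_hike = [1.0, 2.0, 3.0, 4.0, 5.0]
churn = [9.0, 8.0, 6.0, 5.0, 2.0]

intercept, slope = fit_line(salary_hike, churn)
predicted_churn = intercept + slope * 6.0  # estimate churn for a new salary hike value
print(intercept, slope, predicted_churn)
```

In this made-up data the slope is negative, meaning churn falls as the salary hike rises; the real dataset may of course behave differently.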
Equation of the Regression Line