Unit 2 Regression Analysis
Calculate the regression coefficient and obtain the lines of regression for the following data
Solution:
Regression coefficient of X on Y
Y = 0.929X–3.716+11
= 0.929X+7.284
The regression equation of Y on X is Y= 0.929X + 7.284
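For reference, the standard formulas behind this kind of calculation can be written out as below. The symbols \bar{X}, \bar{Y}, b_{YX} and b_{XY} are notation assumed here. The data table for this problem is not reproduced, so the means in the last line are only inferred from the constants 3.716 and 11 in the working above.

```latex
b_{YX} = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sum (X-\bar{X})^{2}}, \qquad
b_{XY} = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sum (Y-\bar{Y})^{2}}

\text{Y on X: } Y-\bar{Y} = b_{YX}\,(X-\bar{X}), \qquad
\text{X on Y: } X-\bar{X} = b_{XY}\,(Y-\bar{Y})

% Inferred check: with b_{YX} = 0.929, the working is consistent with \bar{X} = 4, \bar{Y} = 11
Y - 11 = 0.929\,(X - 4) \;\Rightarrow\; Y = 0.929X - 3.716 + 11 = 0.929X + 7.284
```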
Calculate the two regression equations of X on Y and Y on X from the data given below, taking
deviations from the actual means of X and Y.
Solution:
Calculation of Regression equation
= –0.25 (20)+44.25
= –5+44.25
= 39.25 (when the price is Rs. 20, the likely demand is 39.25)
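The deviation-from-means method can be sketched in a few lines of Python. The data below is hypothetical (it is not the price and demand table of this problem), and the helper names `predict_y` and `predict_x` are illustrative only.

```python
# Hypothetical paired observations, used only to illustrate the method.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Deviations of each observation from the actual means.
dx = [xi - mean_x for xi in X]
dy = [yi - mean_y for yi in Y]

sum_dxdy = sum(a * b for a, b in zip(dx, dy))
sum_dx2 = sum(a * a for a in dx)
sum_dy2 = sum(b * b for b in dy)

# Regression coefficients of Y on X and of X on Y.
b_yx = sum_dxdy / sum_dx2
b_xy = sum_dxdy / sum_dy2


def predict_y(x):
    """Regression line of Y on X: Y - mean_y = b_yx * (X - mean_x)."""
    return mean_y + b_yx * (x - mean_x)


def predict_x(y):
    """Regression line of X on Y: X - mean_x = b_xy * (Y - mean_y)."""
    return mean_x + b_xy * (y - mean_y)


print("b_yx =", b_yx, "b_xy =", b_xy)
print("Y estimated at X = 6:", predict_y(6))
```

Both lines pass through the point (mean_x, mean_y), which is why the means can be recovered by solving the two regression equations simultaneously.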
3. Obtain the regression equation of Y on X and estimate Y when X=55 from the following data.
Solution:
Y–51.57 = 0.942(X–48.29 )
Y = 0.942X–45.49+51.57
Y = 0.942X+6.08
Y= 0.942(55)+6.08=57.89
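The arithmetic of this solution can be checked with a short script. It takes the means (48.29, 51.57) and the regression coefficient 0.942 from the working above as given; the function name `estimate_y` is illustrative.

```python
# Values taken from the worked solution above.
mean_x = 48.29
mean_y = 51.57
b_yx = 0.942  # regression coefficient of Y on X

# Rearranging Y - mean_y = b_yx * (X - mean_x) gives the intercept.
intercept = mean_y - b_yx * mean_x


def estimate_y(x):
    """Estimate Y from the regression line of Y on X."""
    return b_yx * x + intercept


y_at_55 = estimate_y(55)
print(round(intercept, 2), round(y_at_55, 2))  # 6.08 57.89
```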
Example 9.12
Find the means of X and Y variables and the coefficient of correlation between them from the
following two regression equations:
2Y–X–50 = 0
3Y–2X–10 = 0.
Solution:
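We are given
2Y–X–50 = 0 ... (1)
3Y–2X–10 = 0 ... (2)
Both regression lines pass through the point of means, so solving the two equations together gives the means of X and Y.
Multiplying equation (1) by 2 and subtracting equation (2): (4Y–2X–100) – (3Y–2X–10) = 0, that is, Y–90 = 0
We get Y = 90
Putting the value of Y in equation (1): 180–X–50 = 0
We get X = 130
Hence the mean of X is 130 and the mean of Y is 90.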
2Y = X+50
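The correlation coefficient follows from the two regression coefficients. The working below is a sketch that assumes equation (1) is the regression line of Y on X and equation (2) is the line of X on Y. The opposite assignment would give a product of coefficients equal to 4/3, which exceeds 1 and is therefore not admissible.

```latex
\text{From (1): } 2Y = X + 50 \;\Rightarrow\; Y = \tfrac{1}{2}X + 25, \qquad b_{YX} = \tfrac{1}{2}

\text{From (2): } 2X = 3Y - 10 \;\Rightarrow\; X = \tfrac{3}{2}Y - 5, \qquad b_{XY} = \tfrac{3}{2}

r^{2} = b_{YX}\, b_{XY} = \tfrac{1}{2}\cdot\tfrac{3}{2} = 0.75, \qquad r = +\sqrt{0.75} \approx 0.866
```

The positive root is taken because both regression coefficients are positive.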
Linear Regression
The linear regression algorithm models a linear relationship between a dependent variable (y) and one or
more independent variables (x), hence the name linear regression. Because the relationship is linear,
the model describes how the value of the dependent variable changes with the value of the
independent variable.
The linear regression model provides a sloped straight line representing the relationship between the
variables. Consider the image below:
Mathematically, we can represent a linear regression as:
y= a0+a1x+ ε
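Here,
y = dependent variable (target variable)
x = independent variable (predictor variable)
a0 = intercept of the line
a1 = linear regression coefficient (slope of the line)
ε = random error
The values of the x and y variables are the training data for the linear regression model.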
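As an illustration of the equation y = a0 + a1x + ε, the sketch below estimates a0 and a1 by least squares and treats the leftover residuals as the error term. The five data points are hypothetical, and `numpy.polyfit` is used here as one convenient way to do the fit, not necessarily the method these notes have in mind.

```python
import numpy as np

# Hypothetical training data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# For degree 1, polyfit returns [slope, intercept], i.e. [a1, a0].
a1, a0 = np.polyfit(x, y, 1)

# Fitted values on the line, and the residuals playing the role of ε.
y_hat = a0 + a1 * x
residuals = y - y_hat

print("a0 =", round(a0, 3), "a1 =", round(a1, 3))
print("residuals:", np.round(residuals, 3))
```

With an intercept in the model, the least-squares residuals sum to zero (up to rounding), which is a quick sanity check on the fit.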
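Linear regression can be further divided into two types: simple linear regression, which uses a single
independent variable, and multiple linear regression, which uses more than one.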
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# Load the salary dataset (years of experience and salary)
data_set = pd.read_csv('Salary_Data.csv')
o After that, we need to extract the dependent and independent variables from the given
dataset. The independent variable is years of experience, and the dependent variable is
salary. Below is code for it:
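# Independent variable: every column except the last (years of experience)
x = data_set.iloc[:, :-1].values
# Dependent variable: the column at index 1 (salary)
y = data_set.iloc[:, 1].values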
In the output, we can see that the X (independent) variable and the Y (dependent) variable have
been extracted from the given dataset.
o Next, we will split both variables into the test set and training set. We have 30 observations,
so we will take 20 observations for the training set and 10 observations for the test set. We
are splitting our dataset so that we can train our model using a training dataset and then test
the model using a test dataset. The code for this is given below:
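A split of this kind can be sketched with scikit-learn's `train_test_split`. Because `Salary_Data.csv` is not available here, the example builds a synthetic 30-row stand-in; the generated numbers and the `random_state` value are assumptions made only so the snippet runs on its own.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 30-row salary data (hypothetical values).
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.5, 30).reshape(-1, 1)            # years of experience
y = 25000 + 9000 * x.ravel() + rng.normal(0, 3000, 30)   # salary

# Hold out 10 of the 30 observations for testing and keep 20 for training.
# random_state fixes the shuffle so the split is reproducible.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=10, random_state=0
)

print(x_train.shape, x_test.shape)  # (20, 1) (10, 1)
```

Passing an integer as `test_size` requests an absolute number of test rows; a fraction such as 1/3 would give the same 20/10 split for 30 observations.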
By executing the above code, we will get the x_test, x_train, y_test, and y_train datasets.
[Images: test dataset; training dataset]
Step-2: Fitting the Simple Linear Regression to the Training Set:
The second step is to fit our model to the training dataset. To do so, we will import
the LinearRegression class from the linear_model module of scikit-learn. After importing the class,
we will create an object of the class named regressor. The code for this is given below:
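# Fit the simple linear regression model to the training dataset
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train, y_train)
Output: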
The fitted model has learned the relationship between the dependent variable (salary) and the
independent variable (experience), so it is now ready to predict the output for new observations.
In this step, we will provide the test dataset (new observations) to the model to check whether it
can predict the correct output or not.
We will create two prediction vectors, y_pred and x_pred, which will contain the predictions for the
test dataset and the training dataset respectively.
# Predictions for the test set (y_pred) and for the training set (x_pred)
y_pred = regressor.predict(x_test)
x_pred = regressor.predict(x_train)
Output:
You can inspect these variables in the variable explorer of the IDE and compare the values in y_pred
with those in y_test. Comparing these values shows how well our model is performing.
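The comparison between y_pred and y_test can also be made numerically. The sketch below uses two common summaries, the mean absolute error and the R² score, which are an addition here and not part of the original walkthrough. The four value pairs are hypothetical placeholders, not results from the salary data.

```python
# Hypothetical actual and predicted values, for illustration only.
y_test = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]

n = len(y_test)

# Mean absolute error: the average size of the prediction errors.
mae = sum(abs(t - p) for t, p in zip(y_test, y_pred)) / n

# R^2: 1 minus the ratio of residual variation to total variation.
mean_t = sum(y_test) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_test, y_pred))
ss_tot = sum((t - mean_t) ** 2 for t in y_test)
r2 = 1 - ss_res / ss_tot

print("MAE =", mae)
print("R^2 =", round(r2, 3))
```

An R² close to 1 and a small error relative to the scale of y suggest that the predictions track the test values closely.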
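# Visualize the training set: observations plus the fitted regression line (colors are optional)
mtp.scatter(x_train, y_train, color="green")
mtp.plot(x_train, x_pred, color="red")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary(In Rupees)")
mtp.show()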
Output:
By executing the above lines of code, we will get the below graph plot as an output.
In the previous step, we have visualized the performance of our model on the training set. Now, we
will do the same for the Test set. The complete code will remain the same as the above code, except
in this, we will use x_test, and y_test instead of x_train and y_train.
Here we are also changing the color of observations and regression line to differentiate between the
two plots, but it is optional.
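# Visualize the test set against the same fitted regression line (colors are optional)
mtp.scatter(x_test, y_test, color="blue")
mtp.plot(x_train, x_pred, color="red")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary(In Rupees)")
mtp.show()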
Output:
By executing the above line of code, we will get the output as:
Suppose we have the following dataset with one response variable y and two predictor variables
X1 and X2:
Use the following steps to fit a multiple linear regression model to this dataset.
Here is how to interpret this estimated linear regression equation: ŷ = -6.867 + 3.148x1 – 1.656x2
b0 = -6.867. When both predictor variables are equal to zero, the mean value for y is -6.867.
b1 = 3.148. A one unit increase in x1 is associated with a 3.148 unit increase in y, on average, assuming
x2 is held constant.
b2 = -1.656. A one unit increase in x2 is associated with a 1.656 unit decrease in y, on average,
assuming x1 is held constant.
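To see how such coefficients arise and how they are read, the sketch below fits a two-predictor model by ordinary least squares with `numpy.linalg.lstsq`. The dataset for this example is not reproduced here, so the script builds noise-free synthetic data from the stated equation and confirms that least squares recovers b0, b1 and b2. The predictor values and the helper `predict` are hypothetical.

```python
import numpy as np

# Hypothetical predictor values (not the original dataset).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])

# Synthetic, noise-free response built from the equation stated above.
y = -6.867 + 3.148 * x1 - 1.656 * x2

# Design matrix: a column of ones for the intercept, then x1 and x2.
A = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares solution of A @ coef ≈ y.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2 = coef


def predict(v1, v2):
    """Predicted y for given values of x1 and x2."""
    return b0 + b1 * v1 + b2 * v2


print(np.round(coef, 3))
# Raising x1 by one unit with x2 held constant changes the prediction by b1.
print(round(predict(3, 4) - predict(2, 4), 3))
```

With real data the response contains noise, so the fitted coefficients would only approximate the underlying relationship.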