Linear Regression
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import norm
import scipy as sp
import statsmodels.tsa.api as smt
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
Variable Description
• CRIM: Per capita crime rate by town
• ZN: Proportion of residential land zoned for lots over 25,000 sq.ft.
• INDUS: Proportion of non-retail business acres per town.
• NOX: Nitric oxides concentration (parts per 10 million)
• RM: Average number of rooms per dwelling
• AGE: Proportion of owner-occupied units built prior to 1940
• DIS: Weighted distances to five Boston employment centers
• RAD: Index of accessibility to radial highways
• TAX: Full-value property-tax rate per 10,000 dollars
• PTRATIO: Pupil-teacher ratio by town
• BLACK: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
• LSTAT: Percentage of lower status of the population
• MEDV: Median value of owner-occupied homes in 1,000's of dollars
The target variable is MEDV, the median house price.
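The cell that loads the dataset into data is not included in this excerpt; a minimal sketch, assuming the Boston housing data is available as a CSV file (the filename housing.csv is a hypothetical placeholder):

import pandas as pd

# Hypothetical filename; the original notebook loads the Boston housing data
# into a DataFrame called `data` with the columns listed above.
data = pd.read_csv("housing.csv")
data.head()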
[3]: data.shape
print("Columns:",data.shape[1])
print("Rows:",data.shape[0])
Columns: 13
Rows: 506
2 Splitting data into training & testing sets (Validation set approach)
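The cell that defines the feature matrix x and the target y for the Boston data is not shown; a minimal sketch, assuming the target column is named medv as in the later plots:

x = data.drop(columns=['medv'])   # 12 predictor columns
y = data['medv']                  # target: median house price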
[6]: # test_size=0.2: 20% of the data is held out for testing, 80% is used for training
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=0)
3 Creating a linear regression model object
[7]: model=LinearRegression()
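The cells that fit the model and generate predictions are not shown; a minimal sketch of the steps implied by the output below and by the y_pred used later:

model.fit(x_train, y_train)      # returns the fitted LinearRegression() shown below
y_pred = model.predict(x_test)   # predictions on the held-out test set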
[8]: LinearRegression()
[10]: 38.42706211257645
[11]: 76.90953567794605
9 Mean Squared Error for the testing data
[13]: MSE = mean_squared_error(y_test, y_pred)   # mean_squared_error(y_true, y_pred)
MSE
[13]: 34.21225254753325
[14]: 5.84912408378667
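The value above is presumably the root mean squared error of the test predictions, i.e. the square root of the MSE computed above:

RMSE = np.sqrt(MSE)   # ≈ 5.849
RMSE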
[16]: # define x, y — from here the notebook uses the advertising dataset (Sales, TV, Radio, Newspaper)
x = data.drop(columns=['Sales'])  # independent variables
y = data.Sales                    # target variable
data.shape
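The cell that fits the ordinary least squares model is not shown; a minimal sketch with statsmodels that produces a summary of the form shown below:

import statsmodels.api as sm

X = sm.add_constant(x)            # adds the intercept term ("const" in the summary)
ols_model = sm.OLS(y, X).fit()
print(ols_model.summary())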
==============================================================================
Dep. Variable: Sales R-squared: 0.903
Model: OLS Adj. R-squared: 0.901
Method: Least Squares F-statistic: 605.4
Date: Sun, 21 Jan 2024 Prob (F-statistic): 8.13e-99
Time: 15:17:22 Log-Likelihood: -383.34
No. Observations: 200 AIC: 774.7
Df Residuals: 196 BIC: 787.9
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 4.6251 0.308 15.041 0.000 4.019 5.232
TV 0.0544 0.001 39.592 0.000 0.052 0.057
Radio 0.1070 0.008 12.604 0.000 0.090 0.124
Newspaper 0.0003 0.006 0.058 0.954 -0.011 0.012
==============================================================================
Omnibus: 16.081 Durbin-Watson: 2.251
Prob(Omnibus): 0.000 Jarque-Bera (JB): 27.655
Skew: -0.431 Prob(JB): 9.88e-07
Kurtosis: 4.605 Cond. No. 454.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
[17]: (200, 4)
12.1 1. Linearity
The assumption checks below return to the Boston housing model (note the lowercase column names).
[23]: fig, (ax1, ax2, ax3) = plt.subplots(nrows=3, figsize=(5, 12))
ax1.scatter(data['nox'], data['medv'])
ax1.set_title('nox - Nitric oxides concentration (parts per 10 million)')
ax2.scatter(data['rm'], data['medv'])
ax2.set_title('rm - Average number of rooms per dwelling')
ax3.scatter(data['age'], data['medv'])
ax3.set_title('age - Proportion of owner-occupied units built prior to 1940')
plt.show()
12.2 2. Homoscedasticity
[24]: # Residuals on the test set
residuals = y_test - y_pred
# Standardize the residuals before plotting them against the fitted values
standardized_residuals = residuals / np.std(residuals)
plt.scatter(y_pred, standardized_residuals)
plt.xlabel("Predicted values")
plt.ylabel("Standardized Residuals")
plt.title("Residuals vs Fitted Values")
plt.axhline(y=0, color='r', linestyle='--')  # horizontal reference line at y=0
plt.show()
12.3 3. Multivariate Normality
Normality of Residuals
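The plotting cells for this check are not shown in this excerpt; a minimal sketch of the usual checks, a histogram of the residuals and a normal Q-Q plot:

from scipy import stats

# Histogram of the residuals with a kernel density estimate
sns.histplot(residuals, kde=True)
plt.title("Distribution of residuals")
plt.show()

# Q-Q plot of the residuals against the normal distribution
stats.probplot(residuals, dist="norm", plot=plt)
plt.show()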
[28]: # Homework: apply the Kolmogorov–Smirnov and Shapiro–Wilk tests to check multivariate normality.
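A minimal sketch of how these tests could be applied to the residuals with scipy.stats (one possible answer to the homework, not the original author's):

from scipy import stats

# Shapiro–Wilk: the null hypothesis is that the residuals come from a normal distribution
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Kolmogorov–Smirnov against a normal distribution with the sample mean and std
ks_stat, ks_p = stats.kstest(residuals, 'norm', args=(residuals.mean(), residuals.std()))

print("Shapiro-Wilk p-value:", shapiro_p)
print("Kolmogorov-Smirnov p-value:", ks_p)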
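Before the Durbin–Watson test, the notebook draws an autocorrelation (ACF) plot, presumably of the residuals; the code is not shown, so here is a minimal sketch using statsmodels (the exact function used originally is an assumption):

from statsmodels.graphics.tsaplots import plot_acf

# Autocorrelation plot of the residuals; bars inside the shaded confidence band
# indicate no significant autocorrelation at that lag.
plot_acf(residuals, lags=20)
plt.show()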
[30]: #perform Durbin-Watson test
durbin_watson(residuals)
[30]: 2.0260473760330022
A Durbin–Watson statistic close to 2 indicates little or no autocorrelation in the residuals.
[31]: vif = []
for i in range(x_train.shape[1]):
    vif.append(variance_inflation_factor(x_train.values, i))
pd.DataFrame({'Vif': vif}, index=data.columns[0:12])
[31]: Vif
crim 2.095894
zn 2.928062
indus 13.829768
nox 80.580602
rm 80.295207
age 22.821527
dis 14.784871
rad 14.694806
tax 57.284635
ptratio 87.191073
black 21.647351
lstat 11.319795
We can quantify the degree of multicollinearity with the Variance Inflation Factor (VIF). It can be interpreted as:
• 1 = not correlated
• 1–5 = moderately correlated
• greater than 5 = highly correlated
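A heatmap of the correlation matrix is a common way to visualize the pairwise relationships behind these VIF values; a minimal sketch (not necessarily the plot drawn in the original notebook):

plt.figure(figsize=(10, 8))
sns.heatmap(data.corr(), annot=True, fmt='.2f', cmap='coolwarm')
plt.show()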
[33]: # rad and tax both show high VIF values; inspect their relationship directly
plt.scatter(data["rad"], data["tax"])
plt.xlabel("rad")
plt.ylabel("tax")
plt.show()
[34]: # Homework: use other regression methods to build predictive models for medv (house price).
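A minimal sketch of one such alternative, ridge regression from scikit-learn, reusing the existing train/test split (one possible answer, not the original author's):

from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

ridge = Ridge(alpha=1.0)          # alpha = regularization strength (assumed value)
ridge.fit(x_train, y_train)
ridge_mse = mean_squared_error(y_test, ridge.predict(x_test))
ridge_mse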