2.1 ML (Implementation of Simple Linear Regression in Python)

1. This document describes the implementation of a simple linear regression algorithm in Python to predict salary based on years of experience. It involves preprocessing the dataset, splitting it into training and test sets, fitting a linear regression model on the training set, making predictions on both training and test sets, and visualizing the results.
2. The key steps are preprocessing the dataset, fitting a linear regression model to the training set, predicting values for both training and test sets, and creating scatter plots to visualize the predictions against the actual values for both sets.
3. The visualization shows that most observations for both training and test sets are close to the regression line, indicating the simple linear regression model is able to make good predictions.

Uploaded by

Muhammad shayan umar


Implementation of Simple Linear Regression Algorithm using Python

Problem Statement example for Simple Linear Regression:

Here we are taking a dataset that has two variables: salary (dependent variable) and
experience (independent variable). The goals of this problem are:

o We want to find out if there is any correlation between these two variables.
o We will find the best fit line for the dataset.
o We want to see how the dependent variable changes when the independent
variable changes.

In this section, we will create a Simple Linear Regression model to find out the best fitting
line for representing the relationship between these two variables.
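
As a rough illustration of these goals, the sketch below computes the correlation coefficient and the least-squares slope and intercept with NumPy. The data values are made up for illustration and are not taken from Salary_Data.csv, and the names experience, salary, and best_fit_line are hypothetical.

```python
import numpy as np

# Hypothetical data, NOT from Salary_Data.csv:
# salary = 30000 + 9000 * years, with no noise
experience = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
salary = 30000.0 + 9000.0 * experience


def best_fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Pearson correlation coefficient between the two variables
correlation = np.corrcoef(experience, salary)[0, 1]
slope, intercept = best_fit_line(experience, salary)
```

The slope addresses the third goal: it is the estimated change in salary for each additional year of experience.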

To implement the Simple Linear regression model in machine learning using Python, we
need to follow the below steps:

Step-1: Data Pre-processing

The first step for creating the Simple Linear Regression model is data pre-processing. We
have already done it earlier in this tutorial. But there will be some changes, which are given
in the below steps:

o First, we will import the three important libraries, which will help us load the
dataset, plot the graphs, and create the Simple Linear Regression model.

import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

o Next, we will load the dataset into our code:

data_set= pd.read_csv('Salary_Data.csv')

After executing the above line of code (Ctrl+Enter), we can inspect the dataset in the Spyder
IDE by opening data_set in the variable explorer.
The dataset has two variables: Salary and Experience.

Note: In Spyder IDE, the folder containing the code file must be saved as a working
directory, and the dataset or csv file should be in the same folder.
o After that, we need to extract the dependent and independent variables from the
given dataset. The independent variable is years of experience, and the dependent
variable is salary. Below is code for it:

x= data_set.iloc[:, :-1].values
y= data_set.iloc[:, 1].values

In the above lines of code, the slice :-1 for the x variable selects every column except the
last one, so x holds only the experience column as a two-dimensional array. For the y variable,
the index 1 selects the second column, since indexing starts from zero.

By executing the above lines of code, the X (independent) variable and the Y (dependent)
variable are extracted from the given dataset and appear in the variable explorer.
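
To make the difference between the two slices concrete, here is a minimal sketch using a small hand-made DataFrame in place of Salary_Data.csv. The column names and values are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for Salary_Data.csv with the same two-column layout
data_set = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2],
    "Salary": [39000.0, 43000.0, 54000.0],
})

x = data_set.iloc[:, :-1].values  # every column except the last -> 2-D array
y = data_set.iloc[:, 1].values    # second column only -> 1-D array

print(x.shape)  # (3, 1)
print(y.shape)  # (3,)
```

The two-dimensional shape of x matters later, because scikit-learn estimators expect the feature matrix to have one row per observation and one column per feature.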

o Next, we will split both variables into the test set and training set. We have 30
observations, so we will take 20 observations for the training set and 10
observations for the test set. We are splitting our dataset so that we can train our
model using a training dataset and then test the model using a test dataset. The
code for this is given below:

# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 1/3, random_state=0)

By executing the above code, we will get the x_test, x_train, y_test, and y_train datasets.

[Images: test dataset and training dataset as shown in the variable explorer]

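
The split sizes can be checked directly. The sketch below uses placeholder arrays with 30 observations instead of the real salary data (an assumption made so the example runs on its own) and the same test_size and random_state as above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical placeholder arrays with 30 observations, as in the tutorial
x = np.arange(30, dtype=float).reshape(-1, 1)
y = 2.0 * x.ravel() + 1.0

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=1/3, random_state=0
)

# Number of observations in the training and test sets
print(len(x_train), len(x_test))
```

Fixing random_state makes the shuffle reproducible, so the same observations land in the test set on every run.
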
o For Simple Linear Regression, we will not use feature scaling. With a single predictor,
the least-squares fit gives the same predictions whether or not the feature is scaled, so
we don't need to perform it here. Now our dataset is prepared, and we can start building
a Simple Linear Regression model for the given problem.
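
The effect of skipping feature scaling can be checked numerically. This sketch uses np.polyfit rather than scikit-learn, with made-up values, and compares the predictions from the raw feature with those from a standardized feature; the helper fit_and_predict is hypothetical.

```python
import numpy as np

# Hypothetical data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([35.0, 41.0, 52.0, 58.0, 69.0, 75.0])


def fit_and_predict(x_fit, y_fit, x_new):
    """Fit a degree-1 least-squares polynomial and predict at x_new."""
    slope, intercept = np.polyfit(x_fit, y_fit, deg=1)
    return slope * x_new + intercept


# Predictions from the raw feature
pred_raw = fit_and_predict(x, y, x)

# Predictions after standardizing the feature (zero mean, unit variance)
x_scaled = (x - x.mean()) / x.std()
pred_scaled = fit_and_predict(x_scaled, y, x_scaled)

# True when both fits give the same predictions up to rounding error
same_predictions = np.allclose(pred_raw, pred_scaled)
```

Scaling changes the slope and intercept values, but not the fitted line's predictions for the same observations.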

Step-2: Fitting the Simple Linear Regression to the Training Set:

Now the second step is to fit our model to the training dataset. To do so, we will import
the LinearRegression class from the linear_model module of scikit-learn. After
importing the class, we are going to create an object of the class named regressor.
The code for this is given below:

#Fitting the Simple Linear Regression model to the training dataset
from sklearn.linear_model import LinearRegression
regressor= LinearRegression()
regressor.fit(x_train, y_train)

In the above code, we have used the fit() method to fit our Simple Linear Regression object
to the training set. We pass x_train and y_train to fit(), which are the training data for
the independent and the dependent variable respectively. Fitting the regressor object to
the training set lets the model learn the relationship between the predictor and target
variables. After executing the above lines of code, we will get the below output.

Output:

Out[7]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
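
After fitting, the learned line can be read back from the estimator's coef_ and intercept_ attributes. The sketch below uses made-up training data that lies exactly on a known line, so the recovered values are easy to check; it is an illustration, not the result for Salary_Data.csv.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data lying exactly on salary = 25000 + 10000 * years
x_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = 25000.0 + 10000.0 * x_train.ravel()

regressor = LinearRegression()
regressor.fit(x_train, y_train)

slope = regressor.coef_[0]        # change in salary per extra year
intercept = regressor.intercept_  # predicted salary at zero years

# A prediction is just intercept + slope * x
manual = intercept + slope * 5.0
from_model = regressor.predict(np.array([[5.0]]))[0]
```

Here manual and from_model agree, which shows that predict() only evaluates the fitted straight line.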

Step-3: Prediction of Test Set Result:

Our model has now learned the relationship between the dependent variable (salary) and the
independent variable (experience), so it is ready to predict the output for new observations.
In this step, we will provide the test dataset (new observations) to the model to check
whether it can predict the correct output or not.

We will create two prediction vectors, y_pred and x_pred, which will contain the predictions
for the test dataset and the training set respectively.

#Prediction of Test and Training set result
y_pred= regressor.predict(x_test)
x_pred= regressor.predict(x_train)

On executing the above lines of code, two variables named y_pred and x_pred will appear
in the variable explorer, containing the salary predictions for the test set and the training
set respectively.

You can check the variables by clicking on the variable explorer option in the IDE, and
compare the values of y_pred with those of y_test. By comparing these values, we can check
how well our model is performing.
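
The comparison can also be done in code instead of by eye. The sketch below computes the absolute error for each observation and their mean; the helper compare_predictions and the demo values are hypothetical and only stand in for y_test and y_pred.

```python
import numpy as np


def compare_predictions(y_true, y_hat):
    """Return per-observation absolute errors and their mean (MAE)."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    abs_errors = np.abs(y_true - y_hat)
    return abs_errors, abs_errors.mean()


# Hypothetical values standing in for y_test and y_pred
y_test_demo = [40000.0, 60000.0, 80000.0]
y_pred_demo = [42000.0, 59000.0, 83000.0]

abs_errors, mae = compare_predictions(y_test_demo, y_pred_demo)
print(abs_errors)  # [2000. 1000. 3000.]
print(mae)         # 2000.0
```

A mean absolute error is in the same unit as the salary, which makes it easy to judge whether the typical miss is acceptable.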

Step-4: Visualizing the Training Set Results:

Now in this step, we will visualize the training set result. To do so, we will use the scatter()
function of the pyplot library, which we have already imported in the pre-processing step.
The scatter() function will create a scatter plot of observations.

On the x-axis, we will plot the years of experience of employees, and on the y-axis, the salary
of employees. In the function, we will pass the real values of the training set, which means
the years of experience x_train, the training set salaries y_train, and the color of the
observations. Here we are taking green for the observations, but it can be any color.

Now, we need to plot the regression line, so for this, we will use the plot() function of the
pyplot library. In this function, we will pass the years of experience for training set,
predicted salary for training set x_pred, and color of the line.

Next, we will give the title for the plot. So here, we will use the title() function of
the pyplot library and pass the name "Salary vs Experience (Training Dataset)".

After that, we will assign labels for x-axis and y-axis using xlabel() and ylabel()
function.

Finally, we will represent all above things in a graph using show(). The code is given below:

mtp.scatter(x_train, y_train, color="green")
mtp.plot(x_train, x_pred, color="red")
mtp.title("Salary vs Experience (Training Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary(In Rupees)")
mtp.show()

Output:

By executing the above lines of code, we will get a graph plot as an output.

In the plot, the real observations appear as green dots and the predicted values lie on
the red regression line. The regression line shows the correlation between the dependent
and independent variables.

The goodness of fit can be assessed by calculating the differences between actual values
and predicted values. As we can see in the plot, most of the observations are close to
the regression line, hence our model is good for the training set.
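
One common way to turn those differences into a single number is the coefficient of determination, R squared. The sketch below is a minimal NumPy version with made-up values standing in for y_train and x_pred; the function name r_squared is hypothetical.

```python
import numpy as np


def r_squared(y_true, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y_true - y_hat) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot


# Hypothetical values standing in for y_train and x_pred
y_train_demo = [10.0, 20.0, 30.0, 40.0]
x_pred_demo = [12.0, 18.0, 31.0, 39.0]

score = r_squared(y_train_demo, x_pred_demo)
```

A score near 1 means the line explains most of the variation in the observations, and a score near 0 means it explains little.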

Step-5: Visualizing the Test Set Results:

In the previous step, we visualized the performance of our model on the training set.
Now, we will do the same for the test set. The code remains the same as above, except that
the scatter plot uses x_test and y_test instead of x_train and y_train. The regression line
is still drawn from x_train and x_pred, because the fitted line is the same for both plots.

Here we are also changing the color of the observations to differentiate between the two
plots, but it is optional.

#visualizing the Test set results
mtp.scatter(x_test, y_test, color="blue")
mtp.plot(x_train, x_pred, color="red")
mtp.title("Salary vs Experience (Test Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary(In Rupees)")
mtp.show()

Output:

By executing the above lines of code, we will get the test set plot as output.

In the plot, the observations are shown in blue, and the predictions are given by
the red regression line. As we can see, most of the observations are close to the regression
line, hence we can say our Simple Linear Regression is a good model and able to make good
predictions.
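
Since the two plots differ only in the scattered points, the color, and the title, the plotting steps can be wrapped in one function. The sketch below is one possible arrangement, not the tutorial's own code: the function plot_fit, the demo values, and the output file name are hypothetical, and it saves the figure to a file with a non-interactive backend instead of calling show().

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as mtp
import numpy as np


def plot_fit(x_obs, y_obs, x_line, y_line, title, point_color, path):
    """Scatter the observations, draw the regression line, and save to path."""
    fig, ax = mtp.subplots()
    ax.scatter(x_obs, y_obs, color=point_color)
    ax.plot(x_line, y_line, color="red")
    ax.set_title(title)
    ax.set_xlabel("Years of Experience")
    ax.set_ylabel("Salary(In Rupees)")
    fig.savefig(path)
    mtp.close(fig)
    return path


# Hypothetical data standing in for the training set and its predictions
x_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_demo = np.array([31000.0, 39000.0, 52000.0, 58000.0])
line_demo = 20000.0 + 10000.0 * x_demo

out_path = os.path.join(tempfile.gettempdir(), "salary_vs_experience_demo.png")
plot_fit(x_demo, y_demo, x_demo, line_demo,
         "Salary vs Experience (Training Dataset)", "green", out_path)
```

Calling plot_fit a second time with the test observations, "blue", and a different title and path would produce the test set plot with the same regression line.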
