0% found this document useful (0 votes)

41 views

Simple Linear Regression: Math Behind

1. Simple linear regression is an algorithm that finds the linear relationship between input variables (X) and output variables (Y). 2. It assumes a linear relationship between X and Y, and can be used to predict continuous values. 3. Implementing simple linear regression involves calculating the slope (b1 coefficient) and y-intercept (b0 coefficient) based on the input data to predict output values. 4. The document demonstrates how to implement simple linear regression from scratch in Python and compares it to Scikit-Learn's linear regression, finding very similar results.

Uploaded by

Derek Degbedzui

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

41 views

Simple Linear Regression: Math Behind

Uploaded by

Derek Degbedzui

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 6

1.

Simple Linear Regression

● Linear regression is the simplest machine learning algorithm you'll encounter
○ Especially simple linear regression
● It is a simple algorithm initially developed in the field of statistics and was studied as
a model for understanding the relationship between input and output variables
● It is a linear model - assumes a linear relationship between input variables (X) and
the output variable (y)
● Used to predict continuous values (e.g., weight, price...)

Simple vs. Multiple linear regression

● Simple linear regression solves problems with only one input feature
● Multiple linear regression solves problems with multiple input features

Assumptions

1. Linear Assumption — model assumes the relationship between variables is linear

2. No Noise — model assumes that the input and output variables are not noisy — so
remove outliers if possible
3. No Collinearity — model will overfit when you have highly correlated input
variables
4. Normal Distribution — the model will make more reliable predictions if your input
and output variables are normally distributed. If that’s not the case, try using some
transforms on your variables to make them more normal-looking
5. Rescaled Inputs — use scalers or normalizer to make more reliable predictions

Take-home point

● Training a simple linear regression model is as simple as solving a couple of

equations

Math behind
● In a nutshell, simple linear regression is based on coefficients - and which you need
to find in order to solve a line equation:

Line equation:

● The coefficient has to be calculated first

● It tells you the slope of the line

B1 coefficient:

● The coefficient relies on the slope

● It represents Y-intercept - location at which the line intercepts the Y-axis

B0 coefficient:

● Let's implement simple linear regression with pure Numpy next

Implementation

● You'll need only Numpy to implement the logic

● Matplotlib is used for optional visualizations
In [1]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rcParams
rcParams['figure.figsize'] = (14, 7)
rcParams['axes.spines.top'] = False
rcParams['axes.spines.right'] = False

● The SimpleLinearRegression class is written to follow the familiar Scikit-Learn

syntax
● The coefficients are set to None at the start - __init__() method
● The fit() method calculates the coefficients
● The predict() method essentially implements the line equation
○ Before it does so, it makes sure the coefficients have been
calculated
In [2]:
class SimpleLinearRegression:
'''
A class which implements simple linear regression model.
'''
def __init__(self):
self.b0 = None
self.b1 = None

def fit(self, X, y):

'''
Used to calculate slope and intercept coefficients.

:param X: array, single feature

:param y: array, true values
:return: None
'''
numerator = np.sum((X - np.mean(X)) * (y - np.mean(y)))
denominator = np.sum((X - np.mean(X)) ** 2)
self.b1 = numerator / denominator
self.b0 = np.mean(y) - self.b1 * np.mean(X)

def predict(self, X):

'''
Makes predictions using the simple line equation.

:param X: array, single feature

:return: None
'''
if not self.b0 or not self.b1:
raise Exception('Please call `SimpleLinearRegression.fit(X, y)` before making predictions.')
return self.b0 + self.b1 * X

Testing

● Let's create some dummy data

○ X contains a list of numbers between 1 and 300 (1, 2, 3, ..., 299,
300)
○ y contains normally distributed values centered around X with
standard deviation of 20
● The source data is then visualized:
In [13]:
X = np.arange(start=1, stop=301)
y = np.random.normal(loc=X, scale=20)

plt.scatter(X, y, s=200, c='#087E8B', alpha=0.65)

plt.title('Source dataset', size=20)
plt.xlabel('X', size=14)
plt.ylabel('Y', size=14)
plt.show()

● For validation sake, we'll split the dataset into training and testing parts:
In [4]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

● You can now initialize and train the model, and afterwards make predictions:
In [5]:
model = SimpleLinearRegression()
model.fit(X_train, y_train)
preds = model.predict(X_test)

● Here's how you can get the coefficients:

In [6]:
model.b0, model.b1

● These are the predictions:

In [7]:
preds

● And these are the original values

● Original and predicted differ, but not much
In [8]:
y_test

● You can now evaluate the model by calculating RMSE

○ Root Mean Squared Error
● On average, the model is 21.35 units wrong
● It makes sense, as standard deviation of the dataset is 20
In [9]:
from sklearn.metrics import mean_squared_error

rmse = lambda y, y_pred: np.sqrt(mean_squared_error(y, y_pred))

rmse(y_test, preds)

Visualize the Best-Fit line

● If you re-train the model of the entire dataset and then make predictions for the
entire dataset, you'll get the best fit line
● You can then visualize this line with Matplotlib:
In [14]:
model_all = SimpleLinearRegression()
model_all.fit(X, y)
preds_all = model_all.predict(X)

plt.scatter(X, y, s=200, c='#087E8B', alpha=0.65, label='Source data')

plt.plot(X, preds_all, color='#000000', lw=3, label=f'Best fit line > B0 = {model_all.b0:.2f}, B1 =
{model_all.b1:.2f}')
plt.title('Best fit line', size=20)
plt.xlabel('X', size=14)
plt.ylabel('Y', size=14)
plt.legend()
plt.show()

Comparison with Scikit-Learn

● We want to know if our model is good, so let's compare it with LinearRegression

model from Scikit-Learn
● The input data must be reshaped beforehand:
In [11]:
from sklearn.linear_model import LinearRegression

sk_model = LinearRegression()
sk_model.fit(np.array(X_train).reshape(-1, 1), y_train)
sk_preds = sk_model.predict(np.array(X_test).reshape(-1, 1))
sk_model.intercept_, sk_model.coef_

● Our coefficients were (-1.357484948041531, 1.0026529556316826)

● Not identical, but within a margin of error
● Let's check the RMSE:
In [12]:
rmse(y_test, sk_preds)

21.351850699502783
● Ours was 21.351850699502787, so nearly identical.

Using Scouting Reports Text To Predict NCAA NBA Performance
No ratings yet
Using Scouting Reports Text To Predict NCAA NBA Performance
16 pages
Multiple Regression
No ratings yet
Multiple Regression
7 pages
Regression Dataset Example
No ratings yet
Regression Dataset Example
14 pages
K Nearest Neighbors
No ratings yet
K Nearest Neighbors
5 pages
Linear Regression - Numpy and Sklearn
No ratings yet
Linear Regression - Numpy and Sklearn
7 pages
P05 The Regression Pipeline - Training and Testing Ans
No ratings yet
P05 The Regression Pipeline - Training and Testing Ans
13 pages
P06 The Classification Pipeline Ans
No ratings yet
P06 The Classification Pipeline Ans
16 pages
C2W3 Lab 01 Model Evaluation and Selection
No ratings yet
C2W3 Lab 01 Model Evaluation and Selection
21 pages
C2W3_Lab_01_Model_Evaluation_and_Selection
No ratings yet
C2W3_Lab_01_Model_Evaluation_and_Selection
21 pages
Lecture Material 11
No ratings yet
Lecture Material 11
14 pages
Exp 1
No ratings yet
Exp 1
11 pages
Exp 1
No ratings yet
Exp 1
6 pages
MLR Example 2predictors
No ratings yet
MLR Example 2predictors
5 pages
6_Classification and Regression Tasks
No ratings yet
6_Classification and Regression Tasks
115 pages
Logistic Regression
No ratings yet
Logistic Regression
10 pages
Lab2 Linear Regression
100% (1)
Lab2 Linear Regression
18 pages
Machine Learning LAB
No ratings yet
Machine Learning LAB
20 pages
PA 6
No ratings yet
PA 6
7 pages
Initialization
No ratings yet
Initialization
16 pages
Simple Linear Regression
No ratings yet
Simple Linear Regression
30 pages
C2_W3_Assignment
No ratings yet
C2_W3_Assignment
437 pages
EXP-4 DMusingPYTHON
No ratings yet
EXP-4 DMusingPYTHON
7 pages
Lab 11,12 - Copy
No ratings yet
Lab 11,12 - Copy
7 pages
ml_6_7_8 (1)
No ratings yet
ml_6_7_8 (1)
10 pages
lab-5-nguyenngocmaithi-20130120
No ratings yet
lab-5-nguyenngocmaithi-20130120
20 pages
Machine Learning 2
No ratings yet
Machine Learning 2
45 pages
MLP Unit-2
No ratings yet
MLP Unit-2
102 pages
Regression Linaire Python Tome II
No ratings yet
Regression Linaire Python Tome II
10 pages
Question 1 (Linear Regression)
No ratings yet
Question 1 (Linear Regression)
18 pages
Lab-5 Report
No ratings yet
Lab-5 Report
11 pages
LAB-4 Report
No ratings yet
LAB-4 Report
21 pages
Simple Linear Regression Code
No ratings yet
Simple Linear Regression Code
3 pages
Logistic Regression
100% (1)
Logistic Regression
10 pages
05 Logistic - Regression
No ratings yet
05 Logistic - Regression
7 pages
Unit 2 Regression Analysis
No ratings yet
Unit 2 Regression Analysis
16 pages
utf-8''C2M1 Assignment
No ratings yet
utf-8''C2M1 Assignment
24 pages
6_Classification and Regression Tasks (3)
No ratings yet
6_Classification and Regression Tasks (3)
100 pages
Simple Linear Regression in Machine Learning
No ratings yet
Simple Linear Regression in Machine Learning
7 pages
machine learning final manual
No ratings yet
machine learning final manual
45 pages
Data Analysis
No ratings yet
Data Analysis
8 pages
21CSC305P Ml - Lab Programs 1 -9
No ratings yet
21CSC305P Ml - Lab Programs 1 -9
36 pages
C3 W1 Anomaly Detection
No ratings yet
C3 W1 Anomaly Detection
14 pages
Lab#10 Ai
No ratings yet
Lab#10 Ai
3 pages
Intro to Linear and Logistic Reg
No ratings yet
Intro to Linear and Logistic Reg
5 pages
Linear regression
No ratings yet
Linear regression
1 page
CS6301 Homework2 KR
No ratings yet
CS6301 Homework2 KR
13 pages
CS 229, Autumn 2017 Problem Set #4: EM, DL & RL
No ratings yet
CS 229, Autumn 2017 Problem Set #4: EM, DL & RL
10 pages
Intro To Forecasting
No ratings yet
Intro To Forecasting
15 pages
C1 W2 Lab05 Sklearn GD Soln
No ratings yet
C1 W2 Lab05 Sklearn GD Soln
3 pages
2.1 ML (Implementation of Simple Linear Regression in Python)
No ratings yet
2.1 ML (Implementation of Simple Linear Regression in Python)
8 pages
PythonForML2023 Laboratory07 08 Regression Classification Update2
No ratings yet
PythonForML2023 Laboratory07 08 Regression Classification Update2
6 pages
Sample Exam For ML YSZ: Question 1 (Linear Regression)
No ratings yet
Sample Exam For ML YSZ: Question 1 (Linear Regression)
4 pages
Sample Exam For ML YSZ Sample For Machine Lerning - CMNKNVMNCS."NMD, MN, MVN, MDNV, MNDV MC, MDN, MDCNVM, NDV, M Ccwdmnbnbew, Mwbe
No ratings yet
Sample Exam For ML YSZ Sample For Machine Lerning - CMNKNVMNCS."NMD, MN, MVN, MDNV, MNDV MC, MDN, MDCNVM, NDV, M Ccwdmnbnbew, Mwbe
4 pages
Lab 1
No ratings yet
Lab 1
8 pages
AD3411-DATA SCIENCE AND ANALYTICS LABORATORY
No ratings yet
AD3411-DATA SCIENCE AND ANALYTICS LABORATORY
27 pages
Machine Learnin
100% (2)
Machine Learnin
23 pages
1.1 ID5059 1.2 Tom Kelsey - Jan 2021: February 15, 2021
No ratings yet
1.1 ID5059 1.2 Tom Kelsey - Jan 2021: February 15, 2021
43 pages
SVM Implementation
No ratings yet
SVM Implementation
8 pages
Lecture Notes - Linear Regression
No ratings yet
Lecture Notes - Linear Regression
26 pages
Experiment No 2 ML
No ratings yet
Experiment No 2 ML
3 pages
Top Numerical Methods With Matlab For Beginners!
From Everand
Top Numerical Methods With Matlab For Beginners!
Andrei Besedin
No ratings yet
Better Data Science - Make Synthetic Datasets With Python
No ratings yet
Better Data Science - Make Synthetic Datasets With Python
4 pages
Linear Regression For Absolute Beginners With Implementation in Python
No ratings yet
Linear Regression For Absolute Beginners With Implementation in Python
17 pages
Random Forest: The Algorithm in A Nutshell
No ratings yet
Random Forest: The Algorithm in A Nutshell
10 pages
Better Data Science - Generate PDF Reports With Python
No ratings yet
Better Data Science - Generate PDF Reports With Python
5 pages
Decision Trees
No ratings yet
Decision Trees
11 pages
What Is Data Analytics
No ratings yet
What Is Data Analytics
13 pages
NNDL
No ratings yet
NNDL
96 pages
Google - Machine Learning Glossary
No ratings yet
Google - Machine Learning Glossary
83 pages
Estimating Growth and Nutritional Performance - Cassava
No ratings yet
Estimating Growth and Nutritional Performance - Cassava
20 pages
Face Recognition Using CNN
No ratings yet
Face Recognition Using CNN
17 pages
An Explainable Transformer-Based Model For Phishing Email Detection: A Large Language Model Approach
No ratings yet
An Explainable Transformer-Based Model For Phishing Email Detection: A Large Language Model Approach
15 pages
Unit-V (1)
No ratings yet
Unit-V (1)
165 pages
Module 3
No ratings yet
Module 3
35 pages
A Review of Bayesian Machine Learning Principles, Methods, and Applications
No ratings yet
A Review of Bayesian Machine Learning Principles, Methods, and Applications
6 pages
Module 2
No ratings yet
Module 2
53 pages
Decision Trees
No ratings yet
Decision Trees
45 pages
unit-iv-v-deep-learning-material
No ratings yet
unit-iv-v-deep-learning-material
32 pages
Linear Regression Basic Interview Questions
No ratings yet
Linear Regression Basic Interview Questions
36 pages
Feature Engineering: Short Study: Indian Institute of Space Science and Technology, Department of Mathematics
No ratings yet
Feature Engineering: Short Study: Indian Institute of Space Science and Technology, Department of Mathematics
6 pages
Shrinkage Method
No ratings yet
Shrinkage Method
2 pages
Garment Returns Prediction For AI-Based Processing and Waste Reduction in E-Commerce
No ratings yet
Garment Returns Prediction For AI-Based Processing and Waste Reduction in E-Commerce
9 pages
Instant Access to Deep Learning A Visual Approach Glassner ebook Full Chapters
100% (2)
Instant Access to Deep Learning A Visual Approach Glassner ebook Full Chapters
65 pages
module 2
No ratings yet
module 2
42 pages
2023-Effort Estimation in Agile Software Development Using Autoencoders
No ratings yet
2023-Effort Estimation in Agile Software Development Using Autoencoders
7 pages
Codebasics DS AI Bootcamp Brochure v1
No ratings yet
Codebasics DS AI Bootcamp Brochure v1
41 pages
unit-3
No ratings yet
unit-3
30 pages
MLT Unit 5 12m
No ratings yet
MLT Unit 5 12m
25 pages
ET - Project Presentation Solution
No ratings yet
ET - Project Presentation Solution
29 pages
Data Science in the BFSI Domain: Transforming Financial Services
No ratings yet
Data Science in the BFSI Domain: Transforming Financial Services
19 pages
Feature Engineering (Examples)
No ratings yet
Feature Engineering (Examples)
8 pages
Boosting Algorithms: Regularization, Prediction and Model Fitting
No ratings yet
Boosting Algorithms: Regularization, Prediction and Model Fitting
29 pages
Backtest OverFitting
No ratings yet
Backtest OverFitting
58 pages
kumar-2024-ijca-924115
No ratings yet
kumar-2024-ijca-924115
7 pages
EHB 420E - Artificial Neural Networks Term Project: Machine Learning Models For Heart Attack Prediction
No ratings yet
EHB 420E - Artificial Neural Networks Term Project: Machine Learning Models For Heart Attack Prediction
10 pages
From Human Days To Machine Seconds Automatically Answering and Generating Machine Learning Final Exams
No ratings yet
From Human Days To Machine Seconds Automatically Answering and Generating Machine Learning Final Exams
9 pages