Linear Regression

Linear Regression Algorithm: Agenda

1. What is Machine Learning?

2. Linear Regression & Algorithm

3. Algebra & Geometry

4. Linear Relationship with an Example

5. Evaluating the performance of Linear Models

6. Evaluation Metrics

7. Regularization Techniques

8. Bias Variance Trade-off


Machine Learning

 Subset of AI

 Enables systems to learn from data

 Focuses on the development of algorithms to identify patterns, make predictions, and extract insights

(Diagram: Machine Learning process flow)

ML in Data Science

 Predictive Analytics

 Pattern Recognition

 Classification & Regression

 Optimization

 Automation

(Diagram: Types of Machine Learning)
Linear Regression

Statistical method used to model the relationship between a dependent variable and one or more independent variables.

The goal is to find the linear equation that best predicts the dependent variable from the independent variables.

1. Simple Linear Regression:

Involves one independent variable and one dependent variable. The relationship is modeled using a straight line.

2. Multiple Linear Regression:

Involves two or more independent variables. The relationship is still linear, but it extends into multiple dimensions.

Algorithm:

 A well-defined sequence of steps or instructions designed to perform a specific task or solve a particular problem.

 Algorithms are fundamental to computer science and mathematics and are used in various fields to process data, automate tasks, and solve complex problems efficiently.
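The slides contain no code, so as an illustrative aside, here is a minimal scikit-learn sketch of both variants; the small arrays are made-up data, not taken from the presentation.

```python
# Minimal sketch (assumed): simple vs. multiple linear regression with
# scikit-learn, using made-up data for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple linear regression: one predictor, one target.
x = np.array([[1.0], [2.0], [3.0], [4.0]])            # single feature
y = np.array([3.1, 4.9, 7.2, 8.8])                    # target values
simple = LinearRegression().fit(x, y)
print("slope:", simple.coef_[0], "intercept:", simple.intercept_)

# Multiple linear regression: several predictors, same linear model.
X = np.array([[1.0, 0.5], [2.0, 1.5], [3.0, 1.0], [4.0, 2.5]])
multi = LinearRegression().fit(X, y)
print("coefficients:", multi.coef_, "intercept:", multi.intercept_)
```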
Algebra & Geometry

Characteristic equations of common geometries:

Straight line - Y = mX + C

Circle - X² + Y² = R²
       - (X − h)² + (Y − k)² = R²

Parabola - Y² = 4pX
         - (Y − k)² = 4p(X − h)

Ellipse - (X/a)² + (Y/b)² = 1


Simple Linear Regression

Given n data points (x1, y1), (x2, y2), ..., (xn, yn), the goal is to find a line y = mx + c that best fits the data.

 Calculate the Means: compute the mean of x (x̄) and the mean of y (ȳ).

 Calculate the Slope (m): m = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

 Calculate the Intercept (c): c = ȳ − m·x̄

 Form the Regression Equation: y = mx + c
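A minimal NumPy sketch of these closed-form steps follows; the x and y arrays are hypothetical values chosen only to make the snippet runnable.

```python
# Minimal sketch (assumed): slope and intercept of a simple regression line
# computed directly from the closed-form formulas above.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical predictor values
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])   # hypothetical responses

x_bar, y_bar = x.mean(), y.mean()
m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope
c = y_bar - m * x_bar                                             # intercept

print(f"y = {m:.3f} x + {c:.3f}")
```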

Multiple Linear Regression

For multiple predictors, the model is y = b0 + b1x1 + b2x2 + ... + bkxk.

1. Formulate the Design Matrix (X): this matrix includes a column of ones (for the intercept b0) and all independent variables.

2. Compute the Coefficients (b) using the normal equation: b = (XᵀX)⁻¹ Xᵀ y,

where X is the design matrix and y is the vector of observed values.
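A short sketch of the normal equation in NumPy follows; the two-feature data set is made up for illustration, and np.linalg.solve is used instead of an explicit matrix inverse for numerical stability.

```python
# Minimal sketch (assumed): solving b = (XᵀX)⁻¹ Xᵀ y for a small made-up
# data set with two predictors.
import numpy as np

# Raw predictors (two features) and observed responses.
X_raw = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5],
                  [4.0, 3.0]])
y = np.array([5.0, 4.0, 7.5, 10.0])

# Design matrix: prepend a column of ones so b[0] is the intercept.
X = np.column_stack([np.ones(len(X_raw)), X_raw])

# Normal equation, solved as a linear system rather than via an inverse.
b = np.linalg.solve(X.T @ X, X.T @ y)
print("intercept:", b[0], "coefficients:", b[1:])
```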


Example of Linear Relationship

Suppose you are analyzing the relationship between the number of hours a student studies (independent variable x) and their score on a test (dependent variable y).

Calculation:

1. Calculate the means of x and y.

2. Calculate the slope (m): m = 7.5

3. Calculate the intercept (c): c = 41.5

4. Final equation of the line: y = 7.5x + 41.5
Interpretation

• Slope (m = 7.5): for every additional hour studied, the test score increases by 7.5 points.

• Intercept (c = 41.5): if a student studies for 0 hours, the predicted test score is 41.5 points.

Using the Equation for Predictions

Let's predict the test score for a student who studies for 6 hours (x = 6):

y = 7.5 × 6 + 41.5 = 45 + 41.5 = 86.5

So, a student who studies for 6 hours is predicted to score 86.5 points on the test.
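As a quick check, the fitted line from the example can be wrapped in a small helper; the function below is illustrative, not part of the slides.

```python
# Quick check (assumed code): prediction from the fitted line y = 7.5x + 41.5.
def predict_score(hours: float, m: float = 7.5, c: float = 41.5) -> float:
    """Predicted test score from the fitted simple-regression line."""
    return m * hours + c

print(predict_score(6))   # 86.5, matching the worked example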
Evaluating Linear Models

1. Train-Test Split
Description: Split the data into two sets: one for training the model and the other for testing its performance.
Purpose: Helps evaluate how well the model can predict unseen data.

2. Cross-Validation
Description: Involves partitioning the dataset into several subsets (folds) and training/testing the model multiple times, each time using a different fold as the test set.
Purpose: Provides a more reliable estimate of model performance and reduces the risk of overfitting.

3. Residual Analysis
Description: Analyze the residuals (the differences between predicted and actual values) to check for patterns.
Purpose: Residuals should be randomly distributed with no discernible pattern, indicating that the model is a good fit.
• Residual Plots: Plot residuals against predicted values to check for homoscedasticity (constant variance).
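A minimal scikit-learn sketch of these three checks follows; the synthetic data and the 80/20 split are assumptions made purely for illustration.

```python
# Minimal evaluation sketch (assumed, synthetic data): train/test split,
# 5-fold cross-validation, and a quick look at the residuals.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))                       # synthetic features
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 1, 100)   # synthetic target

# 1. Train-test split: hold out 20% of the data for evaluation.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
print("test R^2:", model.score(X_te, y_te))

# 2. Cross-validation: average R^2 over 5 folds.
print("CV R^2:", cross_val_score(LinearRegression(), X, y, cv=5).mean())

# 3. Residual analysis: residuals should look like patternless noise.
residuals = y_te - model.predict(X_te)
print("residual mean:", residuals.mean(), "residual std:", residuals.std())
```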
4. Model Assumptions Check
Linear regression relies on several assumptions that should be checked:

• Linearity: The relationship between predictors and the outcome should be linear.

• Independence: Observations should be independent of each other.

• Homoscedasticity: The variance of residuals should be constant across all levels of the independent variables.

• Normality: Residuals should be approximately normally distributed.

• No Multicollinearity: Independent variables should not be highly correlated with each other.

5. Feature Importance and Coefficients
Description: Examine the coefficients of the model to understand the importance of each feature.
Purpose: Identifies which variables have the most influence on the predicted outcome.
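The sketch below illustrates two of these checks, coefficient inspection and a multicollinearity screen via variance inflation factors (VIF); it assumes pandas and statsmodels are available, and the two-feature data frame is invented for the example.

```python
# Sketch (assumed): inspect coefficients and screen for multicollinearity
# with variance inflation factors on made-up data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "hours_studied": rng.uniform(0, 10, 200),
    "hours_slept": rng.uniform(4, 9, 200),
})
y = 7.5 * df["hours_studied"] + 2.0 * df["hours_slept"] + rng.normal(0, 3, 200)

# Coefficients indicate each feature's influence on the prediction.
model = LinearRegression().fit(df, y)
print(dict(zip(df.columns, model.coef_)))

# VIF is computed on the design matrix including a constant column; values
# above roughly 5-10 suggest a predictor is highly correlated with the rest.
X = np.column_stack([np.ones(len(df)), df.to_numpy()])
for i, col in enumerate(df.columns, start=1):
    print(col, variance_inflation_factor(X, i))
```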
Evaluation Metrics

1. MAE (Mean Absolute Error)

Measures the average absolute difference between predicted and actual values.

2. MSE (Mean Squared Error)

Measures the average squared difference between predicted and actual values, penalizing larger errors more than MAE.

3. RMSE (Root Mean Squared Error)

The square root of MSE; provides error in the same units as the target variable, making it more interpretable.

4. R² (R-Squared)

Represents the proportion of variance in the dependent variable that can be explained by the independent variables. Ranges from 0 to 1.
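A minimal sketch computing all four metrics with scikit-learn follows; the actual/predicted arrays are made up for illustration.

```python
# Minimal sketch (assumed): MAE, MSE, RMSE and R² on made-up predictions.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([50.0, 64.0, 71.0, 86.0, 94.0])
y_pred = np.array([49.0, 63.5, 74.0, 86.5, 90.0])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                     # same units as the target
r2 = r2_score(y_true, y_pred)           # proportion of variance explained

print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}  R^2={r2:.3f}")
```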
Regularization Techniques

 Ridge and Lasso Regression are two regularization techniques used to address the problems of multicollinearity and overfitting in linear regression models.

 They add a penalty term to the ordinary least squares (OLS) objective function, encouraging simpler models that generalize better.

Ridge Regression (L2 Regularization)

 Adds a penalty equivalent to the square of the magnitude of the coefficients to the loss function.

 It shrinks the coefficients, but never exactly to zero, which means it includes all predictors in the model.

 Ridge objective: minimize ‖y − Xb‖² + λ‖b‖²

Characteristics:
 Shrinks coefficients: reduces their magnitude but keeps every predictor in the model
 Used when: multicollinearity is present among the predictors
 Bias-variance trade-off: introduces bias while reducing variance
 Optimization: has a closed-form solution
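The sketch below shows Ridge shrinkage on deliberately collinear synthetic data; scikit-learn's alpha parameter plays the role of the penalty strength λ above, and none of the data comes from the slides.

```python
# Minimal Ridge sketch (assumed): shrunken but non-zero coefficients on
# nearly collinear synthetic predictors.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)      # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 2.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=100)

print("OLS coefficients:  ", LinearRegression().fit(X, y).coef_)
print("Ridge coefficients:", Ridge(alpha=1.0).fit(X, y).coef_)   # shrunk, not zero
```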


Lasso Regression (L1 Regularization)

 Adds a penalty equivalent to the absolute value of the magnitude of the coefficients to the loss function.

 It can shrink some coefficients to exactly zero, effectively performing feature selection.

Characteristics:
 Shrinks coefficients: performs feature selection and shrinks some coefficients to 0
 Used when: we believe some predictors are irrelevant and want to simplify the model
 Bias-variance trade-off: introduces bias while significantly reducing variance
 Optimization: requires iterative algorithms (no closed-form solution)
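The sketch below illustrates Lasso's feature-selection behaviour on synthetic data containing one irrelevant predictor; the alpha value is an arbitrary choice for the example.

```python
# Minimal Lasso sketch (assumed): the L1 penalty can drive the coefficient
# of an irrelevant feature exactly to zero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))                                          # third feature is irrelevant
y = 4.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

lasso = Lasso(alpha=0.5).fit(X, y)
print("Lasso coefficients:", lasso.coef_)   # third coefficient is (near) zero
```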
Bias-Variance Trade-off

 The trade-off between two types of error, bias error and variance error, which together affect the performance of the model.

Trade-off

The trade-off between bias and variance can be summarized as follows:

 Low Bias, High Variance: Complex models (e.g., deep neural networks) that fit the training data very well but perform poorly on unseen data due to capturing noise (overfitting).

 High Bias, Low Variance: Simple models (e.g., linear regression) that may not fit the training data closely but generalize better to unseen data (underfitting).

 Low Bias, Low Variance: A good model, with a good trade-off between the two errors.

Bias

Bias refers to the error introduced by approximating a real-world problem, which may be complex, by a simplified model. High bias can lead to:
• Underfitting: The model is too simple to capture the underlying patterns in the data, resulting in poor performance on both training and test sets.

Variance

Variance refers to the model's sensitivity to fluctuations in the training data. High variance can lead to:
• Overfitting: The model captures noise and random fluctuations in the training data rather than the true underlying pattern, resulting in excellent performance on the training set but poor generalization to new data.
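As an illustrative aside, the sketch below shows the trade-off empirically on synthetic data: a degree-1 polynomial underfits (high bias), a very high-degree polynomial overfits (high variance), and the gap between training and test error makes this visible. The data, degrees, and split are all assumptions for the example.

```python
# Sketch (assumed): under- vs. over-fitting seen through train/test MSE of
# polynomial regressions of increasing degree on noisy synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=60)   # nonlinear ground truth

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    train_mse = mean_squared_error(y_tr, model.predict(X_tr))
    test_mse = mean_squared_error(y_te, model.predict(X_te))
    print(f"degree {degree:2d}: train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```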
