Linear Regression-Part 2
Continue
• When implementing simple linear regression, you typically start with a given set of input-output (𝑥-𝑦) pairs.
• These pairs are your observations, shown as green circles in the figure.
• For example, the leftmost observation has the input 𝑥 = 5 and the actual output, or response, 𝑦 = 5. The next one has 𝑥 = 15 and 𝑦 = 20, and so
on.
• The estimated regression function, represented by the black line, has the equation
𝑓(𝑥) = 𝑏₀ + 𝑏₁𝑥.
• Your goal is to calculate the optimal values of the predicted weights 𝑏₀ and 𝑏₁ that minimize SSR and determine the estimated regression
function.
• The value of 𝑏₀, also called the intercept, shows the point where the estimated regression line crosses the 𝑦 axis.
• It’s the value of the estimated response 𝑓(𝑥) for 𝑥 = 0.
• The value of 𝑏₁ determines the slope of the estimated regression line.
• The predicted responses, shown as red squares, are the points on the regression line that correspond to the input
values.
• For example, for the input 𝑥 = 5, the predicted response is 𝑓(5) = 8.33, which the leftmost red square represents.
• The vertical dashed grey lines represent the residuals, which can be calculated as
𝑦ᵢ - 𝑓(𝑥ᵢ) = 𝑦ᵢ - 𝑏₀ - 𝑏₁𝑥ᵢ for 𝑖 = 1, …, 𝑛.
• They’re the distances between the green circles and red squares.
• When you implement linear regression, you’re actually trying to minimize these distances and make the red squares as close to the
predefined green circles as possible, as the sketch below illustrates.
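A minimal sketch of this fit in Python, assuming NumPy and scikit-learn are available. Only the first two (𝑥, 𝑦) pairs come from the example above; the remaining points are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Observations (x, y): the first two pairs appear in the text above,
# the rest are hypothetical values added for illustration.
x = np.array([5, 15, 25, 35, 45, 55]).reshape(-1, 1)
y = np.array([5, 20, 14, 32, 22, 38])

model = LinearRegression().fit(x, y)   # finds b0 (intercept_) and b1 (coef_) by minimizing SSR
print("b0 (intercept):", model.intercept_)
print("b1 (slope):", model.coef_[0])

y_pred = model.predict(x)              # predicted responses: points on the fitted line
residuals = y - y_pred                 # residuals: y_i - f(x_i), the vertical distances
print("predicted responses:", y_pred)
print("residuals:", residuals)
```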
Linear Regression Equation
Example
Line of Best Fit
• The linear regression model has to find the line of best fit.
• We know the equation of a line is y = mx + c.
• There are infinitely many possible values of m and c; which ones should we choose?
• Out of all possible lines, how do we find the best-fit line?
• The line of best fit is calculated by using the cost function: the least sum of squared errors.
• The line of best fit will have the least sum of squared errors.
Cost Function
• For all possible lines, calculate the sum of squares of errors. The line with the least sum of squared errors is the best-fit line (see the sketch below).
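A rough illustration of the idea; the data and the candidate (m, c) pairs are hypothetical. Each candidate line gets a cost, and the best fit is the one with the lowest cost.

```python
import numpy as np

def sum_squared_errors(m, c, x, y):
    """Cost of the line y = m*x + c: sum of squared vertical errors."""
    return np.sum((y - (m * x + c)) ** 2)

# Illustrative data (not from the slides)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])

# Evaluate a few candidate lines; the best-fit line has the lowest cost.
for m, c in [(1.5, 0.5), (2.0, 0.0), (2.0, 0.2)]:
    print(f"m={m}, c={c}, SSE={sum_squared_errors(m, c, x, y):.3f}")
```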
The Least Squares Regression Line
• Definition
• Given a collection of pairs (x,y) of numbers (in which not all the x-values are the same), there
is a line that best fits the data in the sense of minimizing the sum of the
squared errors.
• It is called the least squares regression line.
• Its slope 𝑏₁ and 𝑦-intercept 𝑏₀ are computed using the formulas
𝑏₁ = Σ(𝑥ᵢ − x̄)(𝑦ᵢ − ȳ) / Σ(𝑥ᵢ − x̄)² and 𝑏₀ = ȳ − 𝑏₁x̄,
where x̄ and ȳ denote the means of the 𝑥-values and 𝑦-values.
Example
• Find the least squares regression line for the five-point data set:
X 2 2 6 8 10
Y 0 1 2 3 3
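A small sketch, assuming NumPy, that applies the slope and intercept formulas above to this data set:

```python
import numpy as np

# Five-point data set from the example above
x = np.array([2, 2, 6, 8, 10], dtype=float)
y = np.array([0, 1, 2, 3, 3], dtype=float)

x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope
b0 = y_bar - b1 * x_bar                                            # y-intercept

print(f"slope b1 = {b1}")        # 0.34375
print(f"intercept b0 = {b0}")    # -0.125
```

For this data set the formulas give 𝑏₁ = 0.34375 and 𝑏₀ = −0.125, so the least squares regression line is ŷ = 0.34375x − 0.125.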
• The standard deviation (σ) is a measure of how dispersed the data are in relation to the mean.
• A low standard deviation means the data are clustered around the mean.
• A high standard deviation indicates the data are more spread out.
The coefficient of determination or R-squared
• The coefficient of determination, or R-squared, represents the proportion of the variance in the dependent variable that is
explained by the linear regression model.
• It is a scale-free score, i.e. irrespective of whether the values are small or large, the value of R-squared never exceeds one.
Continue
• Lower values of MAE, MSE, and RMSE imply higher accuracy of a regression model.
• However, a higher value of R-squared is considered desirable.
• Both RMSE and R-squared quantify how well a linear regression model fits a dataset.
• The RMSE tells how well a regression model can predict the value of the response variable in absolute terms.
• R-squared tells how well the predictor variables can explain the variation in the response variable (see the sketch below).
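A hedged sketch of how these metrics might be computed with scikit-learn; the y_true and y_pred arrays here are hypothetical values chosen only to show the calls.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values of the response variable
y_true = np.array([3.0, 5.0, 7.5, 9.0, 11.0])
y_pred = np.array([2.8, 5.4, 7.0, 9.3, 10.5])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                    # same units as the response (absolute error)
r2 = r2_score(y_true, y_pred)          # proportion of variance explained, at most 1

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")
```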
Polynomial Regression
• You can regard polynomial regression as a generalized case of linear regression. You assume the polynomial
dependence between the output and inputs and, consequently, the polynomial estimated regression function.
• In other words, in addition to linear terms like 𝑏₁ 𝑥₁, your regression function 𝑓 can include nonlinear terms such as
𝑏₂𝑥₁², 𝑏₃𝑥₁³, or even 𝑏₄𝑥₁𝑥₂, 𝑏₅𝑥₁²𝑥₂.
• The simplest example of polynomial regression has a single independent variable, and the estimated regression
function is a polynomial of degree two: 𝑓(𝑥) = 𝑏₀ + 𝑏₁ 𝑥 + 𝑏₂ 𝑥².
• Now, remember that you want to calculate 𝑏₀, 𝑏₁, and 𝑏₂ to minimize SSR. These are your unknowns!
• Keeping this in mind, compare the previous regression function with the function 𝑓( 𝑥₁, 𝑥₂) = 𝑏₀ + 𝑏₁ 𝑥₁ + 𝑏₂ 𝑥₂, used for
linear regression. They look very similar and are both linear functions of the unknowns 𝑏₀, 𝑏₁, and 𝑏₂. This is why
you can solve the polynomial regression problem as a linear problem with the term 𝑥² regarded as an input
variable.
• In the case of two variables and the polynomial of degree two, the regression function has this form: 𝑓( 𝑥₁, 𝑥₂) = 𝑏₀
+ 𝑏₁𝑥₁ + 𝑏₂𝑥₂ + 𝑏₃𝑥₁² + 𝑏₄𝑥₁𝑥₂ + 𝑏₅𝑥₂².
• The procedure for solving the problem is identical to the previous case. You apply linear regression for five inputs:
𝑥₁, 𝑥₂, 𝑥₁², 𝑥₁𝑥₂, and 𝑥₂². As a result of the regression, you get the values of six weights that minimize SSR: 𝑏₀, 𝑏₁, 𝑏₂,
𝑏₃, 𝑏₄, and 𝑏₅ (see the sketch below).
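A minimal sketch of this procedure, assuming scikit-learn; the data are hypothetical, and PolynomialFeatures is used to generate the five inputs before fitting an ordinary linear regression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical two-variable data
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]], dtype=float)
y = np.array([4.0, 3.5, 14.0, 13.0, 30.0, 28.0])

# Degree-2 expansion adds x1^2, x1*x2, x2^2 as extra input columns
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)              # columns: x1, x2, x1^2, x1*x2, x2^2

model = LinearRegression().fit(X_poly, y)   # ordinary linear regression on five inputs
print("b0:", model.intercept_)
print("b1..b5:", model.coef_)
```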
Polynomial Regression
Polynomial regression is needed when no straight line fits the relationship among the variables. So instead of
looking like a line, the fitted function looks like a nonlinear curve.
Linear vs Polynomial
Example: weight loss.
• Underfitting
• occurs when a model can’t accurately capture the dependencies among data, usually as a
consequence of its own simplicity.
• It often yields a low 𝑅² with known data and bad generalization capabilities when applied
to new data.
• Overfitting
• A model learns the existing data too well.
• Complex models, which have many features or terms, are often prone to overfitting.
• When applied to known data, such models usually yield high 𝑅².
• However, they often don’t generalize well and have significantly lower 𝑅² when used with
new data.
• The left plot shows a linear regression fit.
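A hedged sketch of under- and overfitting on synthetic data; all names and numbers here are illustrative. A degree-1 fit is typically too simple for this curved data, while a very high-degree fit tends to score a high 𝑅² on the training data but a much lower one on held-out data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 40).reshape(-1, 1)
y = 1.5 * x.ravel() ** 2 - 2.0 * x.ravel() + rng.normal(scale=2.0, size=40)  # noisy quadratic

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=0)

for degree in (1, 2, 15):   # likely too simple, about right, too complex
    features = PolynomialFeatures(degree=degree, include_bias=False)
    model = LinearRegression().fit(features.fit_transform(x_train), y_train)
    r2_train = model.score(features.transform(x_train), y_train)  # R^2 on known data
    r2_test = model.score(features.transform(x_test), y_test)     # R^2 on new data
    print(f"degree={degree:>2}  R^2 train={r2_train:.3f}  R^2 test={r2_test:.3f}")
```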