Regression Analysis in Machine Learning
Regression analysis is a crucial tool in machine learning that helps identify relationships between
variables. By applying linear regression techniques, including Ordinary Least Squares, one can
model and predict outcomes based on input data. This report explores key regression concepts and
their practical applications in solving machine learning problems.
Introduction
In machine learning, regression analysis serves as a fundamental technique for predictive modeling.
It involves identifying the relationship between a dependent variable and one or more independent
variables. Among the most widely used methods is linear regression, which assumes a linear
relationship between variables. The process of fitting a regression line involves calculating the slope
and intercept to minimize the difference between actual and predicted values, often using the
Ordinary Least Squares (OLS) algorithm. Understanding the nature of positive and negative slopes
in regression is essential, as they reflect increasing or decreasing trends in the data. Regression
analysis also extends to multiple variables, allowing for more complex modeling scenarios. In this
report, we will discuss the application of linear and multiple regression techniques to analyze
datasets, compute the regression equation, and estimate correlation coefficients. These techniques
enable us to gain insights into the relationships between variables, thus aiding in decision-making
and predictions, making regression an indispensable tool in machine learning.
Regression analysis
Regression analysis helps in the prediction of a continuous variable. There are various real-world scenarios where we need future predictions, such as weather conditions, sales figures, and marketing trends; for such cases we need a technique that can make predictions accurately.
Types of Regression
There are various types of regression that are used in data science and machine learning. Each type has its own importance in different scenarios, but at the core, all the regression methods analyze the effect of the independent variables on the dependent variable. Here we discuss some important types of regression, which are given below:
Linear Regression:
The linear regression algorithm shows a linear relationship between a dependent (y) variable and one or more independent (x) variables, hence it is called linear regression. Since linear regression shows a linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.
The linear regression model provides a sloped straight line representing the relationship between
the variables. Consider Figure-1:
FIGURE-1
Mathematically, we can represent a linear regression as:

y = a_0 + a_1 x + \epsilon

Where,
y = Dependent Variable (Target Variable)
x = Independent Variable (Predictor Variable)
a_0 = intercept of the line
a_1 = linear regression coefficient (slope of the line)
\epsilon = random error

The values of the x and y variables come from the training dataset used to fit the Linear Regression model.
If a single independent variable is used to predict the value of a numerical dependent variable, then
such a Linear Regression algorithm is called Simple Linear Regression.
If more than one independent variable is used to predict the value of a numerical dependent
variable, then such a Linear Regression algorithm is called Multiple Linear Regression.
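As a brief sketch of this distinction, assuming scikit-learn as the tool (a fuller multiple regression example appears later in the report):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

y = np.array([3.1, 4.9, 7.2, 8.8])  # toy target values, for illustration only

# Simple Linear Regression: a single independent variable
X_simple = np.array([[1], [2], [3], [4]])           # one feature column
simple_model = LinearRegression().fit(X_simple, y)

# Multiple Linear Regression: more than one independent variable
X_multiple = np.array([[1, 5], [2, 3], [3, 8], [4, 1]])  # two feature columns
multiple_model = LinearRegression().fit(X_multiple, y)

print(simple_model.coef_)    # one coefficient
print(multiple_model.coef_)  # one coefficient per independent variable
```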
If the dependent variable increases on the Y-axis as the independent variable increases on the X-axis, then such a relationship is termed a Positive linear relationship. The positive linear equation is plotted in Figure-2 (A).
Different values of the weights or coefficients of the line (a_0, a_1) give different regression lines, so we need to calculate the best values of a_0 and a_1 to find the best-fit line. To calculate these, we use a cost function.
For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted values and the actual values. It can be written as:

MSE = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - (a_1 x_i + a_0) \right)^2

Where,
N = total number of observations
y_i = actual value of the i-th observation
a_1 x_i + a_0 = predicted value of the i-th observation
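As an illustration of this cost function, a minimal sketch in Python is given below; the data points and candidate coefficients are made-up values for demonstration, not data from this report.

```python
import numpy as np

# Toy training data (hypothetical values, for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

def mse(a0, a1, x, y):
    """Mean Squared Error between actual y and predictions a1*x + a0."""
    predictions = a1 * x + a0
    return np.mean((y - predictions) ** 2)

# Cost of a candidate line y = 2x + 1
print(mse(a0=1.0, a1=2.0, x=x, y=y))
```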
Gradient Descent:
Gradient descent is used to minimize the MSE by calculating the gradient of the cost function.
A regression model uses gradient descent to update the coefficients of the line by reducing the cost
function.
This is done by starting from randomly selected coefficient values and then iteratively updating them until the cost function reaches its minimum.
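Building on the MSE above, the following is a minimal gradient descent sketch for simple linear regression; the learning rate, iteration count, and toy data are illustrative assumptions rather than values from this report.

```python
import numpy as np

# Toy training data (illustrative)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

a0, a1 = 0.0, 0.0      # start from arbitrary coefficient values
learning_rate = 0.01   # step size (assumed)
n = len(x)

for _ in range(5000):  # iteratively update the coefficients
    predictions = a1 * x + a0
    error = predictions - y
    # Gradients of the MSE cost with respect to a0 and a1
    grad_a0 = (2.0 / n) * np.sum(error)
    grad_a1 = (2.0 / n) * np.sum(error * x)
    a0 -= learning_rate * grad_a0
    a1 -= learning_rate * grad_a1

print(f"intercept a0 = {a0:.3f}, slope a1 = {a1:.3f}")
```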
Model Performance:
The Goodness of fit determines how the line of regression fits the set of observations. The process
of finding the best model out of various models is called optimization. It can be achieved by the method below:
R-squared method:
It measures the strength of the relationship between the dependent and independent variables on a
scale of 0-100%.
A high value of R-squared indicates a smaller difference between the predicted values and the actual values and hence represents a good model.

R^2 = \frac{\text{explained variation}}{\text{total variation}}
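A short sketch of computing R-squared from actual and predicted values is shown below; the arrays are placeholder values, and R-squared is computed equivalently as one minus the unexplained variation divided by the total variation.

```python
import numpy as np

y_actual = np.array([3.1, 4.9, 7.2, 8.8])     # observed values (illustrative)
y_predicted = np.array([3.0, 5.0, 7.0, 9.0])  # values predicted by a fitted line

ss_residual = np.sum((y_actual - y_predicted) ** 2)     # unexplained variation
ss_total = np.sum((y_actual - np.mean(y_actual)) ** 2)  # total variation
r_squared = 1 - ss_residual / ss_total

print(f"R-squared = {r_squared:.3f}")
```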
Ordinary Least Squares (OLS):
• It is utilized to demonstrate the linear relationship between a response variable (y) and one or more predictor variables (x).
• OLS chooses β0, β1, …, βp to minimize the sum of squared differences between the observed y
values and the predicted y values from the regression line.
• If the OLS estimators meet certain conditions like linearity, lack of multi-collinearity,
homoscedasticity, absence of autocorrelation, and normality of errors, they will be unbiased,
consistent, and have the lowest variance among linear unbiased estimators.
Consider the following three data points:

x1      y1
2       6.3
4       11.6
7       15.7
TABLE-1
A comparison between the line fitted using the OLS algorithm and a poorly fitting line is shown in Figure-3:
FIGURE-3
Formulas used:

\hat{Y} = w_0 + w_1 X_1

Error_i = \hat{Y}_i - Y_i

Sum of squares of errors (SSE): L = \sum_{i=1}^{N} (\hat{Y}_i - Y_i)^2

Mean of squares of errors (MSE): L = \frac{1}{N} \sum_{i=1}^{N} (\hat{Y}_i - Y_i)^2

Root mean of squares of errors (RMSE): \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{Y}_i - Y_i)^2}
The plot in Figure-3 shows these 3 data points as pink squares. The purple line is the "best-fit line" through these 3 data points, and a "poor-fitting" line (the cyan line) is shown for comparison. The objective is to find the equation of the best-fitting straight line through the 3 data points given in Table-1.
The first formula above, \hat{Y} = w_0 + w_1 X_1, is the equation of the best-fit line (the purple line in the plot), where w_1 = slope of the line and w_0 = intercept of the line.
In machine learning, this best fit is called the Linear Regression (LR) model, and 𝒘𝟎 and 𝒘𝟏 are also
called model weights or model coefficients.
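As a sketch of how the best-fit weights and the error measures above could be computed for the three data points in Table-1 (the report's own fitted values appear in Figure-3 and are not restated here), one possible implementation using NumPy is:

```python
import numpy as np

# Data points from Table-1
x1 = np.array([2.0, 4.0, 7.0])
y = np.array([6.3, 11.6, 15.7])

# Ordinary least squares fit of Y_hat = w0 + w1 * X1
w1, w0 = np.polyfit(x1, y, deg=1)  # polyfit returns the slope first, then the intercept
y_hat = w0 + w1 * x1

sse = np.sum((y_hat - y) ** 2)  # sum of squares of errors
mse = sse / len(y)              # mean of squares of errors
rmse = np.sqrt(mse)             # root mean of squares of errors

print(f"w0 = {w0:.3f}, w1 = {w1:.3f}, SSE = {sse:.3f}, RMSE = {rmse:.3f}")
```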
Example: 1
Seven samples are taken, recording the weight (x) and the blood sugar level (y) of each subject:

Weight (x)    Blood sugar level (y)
75            110
86            125
93            160
54            104
85            114
103           203
95            196
TABLE-2
It is assumed that weight and blood sugar level are jointly normally distributed.
Results:
Using the Linear Regression model, it is found that:
FIGURE-4
Equation of the regression line relating blood sugar level to weight: y = 2.11x − 33.99. The correlation coefficient = 0.81.
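The reported line and correlation coefficient can be checked against the Table-2 data with a short sketch; scipy.stats.linregress is an assumed tool choice, since the report does not state which software was used.

```python
from scipy import stats

# Weight (x) and blood sugar level (y) from Table-2
weight = [75, 86, 93, 54, 85, 103, 95]
blood_sugar = [110, 125, 160, 104, 114, 203, 196]

result = stats.linregress(weight, blood_sugar)
print(f"slope = {result.slope:.2f}")           # approx. 2.11
print(f"intercept = {result.intercept:.2f}")   # approx. -33.99
print(f"correlation r = {result.rvalue:.2f}")  # approx. 0.81
```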
Example: 2
Eight samples of food are taken. The Y, X1, and X2 columns correspond to total calories, calories from fat, and calories from protein, respectively.
Total Calories (Y)    Calories from fat (X1)    Calories from protein (X2)
140                   60                        22
155                   62                        25
159                   67                        24
179                   70                        20
192                   71                        15
200                   72                        14
212                   75                        14
215                   78                        11
TABLE-3
Results:
Using the Multiple Linear Regression method, we obtain the Mean Squared Error (MSE), the intercept, and the coefficients. The results are given below:
FIGURE-5 (A)
FIGURE-5 (B)
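A sketch of the multiple linear regression fit on the Table-3 data is shown below; scikit-learn is an assumed tool choice, and the intercept, coefficients, and MSE it prints correspond to the quantities reported in Figure-5 rather than being restated from the report.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Table-3: calories from fat (X1) and calories from protein (X2) vs total calories (Y)
X = np.array([[60, 22], [62, 25], [67, 24], [70, 20],
              [71, 15], [72, 14], [75, 14], [78, 11]])
y = np.array([140, 155, 159, 179, 192, 200, 212, 215])

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)
print("MSE:", mean_squared_error(y, y_pred))
```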
Conclusion
In conclusion, this report provided an in-depth exploration of the regression method in machine
learning, with a focus on the linear regression model. We discussed the Ordinary Least Squares
(OLS) algorithm, detailing its significance in minimizing error to fit a linear model. Through
illustrative examples, we demonstrated the application of linear regression in various datasets,
highlighting its predictive accuracy and relevance. Supporting results validated the model’s
effectiveness in both simple and multiple regression scenarios. Overall, the study reinforces the
utility of regression methods as foundational tools for predictive analytics in machine learning.