Lecture 9-10
Regression
Regression is a statistical approach used to analyze the relationship between a dependent variable (target variable) and one or more independent variables (predictor variables). The objective is to determine the most suitable function that characterizes the connection between these variables.

It is a supervised machine learning technique used to predict the value of the dependent variable for new, unseen data. It models the relationship between the input features and the target variable, allowing for the estimation or prediction of numerical values.
Terminologies Related to Regression Analysis

Response Variable: The primary factor to predict or understand in regression, also known as the dependent variable or target variable.

Predictor Variable: Factors influencing the response variable, used to predict its values; also called independent variables.

Outliers: Observations with significantly low or high values compared to the others, which can distort results and are best handled carefully.

Multicollinearity: High correlation among independent variables, which can complicate the ranking of influential variables.

Underfitting and Overfitting: Overfitting occurs when an algorithm performs well on the training data but poorly on the test data, while underfitting indicates poor performance on both datasets.
Types
Depending on the number of input variables, the regression problem is classified into:
1) Simple linear regression
2) Multiple linear regression
Simple Linear Regression
Used to predict a continuous dependent variable based on a single independent variable.
Simple linear regression should be used when there is only a single independent variable.
Multiple Regression
Used to predict a continuous dependent variable based on multiple independent variables.
Multiple linear regression should be used when there are multiple independent variables.
Nonlinear Regression
Relationship between the dependent variable and independent variable(s) follows a nonlinear
pattern.
Provides flexibility in modeling a wide range of functional forms.
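As one illustration of this flexibility, the sketch below fits a degree-2 polynomial using scikit-learn's PolynomialFeatures; the data values are invented purely for illustration and are not from the lecture.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
# Made-up data following a roughly quadratic (nonlinear) trend
x = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)
y = np.array([1.2, 4.1, 9.3, 15.8, 25.2, 35.9])
# Degree-2 polynomial regression: nonlinear in x, but still linear in the coefficients
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print("Prediction for x = 7:", model.predict(np.array([[7]]))[0])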
Linear regression
Linear regression is one of the simplest and most widely used statistical
models. This assumes that there is a linear relationship between the
independent and dependent variables. This means that the change in the
dependent variable is proportional to the change in the independent variables.
The equation for simple linear regression is:

    \hat{y} = \beta_0 + \beta_1 X

where:
\hat{y} is the predicted value of the dependent variable
X is the independent variable
\beta_0 is the intercept
\beta_1 is the slope
Multiple Linear Regression
This involves more than one independent variable and one dependent
variable. The equation for multiple linear regression is:
    \hat{y} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n

where:
\hat{y} is the predicted value of the dependent variable
X_1, X_2, \ldots, X_n are the independent variables
\beta_0 is the intercept
\beta_1, \beta_2, \ldots, \beta_n are the slopes (coefficients)
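A minimal sketch of fitting a multiple linear regression with scikit-learn is shown below; the two-feature dataset is made up for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression
# Made-up data: two independent variables (columns of X) and one dependent variable y
X = np.array([[1, 4], [2, 3], [3, 5], [4, 7], [5, 6]])
y = np.array([10, 12, 17, 23, 25])
model = LinearRegression()
model.fit(X, y)
print("Intercept (beta_0):", model.intercept_)
print("Slopes (beta_1, beta_2):", model.coef_)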


Best fit line
The best-fit line equation provides a straight line that represents the relationship between the dependent and independent variables. The slope of the line indicates how much the dependent variable changes for a unit change in the independent variable(s).
Linear regression performs the task of predicting a dependent variable value (y) based on a given independent variable (x); hence the name linear regression. In the figure, X (input) is the work experience and Y (output) is the salary of a person, and the regression line is the best-fit line for the model.
Formulas for Simple Regression
The equation of a simple linear regression line is:

    \hat{y} = \beta_0 + \beta_1 X

The slope and intercept are estimated from the data as:

    \beta_1 = \frac{\sum (X_i - \bar{X})(y_i - \bar{y})}{\sum (X_i - \bar{X})^2}

    \beta_0 = \bar{y} - \beta_1 \bar{X}
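As a quick sketch of these formulas in code, the snippet below computes the slope and intercept directly with NumPy, using the same small dataset as the scikit-learn example later in this lecture.

import numpy as np
# Same small dataset used in the scikit-learn example below
x = np.array([1, 2, 4, 3, 5], dtype=float)
y = np.array([1, 3, 3, 2, 5], dtype=float)
x_mean, y_mean = x.mean(), y.mean()
# Slope: sum of cross-deviations divided by sum of squared deviations of x
beta_1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
# Intercept: the fitted line passes through the point of means
beta_0 = y_mean - beta_1 * x_mean
print("Slope (beta_1):", beta_1)       # 0.8 for this data
print("Intercept (beta_0):", beta_0)   # 0.4 for this data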
RMSE
One way to assess how well a regression model fits a dataset is to
calculate the root mean square error, which is a metric that tells us the
average distance between the predicted values from the model and the
actual values in the dataset.
The lower the RMSE, the better a given model is able to “fit” a dataset.
The formula for the root mean square error, often abbreviated RMSE, is:

    \text{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}}
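A minimal sketch of this formula in NumPy, with made-up actual and predicted values for illustration:

import numpy as np
# Made-up observed values and model predictions, for illustration only
y_actual = np.array([3.0, 5.0, 7.0, 9.0])
y_predicted = np.array([2.8, 5.3, 6.9, 9.4])
# Square the residuals, average them, then take the square root
rmse = np.sqrt(np.mean((y_actual - y_predicted) ** 2))
print("RMSE:", rmse)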
Why Use RMSE
Measures Model Accuracy:
RMSE tells us how well the regression line fits the data by calculating the
average error.
Smaller RMSE values indicate a better fit.
Sensitive to Large Errors:
Squaring the errors gives more weight to large differences, making RMSE
sensitive to outliers.
Easy Interpretation:
RMSE is in the same units as the dependent variable (e.g., if predicting price in
USD, RMSE will also be in USD).
Applications of Regression

Predicting prices: For example, a regression model could be used to predict the price of a house based on its size, location, and other features.

Forecasting trends: For example, a regression model could be used to forecast the sales of a product based on historical sales data and economic indicators.

Identifying risk factors: For example, a regression model could be used to identify risk factors for heart disease based on patient data.

Making decisions: For example, a regression model could be used to recommend which investment to buy based on market data.
Advantages of Regression

Easy to understand and interpret

Robust to outliers

Can handle both linear and nonlinear relationships.

Disadvantages of Regression

Assumes linearity

Sensitive to multicollinearity

May not be suitable for highly complex relationships


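The following Python example fits a simple linear regression on a small dataset with scikit-learn, reports the slope, intercept, and RMSE, and plots the best-fit line.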
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
# Given data
x = np.array([1, 2, 4, 3, 5]).reshape(-1, 1) # Reshape for sklearn
y = np.array([1, 3, 3, 2, 5])
# Create and fit the linear regression model
model = LinearRegression()
model.fit(x, y)
# Predict using the model
y_pred = model.predict(x)
# Calculate RMSE
rmse = np.sqrt(mean_squared_error(y, y_pred))
# Display results
print("Regression Coefficient (Slope):", model.coef_[0])
print("Intercept:", model.intercept_)
print("Predicted Values:", y_pred)
print("RMSE:", rmse)
# Plotting the data
plt.scatter(x, y, color='blue', label='Actual Data') # Original points
plt.plot(x, y_pred, color='red', label='Best Fit Line') # Regression line
plt.title("Linear Regression with Best Fit Line")
plt.xlabel("X values")
plt.ylabel("Y values")
plt.legend()
plt.show()
Example
Consider a company that wants to improve sales of its product. The company spends money on different advertising media such as TV, radio, and newspaper to increase sales. It records the money spent on each advertising medium (in thousands of dollars) and the number of units of product sold (in thousands of units).
We have to help the company find the most effective way to spend money on advertising media to improve sales for the next year with a smaller advertising budget.
TV Advertising ($1000)    Sales (1000 units)
10                        9
20                        18
30                        24
40                        28
50                        35
60                        40
70                        50
80                        55
90                        62
100                       70
The equation of a simple linear regression line is:

    \hat{y} = \beta_0 + \beta_1 X

    \beta_1 = \frac{\sum (X_i - \bar{X})(y_i - \bar{y})}{\sum (X_i - \bar{X})^2}

    \beta_0 = \bar{y} - \beta_1 \bar{X}
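Substituting the data from the table above gives a worked check of the arithmetic (with \bar{X} = 55 and \bar{y} = 39.1):

    \beta_1 = \frac{\sum (X_i - \bar{X})(y_i - \bar{y})}{\sum (X_i - \bar{X})^2} = \frac{5415}{8250} \approx 0.66

    \beta_0 = \bar{y} - \beta_1 \bar{X} = 39.1 - \frac{5415}{8250} \times 55 = 39.1 - 36.1 = 3.0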
Thus, the regression line is:

    \hat{y} = 0.66x + 3

For x = 85:

    \hat{y} = 0.66 \times 85 + 3 = 59.1

So, the predicted sales for an $85,000 TV advertising budget are approximately 59,100 units.
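The same analysis in Python with scikit-learn, including a prediction for the $85,000 budget: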
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Data: Advertising budget (X) and sales (Y)
x = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]).reshape(-1, 1) # Independent variable
y = np.array([9, 18, 24, 28, 35, 40, 50, 55, 62, 70]) # Dependent variable
# Create and fit the linear regression model
model = LinearRegression()
model.fit(x, y)
# Predict using the model
y_pred = model.predict(x)
# Calculate RMSE
rmse = np.sqrt(mean_squared_error(y, y_pred))
# Display results
print("Regression Coefficient (Slope):", model.coef_[0])
print("Intercept:", model.intercept_)
print("Predicted Values:", y_pred)
print("RMSE:", rmse)
# Plotting the results
plt.scatter(x, y, color="blue", label="Actual Sales Data")
plt.plot(x, y_pred, color="red", label="Best Fit Line")
plt.title("Advertising Budget vs Sales")
plt.xlabel("Advertising Budget (in $1000s)")
plt.ylabel("Sales (in $1000s)")
plt.legend()
plt.show()
# Predict sales for a reduced budget (e.g., $85,000)
reduced_budget = 85 # $85,000
predicted_sales = model.predict(np.array([[reduced_budget]]))
print("Predicted Sales for $85,000 Advertising Budget:",
predicted_sales[0])

Output:
Regression Coefficient (Slope): 0.6563636363636365
Intercept: 2.999999999999993
Predicted Values: [ 9.56363636 16.12727273 22.69090909 29.25454545 35.81818182
 42.38181818 48.94545455 55.50909091 62.07272727 68.63636364]
RMSE: 1.2919330126174908
