
Subject: Probability and Statistics

BSCS — 3-A
Department of Computer Science
Bahria University, Lahore Campus

Group 4
Linear Regression
Formulas for regression line
Examples

Linear Regression
Introduction
The term “regression” and the methods for investigating the relationship between two variables date back more than a century. The term was introduced by Francis Galton, the renowned British scientist, in the 1880s during his studies of heredity.

Linear regression is a basic and commonly used type of predictive analysis. The overall idea of regression
is to examine two things:

(I) Does a set of predictor variables do a good job of predicting an outcome (dependent) variable?
(II) Which variables are significant predictors of the outcome variable, and how do they (as indicated by the magnitude and sign of the beta estimates) impact the outcome variable?

It is a statistical method used to model the relationship between a dependent variable (often denoted as
"y") and one or more independent variables (often denoted as "x"). It assumes that there is a linear
relationship between the independent variable(s) and the dependent variable.

The goal of linear regression is to find the equation of a straight line that best fits the data points. This
equation is typically represented as:

y = β0 + β1x + ε

Where:

 y is the dependent variable.
 x is the independent variable.
 β1 is the gradient or slope of the line, indicating the relationship between x and y.
 β0 is the y-intercept, the value of y when x is zero.
 ε is a random error term. In simple linear regression it is usually assumed that ε is normally distributed with E(ε) = 0 and constant variance Var(ε) = σ².

Multiple Linear Regression


In the case of multiple independent variables, the equation becomes a linear combination:

y = β0 + β1x1 + … + βpxp + ε

Where:

 y is the dependent variable.
 x1, x2, …, xp are the independent variables.
 β0, β1, β2, …, βp are the regression coefficients.
 The error ε follows a normal distribution with E(ε) = 0 and constant variance Var(ε) = σ².
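In Python, the coefficients of this linear combination can be estimated in one call with NumPy's least-squares routine. This is a minimal sketch with hypothetical data chosen purely for illustration:

import numpy as np

# Hypothetical data: each row is [1, x1, x2]; the leading 1 carries the intercept b0.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 1.0],
              [1.0, 6.0, 2.0],
              [1.0, 8.0, 5.0]])
y = np.array([10.0, 14.0, 19.0, 29.0])
# Least-squares estimates of (b0, b1, b2) in one call.
coeffs, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coeffs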

Simple linear regression investigates the linear relationship between one dependent variable and one independent variable, while multiple linear regression focuses on the linear relationship between one dependent variable and more than one independent variable. Multiple linear regression also involves issues that do not arise in the simple case, such as collinearity, variance inflation, and the graphical display of regression outliers and influential observations.

Non-Linear Regression
The non-linear regression model (growth model) may be written as

y = α / (1 + e^(βt)) + ε
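As a quick illustration of this curve (the values of α and β below are arbitrary choices, not fitted estimates), the model can be evaluated in Python as follows:

import numpy as np

# Evaluate the growth curve y = alpha / (1 + e^(beta*t)) on a grid of t values.
alpha, beta = 10.0, -0.5
t = np.linspace(0.0, 10.0, 6)
y = alpha / (1.0 + np.exp(beta * t))   # rises from alpha/2 toward alpha as t grows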

Linear regression aims to find the best-fitting line by minimizing the difference between the actual
values and the predicted values (often done through a method called ordinary least squares). This
method calculates the coefficients (slope and intercept) that minimize the sum of the squared
differences between the actual and predicted values.
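In symbols, ordinary least squares chooses β0 and β1 to minimize the sum of squared errors SSE(β0, β1) = Σ (yi − β0 − β1xi)²; the slope and intercept formulas given in the following sections are precisely the values that make this sum smallest.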

Properties of Linear Regression


For the regression line where the regression parameters β0 and β1 are defined, the following properties
are applicable:
 The regression line minimizes the sum of squared differences between the observed and predicted values.
 The regression line passes through the point of means (x̄, ȳ) of the X and Y values.
 The regression constant β0 is the y-intercept of the regression line.
 The regression coefficient β1 is the slope of the regression line. Its value equals the average change in the dependent variable (Y) for a unit change in the independent variable (X).
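A useful consequence of the second property is that the intercept can always be recovered from the slope and the two means as β0 = ȳ − β1x̄; this identity is used in the worked examples at the end of this document.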

Regression Coefficient
The regression coefficient appears in the equation of the regression line:

y = β0 + β1x

Where:

 β0 is a constant.
 β1 is the regression coefficient.

Formula

Given below is the formula to find the value of the regression coefficient.

β1 = b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

Where:

 xi and yi are the observed data points.
 x̄ and ȳ are the means of the x and y values.
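These formulas translate directly into a few lines of Python. The following is a minimal sketch assuming NumPy is available; the function name fit_simple_ols is our own, not a library routine:

import numpy as np

def fit_simple_ols(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Slope: sum of cross-deviations over sum of squared x-deviations.
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    # Intercept: the line passes through the point of means.
    b0 = y.mean() - b1 * x.mean()
    return b0, b1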

Types of Linear Regression


Linear regression can be categorized into several types based on the number of independent
variables and the nature of the relationship:

 Simple Linear Regression:

This involves a single independent variable (interval, ratio, or dichotomous) used to predict the dependent variable (interval or ratio). The equation takes the form y = mx + b, where there is one predictor variable x.

Example: Consider a scenario where you want to predict the price of a house based on
its area (in square feet). You collect data on house prices and their corresponding areas. The
simple linear regression model would look like this:

House Price = β0 + β1 × Area

Here, the dependent variable is the house price, and the independent variable is the area.
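Using the fit_simple_ols sketch from the Formula section above, the house-price model could be fitted as follows (the areas and prices here are hypothetical numbers, purely for demonstration):

# Hypothetical areas (square feet) and prices.
areas  = [1000.0, 1500.0, 2000.0, 2500.0]
prices = [150000.0, 210000.0, 270000.0, 330000.0]
b0, b1 = fit_simple_ols(areas, prices)
predicted_price = b0 + b1 * 1800.0   # estimated price of an 1800 sq ft house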

 Multiple Linear Regression:

In this type, there are multiple independent variables used to predict the dependent variable. The equation becomes y = b + m1x1 + m2x2 + … + mnxn, where there are n predictor variables x1, x2, …, xn.

Example: Imagine you are predicting a student's final exam score based on several
factors such as hours studied, previous test scores, and attendance. The multiple linear
regression equation would be:

Final Exam Score = β0 + β1 × Hours Studied + β2 × Previous Test Score+ β3 × Attendance.

In this case, there are three independent variables: hours studied, previous test score, and
attendance.
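A minimal Python sketch of this model, with hypothetical student records chosen purely for illustration (assuming NumPy), looks like this:

import numpy as np

# Hypothetical records: [1, hours studied, previous test score, attendance %].
X = np.array([[1.0, 2.0, 60.0, 70.0],
              [1.0, 5.0, 75.0, 90.0],
              [1.0, 8.0, 80.0, 95.0],
              [1.0, 3.0, 65.0, 80.0],
              [1.0, 6.0, 70.0, 85.0]])
final_scores = np.array([55.0, 78.0, 90.0, 62.0, 75.0])
beta, *_ = np.linalg.lstsq(X, final_scores, rcond=None)   # beta = [b0, b1, b2, b3]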

 Polynomial Regression:

It is a form of linear regression in which the relationship between the independent variable and the dependent variable is modeled as an nth-degree polynomial. For instance, y = b + m1x + m2x² + m3x³ + ….

Example: Let's say you are studying the relationship between temperature and air
conditioning electricity consumption. You suspect the relationship might not be linear but could
be better represented by a quadratic equation. The polynomial regression equation might look
like:

Electricity Consumption = β0 + β1 × Temperature + β2 × Temperature²

Here, the quadratic term Temperature² allows for a curved relationship between temperature
and electricity consumption.
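A quadratic fit of this kind is one line in NumPy. The temperatures and consumption figures below are hypothetical, purely for illustration:

import numpy as np

# Hypothetical temperatures (°C) and electricity consumption (kWh).
temp = np.array([15.0, 20.0, 25.0, 30.0, 35.0])
kwh  = np.array([30.0, 28.0, 35.0, 50.0, 72.0])
b2, b1, b0 = np.polyfit(temp, kwh, 2)         # np.polyfit returns highest degree first
predicted = b0 + b1 * 28.0 + b2 * 28.0 ** 2   # consumption estimate at 28 °C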

 Discriminant Regression:
Discriminant regression, sometimes referred to as discriminant least squares regression
or linear discriminant analysis with regression, represents an approach that combines elements
of both discriminant analysis and regression techniques.

Example: Let's say you want to predict whether a patient has a specific disease based on
their age, blood pressure, and cholesterol levels. Instead of using traditional logistic regression
(which is specifically designed for classification tasks), you fit a linear regression model that
estimates the probability of having the disease from the predictor variables.

 Logistic Regression:

Despite the name, logistic regression is a type of regression used for classification
problems rather than regression problems. It models the probability of a binary outcome by
using a logistic function. It is called "regression" because it estimates the relationship between
one dependent binary variable and one or more independent variables.

Example: Suppose you are predicting whether a customer will buy a product based on
their age and income. The logistic regression equation would estimate the probability of buying
the product given their age and income:

Probability of Buying = 1 / (1 + e^−(β0 + β1 × Age + β2 × Income))
Here, the dependent variable is binary (bought or did not buy), and the independent variables
are age and income.
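Evaluating this formula is straightforward in Python. The coefficients below are assumed values for illustration, not fitted estimates:

import math

# Assumed (not fitted) coefficients, purely to illustrate the formula.
b0, b1, b2 = -8.0, 0.05, 0.0001
age, income = 40.0, 50000.0
linear_part = b0 + b1 * age + b2 * income
p_buy = 1.0 / (1.0 + math.exp(-linear_part))   # about 0.27 for these inputs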

These variations and types accommodate different scenarios and complexities within datasets, allowing
for more flexible modeling and analysis of relationships between variables.

Uses of Linear Regression


Three major uses for regression analysis are:

1. Determining the Strength of Predictors:

Regression can be used to identify the strength of the effect that the independent
variable(s) have on a dependent variable. Typical questions concern the strength of the
relationship between dose and effect, between sales and marketing spending, or
between age and income.

2. Forecasting an Effect:

Regression can be used to forecast effects or the impact of changes; that is, regression
analysis helps us understand how much the dependent variable changes when one or
more independent variables change. A typical question is, “How much additional sales
income do I get for each additional $1,000 spent on marketing?”
3. Trend Forecasting:

Regression analysis can also predict trends and future values, providing point estimates.
A typical question is, “What will the price of gold be in six months?”

Examples
1. Parent’s Height and Children’s Height
The following classical data set contains information on parents' heights and children's heights.

Parent (x) 64.5 65.5 66.5 67.5 68.5 69.5 70.5 71.5 72.5
Children (y) 65.8 66.7 67.2 67.6 68.2 68.9 69.5 69.9 72.2

First, calculate the means of x and y:

x̄ = (64.5 + 65.5 + 66.5 + 67.5 + 68.5 + 69.5 + 70.5 + 71.5 + 72.5) / 9 = 68.5

ȳ = (65.8 + 66.7 + 67.2 + 67.6 + 68.2 + 68.9 + 69.5 + 69.9 + 72.2) / 9 ≈ 68.44

Now, calculate the slope using the formula β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)², with the sums running over the nine data points:

β1 = [(64.5 − 68.5)(65.8 − 68.44) + … + (72.5 − 68.5)(72.2 − 68.44)] / [(64.5 − 68.5)² + … + (72.5 − 68.5)²]

β1 = (10.56 + … + 15.04) / (16 + 9 + 4 + 1 + 0 + 1 + 4 + 9 + 16)

β1 = 41.1 / 60 ≈ 0.685

Next, find the y-intercept β0 using the formula

β0 = ȳ − β1 × x̄

β0 = 68.44 − 0.685 × 68.5

β0 ≈ 21.52

Therefore, the linear regression equation for the relationship between parents' and children's heights is approximately:

Children's Height ≈ 21.52 + 0.685 × Parent's Height


This equation represents the best-fit line that models the relationship between parent heights and
children's heights based on the provided data.
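As a quick check of the hand calculation, the same estimates can be computed in Python (assuming NumPy) directly from the data in the table:

import numpy as np

parent   = np.array([64.5, 65.5, 66.5, 67.5, 68.5, 69.5, 70.5, 71.5, 72.5])
children = np.array([65.8, 66.7, 67.2, 67.6, 68.2, 68.9, 69.5, 69.9, 72.2])
b1 = np.sum((parent - parent.mean()) * (children - children.mean())) \
     / np.sum((parent - parent.mean()) ** 2)
b0 = children.mean() - b1 * parent.mean()
print(b0, b1)   # approximately 21.52 and 0.685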

2. Hours Studied and Test Score


Suppose we have a dataset that contains the number of hours students study and their corresponding scores on a test.

Here is a small set of data:

Hours Studied (x) 2 3 4 5 6
Test Scores (y) 60 70 75 80 85

First, calculate the means of x and y:

x̄ = (2 + 3 + 4 + 5 + 6) / 5 = 4

ȳ = (60 + 70 + 75 + 80 + 85) / 5 = 74
Now, calculate the slope β1:

β1 = [(2 − 4)(60 − 74) + … + (6 − 4)(85 − 74)] / [(2 − 4)² + … + (6 − 4)²]

β1 = (28 + 4 + 0 + 6 + 22) / (4 + 1 + 0 + 1 + 4)

β1 = 60 / 10 = 6
Next, find the y-intercept β0:

β0 = 74 − 6 × 4

β0 = 50

Therefore, the linear regression equation is

Test Score = 50 + 6 × Hours Studied


This equation allows us to predict test scores based on the number of hours studied.
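The same fit can be verified in one line with NumPy's polynomial routine (a degree-1 fit returns the slope first, then the intercept):

import numpy as np

hours  = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
scores = np.array([60.0, 70.0, 75.0, 80.0, 85.0])
b1, b0 = np.polyfit(hours, scores, 1)   # degree-1 least-squares fit
print(b0, b1)   # 50.0 and 6.0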
