Linear Regression
BSCS — 3-A
Department of Computer Science
Bahria University, Lahore Campus
Group 4
Linear Regression
Formulas for regression line
Examples
Introduction
The term “regression” and the methods for investigating the relationship between two variables
date back about a century. The term was first introduced by Francis Galton, the renowned British
biologist, in 1908, while he was engaged in the study of heredity.
Linear regression is a basic and commonly used type of predictive analysis. The overall idea of regression
is to examine two things:
(I) Does a set of predictor variables do a good job of predicting an outcome (dependent)
variable?
(II) Which variables are significant predictors of the outcome variable, and how do they impact
it, as indicated by the magnitude and sign of the beta estimates?
It is a statistical method used to model the relationship between a dependent variable (often denoted as
"y") and one or more independent variables (often denoted as "x"). It assumes that there is a linear
relationship between the independent variable(s) and the dependent variable.
The goal of linear regression is to find the equation of a straight line that best fits the data points. This
equation is typically represented as:
y = β0 + β1x + ε
Where:
y is the dependent (outcome) variable.
x is the independent (predictor) variable.
β0 is the intercept and β1 is the slope coefficient.
ε is the random error term.
With p predictors, the model generalizes to:
y = β0 + β1x1 + … + βpxp + ε
Where:
x1, …, xp are the p independent variables and β1, …, βp are their coefficients.
Simple linear regression investigates the linear relationship between one dependent variable and
one independent variable, while multiple linear regression focuses on the linear relationship
between one dependent variable and more than one independent variable. Multiple linear
regression involves more issues than simple linear regression, such as collinearity, variance
inflation, and the graphical display of regression outliers and influential observations.
Non-Linear Regression
The non-linear regression model (growth model) may be written as
y = α / (1 + e^(βt)) + ε
Linear regression aims to find the best-fitting line by minimizing the difference between the actual
values and the predicted values (often done through a method called ordinary least squares). This
method calculates the coefficients (slope and intercept) that minimize the sum of the squared
differences between the actual and predicted values.
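As a minimal sketch of this least-squares computation (pure Python, with invented numbers chosen so the answer is obvious), the closed-form slope and intercept can be computed directly:

```python
# Ordinary least squares for simple linear regression (closed form).
# The data below are invented solely to illustrate the computation.
def fit_ols(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: sum((xi - x_mean)(yi - y_mean)) / sum((xi - x_mean)^2)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    beta1 = num / den
    beta0 = y_mean - beta1 * x_mean  # intercept recovered from the means
    return beta0, beta1

beta0, beta1 = fit_ols([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(beta0, beta1)  # → 0.0 2.0  (the data lie exactly on y = 2x)
```

Because the toy data lie exactly on a line, the squared-error minimum is zero and the fit recovers the line itself.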
Regression Coefficient
The regression coefficient appears in the equation of the regression line:
y = β0 + β1x
Where:
β0 is a constant (the intercept).
β1 is the regression coefficient (the slope).
Formula
Given below is the formula to find the value of the regression coefficient.
β1 = b1 = Σ[(xi − x̄)(yi − ȳ)] / Σ[(xi − x̄)²]
Where:
x̄ and ȳ are the sample means of x and y, and (xi, yi) are the individual data points.
Simple Linear Regression:
Example: Consider a scenario where you want to predict the price of a house based on
its area (in square feet). You collect data on house prices and their corresponding areas. The
simple linear regression model would look like this:
Price = β0 + β1 × Area + ε
Here, the dependent variable is the house price, and the independent variable is the area.
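Once such a line has been fitted, it is used for prediction. The coefficients below are assumed illustrative values (not estimates from real housing data): an intercept of 50 and a slope of 0.125, both in thousands of dollars.

```python
# Predicting with an already-fitted line: Price = b0 + b1 * Area.
# b0 and b1 are assumed illustrative values, not real estimates.
b0, b1 = 50.0, 0.125  # intercept ($1000s) and slope ($1000s per sq ft)

def predict_price(area_sqft):
    return b0 + b1 * area_sqft

print(predict_price(2000))  # → 300.0, i.e. $300,000 for 2000 sq ft
```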
Multiple Linear Regression:
In this type, there are multiple independent variables used to predict the dependent
variable. The equation becomes y = b + m1x1 + m2x2 + … + mnxn, where there are n
predictor variables x1, x2, …, xn.
Example: Imagine you are predicting a student's final exam score based on several
factors such as hours studied, previous test scores, and attendance. The multiple linear
regression equation would be:
Score = b + m1 × Hours + m2 × PreviousScore + m3 × Attendance
In this case, there are three independent variables: hours studied, previous test score, and
attendance.
Polynomial Regression:
In polynomial regression, the outcome is modeled as an nth-degree polynomial of the
predictor; the model is still linear in its coefficients.
Example: Let's say you are studying the relationship between temperature and air-
conditioning electricity consumption. You suspect the relationship might not be linear but could
be better represented by a quadratic equation. The polynomial regression equation might look
like:
Consumption = β0 + β1 × Temperature + β2 × Temperature² + ε
Here, the quadratic term Temperature² allows for a curved relationship between temperature
and electricity consumption.
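Because the model is linear in its coefficients, a quadratic fit is just linear regression on the features [T, T²]. As a minimal sketch with three invented points (with exactly three points the quadratic passes through them; with more data the same design matrix would go into least squares), the 3×3 system is solved here with Cramer's rule:

```python
# Fit Consumption = b0 + b1*T + b2*T^2 through three invented points.
def det3(m):
    # Determinant of a 3x3 matrix by cofactor expansion along the first row.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

temps = [10, 20, 30]   # temperature (°C), invented
usage = [5, 9, 17]     # electricity consumption (kWh), invented
A = [[1, t, t * t] for t in temps]   # design matrix rows [1, T, T^2]
D = det3(A)
coeffs = []
for col in range(3):
    # Cramer's rule: replace one column with the target values.
    Ac = [row[:] for row in A]
    for r in range(3):
        Ac[r][col] = usage[r]
    coeffs.append(det3(Ac) / D)
print(coeffs)  # → [5.0, -0.2, 0.02], i.e. 5 - 0.2*T + 0.02*T^2
```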
Discriminant Regression:
Discriminant regression, sometimes referred to as discriminant least squares regression
or linear discriminant analysis with regression, represents an approach that combines elements
of both discriminant analysis and regression techniques.
Example: Let's say you want to predict whether a patient has a specific disease based on
their age, blood pressure, and cholesterol levels. Instead of using traditional logistic regression
(which is specifically designed for classification tasks), you fit a linear regression model that
estimates the probability of having the disease from those predictor variables.
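A minimal sketch of that idea, using invented data and a single predictor (age) for brevity: regress the 0/1 disease label on age by least squares and read the fitted value as a rough probability. Note that, unlike logistic regression, this estimate can fall outside [0, 1].

```python
# Linear probability sketch: least squares on a 0/1 label. Data invented.
ages = [30, 40, 50, 60, 70]
disease = [0, 0, 1, 1, 1]            # 1 = has the disease
n = len(ages)
a_mean = sum(ages) / n
d_mean = sum(disease) / n
# Same slope/intercept formulas as ordinary simple regression.
b1 = (sum((a - a_mean) * (d - d_mean) for a, d in zip(ages, disease))
      / sum((a - a_mean) ** 2 for a in ages))
b0 = d_mean - b1 * a_mean
est = b0 + b1 * 55                   # estimated "probability" at age 55
print(round(est, 2))                 # → 0.75
```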
Logistic Regression:
Despite the name, logistic regression is a type of regression used for classification
problems rather than regression problems. It models the probability of a binary outcome by
using a logistic function. It is called "regression" because it estimates the relationship between
one dependent binary variable and one or more independent variables.
Example: Suppose you are predicting whether a customer will buy a product based on
their age and income. The logistic regression equation would estimate the probability of buying
the product given their age and income:
Probability of Buying = 1 / (1 + e^(−(β0 + β1 × Age + β2 × Income)))
Here, the dependent variable is binary (bought or did not buy), and the independent variables
are age and income.
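This equation can be sketched directly. The coefficients below are assumed, purely illustrative values (not fitted from any data): β0 = −6, β1 = 0.05 per year of age, β2 = 0.00003 per unit of income.

```python
import math

# Logistic model with assumed, illustrative coefficients (not fitted).
def buy_probability(age, income, b0=-6.0, b1=0.05, b2=0.00003):
    z = b0 + b1 * age + b2 * income
    return 1 / (1 + math.exp(-z))    # logistic function maps z into (0, 1)

p = buy_probability(age=40, income=50000)
print(round(p, 3))  # always a value strictly between 0 and 1
```

Whatever the inputs, the logistic function keeps the output in (0, 1), which is exactly why it suits probability modeling better than a straight line.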
These variations and types accommodate different scenarios and complexities within datasets, allowing
for more flexible modeling and analysis of relationships between variables.
Uses of Regression Analysis
1. Determining the Strength of Predictors:
The regression might be used to identify the strength of the effect that the independent
variable(s) have on a dependent variable. Typical questions concern the strength of the
relationship between dose and effect, between sales and marketing spending, or between
age and income.
2. Forecasting an Effect:
It can be used to forecast the effects or impact of changes. That is, regression analysis
helps us understand how much the dependent variable changes with a change in one
or more independent variables. A typical question is, “how much additional sales
income do I get for each additional $1000 spent on marketing?”
3. Trend Forecasting:
Regression analysis predicts trends and future values. The regression analysis can be
used to get point estimates. A typical question is, “what will the price of gold be in 6
months?”.
Examples
1. Parent’s Height and Children’s Height
The following classical data set records parents' heights and their children's heights (in inches).
Parent (x) 64.5 65.5 66.5 67.5 68.5 69.5 70.5 71.5 72.5
Children (y) 65.8 66.7 67.2 67.6 68.2 68.9 69.5 69.9 72.2
First compute the means:
x̄ = (64.5 + 65.5 + … + 72.5) / 9 = 68.5
ȳ = (65.8 + 66.7 + … + 72.2) / 9 ≈ 68.44
Now calculate the slope β1:
β1 = Σ[(xi − x̄)(yi − ȳ)] / Σ[(xi − x̄)²] = 41.1 / 60 ≈ 0.685
Next, find the y-intercept:
β0 = ȳ − β1 × x̄ ≈ 68.44 − 0.685 × 68.5 ≈ 21.52
Therefore, the linear regression equation for the relationship between parent and children heights is
approximately:
ŷ = 21.52 + 0.685x
2. A Small Worked Example
Take the data points x = (2, 3, 4, 5, 6) and y = (60, 70, 75, 80, 85).
First compute the means:
x̄ = (2 + 3 + 4 + 5 + 6) / 5 = 4
ȳ = (60 + 70 + 75 + 80 + 85) / 5 = 74
Now, calculate the slope β1:
β1 = [(2 − 4)(60 − 74) + … + (6 − 4)(85 − 74)] / [(2 − 4)² + … + (6 − 4)²]
β1 = 60 / 10 = 6
Next, find the y-intercept β0:
β0 = 74 − 6 × 4
β0 = 50
So the fitted regression line is ŷ = 50 + 6x.
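These hand calculations are easy to check programmatically; the script below is a direct transcription of the slope and intercept formulas:

```python
# Check the worked example: x = (2, 3, 4, 5, 6), y = (60, 70, 75, 80, 85).
xs = [2, 3, 4, 5, 6]
ys = [60, 70, 75, 80, 85]
x_mean = sum(xs) / len(xs)   # 4.0
y_mean = sum(ys) / len(ys)   # 74.0
beta1 = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
beta0 = y_mean - beta1 * x_mean
print(beta1, beta0)          # → 6.0 50.0
```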