Linear Regression

Linear regression is used to predict the relationship between two variables by applying a linear equation to observed data. There are two types of variables: one is the independent variable and the other is the dependent variable. Linear regression is commonly used for predictive analysis. The main idea of regression is to examine two things. First, does a set of predictor variables do a good job of predicting an outcome (dependent) variable? Second, which variables are significant predictors of the outcome variable? In this article, we will discuss the linear regression equation, its formula and the properties of linear regression.
Examples of Linear Regression
The weight of a person is linearly related to their height, so there is a linear relationship between height and weight: as height increases, weight tends to increase as well. It is not necessary that one variable depends on the other, or that one causes the other, only that there is some meaningful relationship between the two variables. In such cases, we use a scatter plot to assess the strength of the relationship between the variables. If there is no relationship between the variables, the scatter plot does not show any increasing or decreasing pattern, and a linear regression model is not appropriate for the given data.
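For illustration, here is a minimal scatter-plot sketch in Python (assuming Matplotlib is installed; the height and weight values are hypothetical), showing how a roughly linear pattern can be checked by eye:

```python
# Minimal sketch (hypothetical data): scatter plot to eyeball whether
# height and weight show an increasing, roughly linear pattern.
import matplotlib.pyplot as plt

height_cm = [150, 155, 160, 165, 170, 175, 180]   # hypothetical heights
weight_kg = [52, 55, 59, 63, 66, 70, 75]          # hypothetical weights

plt.scatter(height_cm, weight_kg)
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")
plt.title("Scatter plot of height vs. weight")
plt.show()
```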
Linear Regression Equation
The strength of the relationship between two variables is measured by the correlation coefficient, whose value lies between -1 and +1. This coefficient indicates how strongly the observed data for the two variables are associated.
Linear Regression Equation is given below:

Y=a+bX

where X is the independent variable and is plotted along the x-axis, and Y is the dependent variable and is plotted along the y-axis.

Here, the slope of the line is b, and a is the intercept (the value of Y when X = 0).
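As a tiny illustration, the sketch below (Python, with hypothetical values for a and b) simply evaluates Y = a + bX for a given X:

```python
# Minimal sketch: evaluating Y = a + bX for assumed (hypothetical) values
# of the intercept a and the slope b.
def predict(x, a=2.0, b=0.5):
    """Return the value of Y on the line Y = a + b*X."""
    return a + b * x

print(predict(0))    # equals a, the intercept: 2.0
print(predict(10))   # 2.0 + 0.5 * 10 = 7.0
```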

Linear Regression Formula

As we know, linear regression describes a linear relationship between two variables. The equation of linear regression is similar to the slope formula we have learned in earlier classes for a linear equation in two variables. The linear regression formula is given by the equation

Y = a + bX

Simple Linear Regression

Simple linear regression is the most straightforward case, having a single scalar predictor variable x and a single scalar response variable y. The equation for this regression is given as y = a + bx.

The extension to multiple and vector-valued predictor variables is known as multiple linear regression, also called multivariable linear regression. The equation for this regression takes the form Y = a + b1X1 + b2X2 + … + btXt. Almost all real-world regression problems involve multiple predictors, so linear regression is often explained in terms of multiple regression. Note that, in these cases, the dependent variable y is still a scalar.
Least Square Regression Line or Linear Regression Line
The most popular method for fitting a regression line in an XY plot is least squares. This process determines the best-fitting line for the given data by minimizing the sum of the squares of the vertical deviations from each data point to the line. If a point lies exactly on the fitted line, its vertical deviation is 0. Because the deviations are squared before being summed, positive and negative deviations do not cancel each other out. Linear regression determines the straight line known as the least-squares regression line, or LSRL. Suppose Y is a dependent variable and X is an independent variable; then the population regression line is given by the equation:
Y= B0+B1X
Where
B0 is a constant
B1 is the regression coefficient
When a random sample of observations is given, the regression line is expressed as:
ŷ = b0+b1x
where b0 is a constant
b1 is the regression coefficient,
x is the independent variable,

ŷ is known as the predicted value of the dependent variable.
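As a rough sketch (assuming SciPy is installed, and using made-up data), the least-squares line ŷ = b0 + b1x can be obtained as follows; scipy.stats.linregress also reports the correlation coefficient mentioned earlier:

```python
# Sketch: fitting the least-squares regression line ŷ = b0 + b1*x
# with SciPy (hypothetical data values).
import numpy as np
from scipy.stats import linregress

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical independent variable
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])   # hypothetical dependent variable

res = linregress(x, y)
print("intercept b0:", res.intercept)
print("slope b1:", res.slope)
print("correlation r:", res.rvalue)

# Predicted values on the fitted line
y_hat = res.intercept + res.slope * x
```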


Properties of Linear Regression

For a regression line with regression parameters b0 and b1, the following properties apply:

● The regression line minimizes the sum of squared differences between observed values and predicted values.
● The regression line passes through the mean values of the X and Y variables.
● The regression constant b0 is equal to the y-intercept of the linear regression.
● The regression coefficient b1 is the slope of the regression line. Its value is equal to the average change in the dependent variable (Y) for a unit change in the independent variable (X).
Regression Coefficient
The regression coefficient is given by the equation:
Y= B0+B1X
Where
B0 is a constant
B1 is the regression coefficient
Given below is the formula to find the value of the regression coefficient.
B1 = b1 = Σ[(xi − x̄)(yi − ȳ)] / Σ[(xi − x̄)²]
where xi and yi are the observed data points, and x̄ and ȳ are their mean values.
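For comparison, here is a minimal Python sketch (hypothetical data) that computes b1 directly from the formula above, together with b0 = ȳ − b1·x̄, which follows from the property that the line passes through the means:

```python
# Sketch: b1 = Σ[(xi − x̄)(yi − ȳ)] / Σ[(xi − x̄)²], and b0 = ȳ − b1*x̄
# computed directly from the formulas (hypothetical data).
def regression_coefficients(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
         / sum((xi - x_bar) ** 2 for xi in x)
    b0 = y_bar - b1 * x_bar      # the line passes through (x̄, ȳ)
    return b0, b1

b0, b1 = regression_coefficients([1, 2, 3, 4, 5], [2.1, 2.9, 3.8, 5.2, 5.9])
print(f"y-hat = {b0:.3f} + {b1:.3f} x")
```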


Importance of Regression Line

A regression line is used to describe the behaviour of a set of data; it is a logical approach that helps us study and analyze the relationship between two continuous variables. It is then applied in machine learning models, mathematical analysis, statistics, forecasting, and other quantitative applications. In the financial sector, for example, analysts use linear regression to predict stock and commodity prices and to perform valuations of different securities. Several well-known companies use linear regression to predict sales, inventories, and so on.
Key Ideas of Linear Regression

● Correlation describes the interrelation between the variables in the data.
● Variance is the degree of spread of the data.
● Standard deviation measures the dispersion of the data about the mean; it is the square root of the variance.
● Residual (error term) is the actual value in the dataset minus the value predicted by the linear regression.
Important Properties of Regression Line

● Regression coefficients are unchanged by a shift of origin but not by a change of scale. If the variables x and y are changed to u = (x − a)/p and v = (y − c)/q, where p and q are constants, then byx = (q/p)·bvu and bxy = (p/q)·buv.
● The two regression lines (y on x and x on y) intersect at the point (x̄, ȳ), the means of x and y; this point is the solution of the two regression equations in the variables x and y.
● The correlation coefficient between the two variables x and y is the geometric mean of the two regression coefficients, and it takes their common sign. If the regression coefficients are byx = b and bxy = b′, then r = ±√(byx · bxy); when both coefficients are negative, r is negative, and when both are positive, r is positive. A quick numerical check of this property is sketched below.
● The regression constant (a0) is equal to the y-intercept of the regression line; a0 and a1 are the regression parameters.
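Here is a small numerical check of the geometric-mean property, using NumPy and made-up data; byx and bxy are computed from the sample covariance and variances, and their product should equal r²:

```python
# Sketch: checking that byx * bxy = r², so r = ±sqrt(byx * bxy)
# with the common sign of the two regression coefficients (hypothetical data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])

sxy = np.cov(x, y, ddof=1)[0, 1]            # sample covariance of x and y
byx = sxy / np.var(x, ddof=1)               # regression coefficient of y on x
bxy = sxy / np.var(y, ddof=1)               # regression coefficient of x on y
r = np.corrcoef(x, y)[0, 1]

print(byx * bxy, r ** 2)                    # the two numbers should agree
```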

Regression Line Formula:

A linear regression line equation is written as:
Y = a + bX
where X is plotted on the x-axis and Y is plotted on the y-axis. X is the independent variable and Y is the dependent variable. Here, b is the slope of the line and a is the intercept, i.e. the value of Y when X = 0.

Multiple Regression Line Formula: y = a + b1x1 + b2x2 + b3x3 + … + btxt + u, where u is the error term.
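A minimal sketch of a multiple regression fit with two hypothetical predictors, using NumPy's least-squares solver (the column of ones supplies the intercept a):

```python
# Sketch: multiple linear regression y = a + b1*x1 + b2*x2 via ordinary
# least squares with NumPy (hypothetical data).
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([0.5, 1.0, 1.0, 2.0, 2.5])
y  = np.array([3.1, 4.9, 6.2, 8.8, 10.1])

X = np.column_stack([np.ones_like(x1), x1, x2])   # design matrix [1, x1, x2]
coef, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef
print(f"y ≈ {a:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")
```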


Assumptions made in Linear Regression
● The dependent/target variable is continuous.
● The independent variables are not correlated with one another (no multicollinearity).
● There should be a linear relationship between the dependent and
explanatory variables.
● Residuals should follow a normal distribution.
● Residuals should have constant variance.
● Residuals should be independently distributed/no autocorrelation.
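The residual assumptions can be checked informally. The sketch below (assuming SciPy is available, with made-up data) fits a simple line, then tests the residuals for normality and compares their spread across the two halves of the sample as a crude constant-variance check:

```python
# Sketch: quick, informal checks on the residual assumptions
# (normality and constant variance) after a simple fit.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.2, 2.8, 4.1, 4.9, 6.2, 6.8, 8.1, 8.7])   # hypothetical data

res = stats.linregress(x, y)
residuals = y - (res.intercept + res.slope * x)

# Normality: Shapiro-Wilk test (a large p-value gives no evidence against normality)
stat, p = stats.shapiro(residuals)
print("Shapiro-Wilk p-value:", p)

# Constant variance: compare the residual spread in the two halves of the data
half = len(residuals) // 2
print("spread, first half:", residuals[:half].std(ddof=1))
print("spread, second half:", residuals[half:].std(ddof=1))
```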
Solved Examples
● Problem 1
Consider the following set of points: {(-2 , -1) , (1 , 1) , (3 , 2)}
a) Find the least square regression line for the given data points.
b) Plot the given points and the regression line in the same rectangular
system of axes.

Solution

● Problem 2
a) Find the least square regression line for the following set of data points:
{(-1 , 0), (0 , 2), (1 , 4), (2 , 5)}
b) Plot the given points and the regression line in the same rectangular system of axes.

Solution

b) We now graph the regression line given by y = ax + b together with the given points, as sketched below.
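One possible way to produce both the fit and the plot in Python (assuming NumPy and Matplotlib are available):

```python
# Sketch for Problem 2: fit the least-squares line to the four given points
# and plot the points together with the line.
import numpy as np
import matplotlib.pyplot as plt

x = np.array([-1, 0, 1, 2], dtype=float)
y = np.array([ 0, 2, 4, 5], dtype=float)

b, a = np.polyfit(x, y, 1)        # degree-1 fit returns [slope, intercept]
print(f"y = {b:.2f} x + {a:.2f}")  # for these points: slope 1.7, intercept 1.9

plt.scatter(x, y, label="data points")
xs = np.linspace(-1.5, 2.5, 100)
plt.plot(xs, b * xs + a, label="least-squares line")
plt.legend()
plt.show()
```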

● Problem 3
The values of x and their corresponding values of y are shown in the table below.

x 0 1 2 3 4

y 2 3 5 4 6

a) Find the least square regression line y = a x + b.
b) Estimate the value of y when x = 10.

Solution
b) Now that we have the least square regression line y = 0.9 x + 2.2, substitute x = 10 to find the corresponding value of y.

y = 0.9 * 10 + 2.2 = 11.2
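The fitted line and the estimate can be double-checked in Python (assuming NumPy):

```python
# Sketch: verifying Problem 3 with NumPy — the fit should come out as
# y = 0.9 x + 2.2, and the estimate at x = 10 as 11.2.
import numpy as np

x = np.array([0, 1, 2, 3, 4], dtype=float)
y = np.array([2, 3, 5, 4, 6], dtype=float)

slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)                # expect 0.9 and 2.2
print(slope * 10 + intercept)          # expect 11.2
```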

● Problem 4
The sales of a company (in million dollars) for each year are shown in
the table below.

x (year) 2005 2006 2007 2008 2009

y (sales) 12 19 29 37 45

a) Find the least square regression line y = a x + b.
b) Use the least squares regression line as a model to estimate the sales of the company in 2012.

Solution
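One way to carry out the fit and the 2012 estimate in Python (assuming NumPy); rescaling the year to t = year − 2005 keeps the numbers small without changing the predictions:

```python
# Sketch for Problem 4: fit sales against the (rescaled) year and
# extrapolate to 2012.
import numpy as np

year  = np.array([2005, 2006, 2007, 2008, 2009], dtype=float)
sales = np.array([  12,   19,   29,   37,   45], dtype=float)

t = year - 2005
slope, intercept = np.polyfit(t, sales, 1)
print(f"y = {slope:.1f} t + {intercept:.1f}  (t = year - 2005)")   # ≈ 8.4 t + 11.6

estimate_2012 = slope * (2012 - 2005) + intercept                  # ≈ 70.4
print("estimated sales in 2012 (million dollars):", estimate_2012)
```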
