
LINEAR REGRESSION

19ME16N - FUNDAMENTALS OF MACHINE LEARNING


DEFINITION
• Linear regression is perhaps one of the most well-known and well-understood
algorithms in statistics and machine learning.
• In statistics,
linear regression is a linear approach for modelling the
relationship between a scalar response and one or more explanatory variables
(also known as dependent and independent variables).
• The case of one explanatory variable is called simple linear regression; for
more than one, the process is called multiple linear regression.
• In linear regression, the relationships are modeled using
linear predictor functions whose unknown model parameters are estimated
from the data. Such models are called linear models.
Linear regression has many practical uses. Most applications fall into one of the
following two broad categories:
 If the goal is prediction, forecasting, or error reduction, linear regression can
be used to fit a predictive model to an observed data set of values of the
response and explanatory variables. After developing such a model, if
additional values of the explanatory variables are collected without an
accompanying response value, the fitted model can be used to make a
prediction of the response.
 If the goal is to explain variation in the response variable that can be attributed
to variation in the explanatory variables, linear regression analysis can be
applied to quantify the strength of the relationship between the response and
the explanatory variables, and in particular to determine whether some
explanatory variables may have no linear relationship with the response at all,
or to identify which subsets of explanatory variables may contain redundant
information about the response.
TYPES OF LINEAR REGRESSION:
Univariate LR:
• Linear relationships between y and X variables can be explained by a
single X variable
• y=a+bX+ϵ
Where, a = y-intercept, b = slope of the regression line (unbiased estimate)
and ϵ = error term (residuals)
Multiple LR:
• Linear relationships between y and X variables can be explained by
multiple X variables
• y=a+b1X1+b2X2+b3X3+...+bnXn+ϵ
Where, a = y-intercept, b1 … bn = regression coefficients (slopes) for each
predictor, and ϵ = error term (residuals)
• The y-intercept (a) is a constant and slope (b) of the regression line is a
regression coefficient.
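As an illustrative sketch of both forms, the dataset below is made up (it is not from the slides), and the coefficients are estimated by ordinary least squares with NumPy:

```python
import numpy as np

# Made-up data: two predictors X1, X2 and a response y (illustrative only).
X1 = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
X2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
y = np.array([17.0, 28.0, 37.0, 52.0, 58.0])

# Univariate LR: y = a + b*X1. np.polyfit with degree 1 returns [slope, intercept].
b, a = np.polyfit(X1, y, 1)

# Multiple LR: y = a + b1*X1 + b2*X2. Build a design matrix with a column of
# ones for the intercept, then solve the least-squares problem A @ coef ≈ y.
A = np.column_stack([np.ones_like(X1), X1, X2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
a_m, b1, b2 = coef
```

The same fits could be obtained with `sklearn.linear_model.LinearRegression`; the NumPy route is used here to keep the example dependency-light.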
LINEAR REGRESSION (LR)
ASSUMPTIONS

• The relationship between the X and y variables should be linear


• Errors (residuals) should be independent of each other
• Errors (residuals) should be normally distributed with a mean of 0
• Errors (residuals) should have equal variance
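A minimal sketch of inspecting the residual assumptions after fitting a line (the dataset is made up for illustration):

```python
import numpy as np

# Made-up, roughly linear data.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

b, a = np.polyfit(X, y, 1)
residuals = y - (a + b * X)

# When an intercept is fitted, OLS residuals sum to (numerically) zero, so
# their mean should be ~0. Normality and equal variance are usually checked
# graphically (e.g. a Q-Q plot and a residuals-vs-fitted plot) or with tests.
print("mean residual:", residuals.mean())
```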
LINEAR REGRESSION (LR) OUTPUTS
Correlation coefficient (r)
• Correlation coefficient (r) describes a linear relationship
between X and y variables. r can range from -1 to 1.
• r > 0 indicates a positive linear relationship between X and y variables. As one
variable increases, the other variable also increases. r = 1 is a perfect
positive linear relationship.
• Similarly, r < 0 indicates a negative linear relationship
between X and y variables. As one variable increases, the other
variable decreases, and vice versa. r = -1 is a perfect negative linear relationship.
• r = 0 indicates there is no linear relationship between the X and y variables.
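The correlation coefficient can be computed directly with NumPy (the data below are illustrative, not from the slides):

```python
import numpy as np

# Made-up, nearly linear data with a positive trend.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r.
r = np.corrcoef(X, y)[0, 1]
```

For this strongly linear data, r comes out very close to 1, matching the interpretation above.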
Coefficient of determination (R-Squared or r-Squared)
 R-Squared (R2) is the square of the correlation coefficient (r) and is
usually reported as a percentage.
 R-Squared is the proportion of variation in the y variable that is
explained by the independent variables in the fitted regression.
 The multiple correlation coefficient (R), which is the square root of
R-Squared, is used to assess the prediction quality of the y variable
in multiple regression analysis. Its values range from 0 to 1.
 R-Squared can range from 0 to 1 (0 to 100%). R-Squared = 1
(100%) indicates that the fitted regression line explains all the
variability of the y variable around its mean.
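A short sketch, on made-up data, of computing R-Squared as 1 − SS_res / SS_tot and checking its relation to r for a simple linear fit:

```python
import numpy as np

# Made-up, nearly linear data.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

b, a = np.polyfit(X, y, 1)
y_hat = a + b * X

ss_res = np.sum((y - y_hat) ** 2)     # unexplained variation (residual sum of squares)
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation of y around its mean
r_squared = 1 - ss_res / ss_tot

# For simple linear regression, R-Squared equals the square of r.
r = np.corrcoef(X, y)[0, 1]
```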
 From this plot, it is clear that there is a strong linear relationship between these two
features: as SIZE increases, so too does RENTAL PRICE, by a similar amount.
 If we could capture this relationship in a model, we would be able to do two
important things.
 First, we would be able to understand how office size affects office rental price.
 Second, we would be able to fill in the gaps in the dataset to predict office rental
prices for office sizes that we have never actually seen in the historical data
 The equation of a line can be written as y = mx + c, where m is the slope of the line,
and c is known as the y-intercept of the line (i.e., the position at which the line meets
the vertical axis when the value of x is set to zero).
 The equation of a line predicts a y value for every x value given the slope and the y-
intercept, and we can use this simple model to capture the relationship between two
features such as SIZE and RENTAL PRICE.
 The same scatter plot, with a simple linear model added, captures
the relationship between office sizes and office rental prices.
 This model is: Rental Price = 6.47 + (0.62 × Size)
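The fitted line from the slide can be used to fill such gaps. A sketch (the office size of 730 below is a hypothetical input, not from the slides):

```python
# The slide's fitted model: Rental Price = 6.47 + 0.62 * Size.
def rental_price(size):
    """Predict office rental price from office size using the fitted line."""
    return 6.47 + 0.62 * size

# Predict the rental price for an office size never seen in the historical data.
print(rental_price(730))
```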
