LINEAR REGRESSION
19ME16N - FUNDAMENTALS OF MACHINE LEARNING
DEFINITION
• Linear regression is one of the most well-known and well-understood algorithms in statistics and machine learning.
• In statistics, linear regression is a linear approach to modelling the relationship between a scalar response and one or more explanatory variables (also known as the dependent and independent variables).
• The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression.
• In linear regression, the relationships are modelled using linear predictor functions whose unknown model parameters are estimated from the data. Such models are called linear models.

PRACTICAL USES
Linear regression has many practical uses. Most applications fall into one of two broad categories:
• Prediction, forecasting, or error reduction: linear regression can be used to fit a predictive model to an observed data set of values of the response and explanatory variables. If additional values of the explanatory variables are then collected without an accompanying response value, the fitted model can be used to predict the response.
• Explaining variation: linear regression analysis can be applied to quantify the strength of the relationship between the response and the explanatory variables, to determine whether some explanatory variables have no linear relationship with the response at all, or to identify which subsets of explanatory variables contain redundant information about the response.
TYPES OF LINEAR REGRESSION
Univariate (simple) LR:
• The linear relationship between y and X can be explained by a single X variable
• y = a + bX + ϵ
  where a = y-intercept, b = slope of the regression line, and ϵ = error term (residuals)
Multiple LR:
• The linear relationship between y and the X variables is explained by multiple X variables
• y = a + b1X1 + b2X2 + b3X3 + ... + bnXn + ϵ
  where a = y-intercept, b1, ..., bn = regression coefficients of the explanatory variables, and ϵ = error term (residuals)
• The y-intercept (a) is a constant, and each slope (b) is a regression coefficient.

LINEAR REGRESSION (LR) ASSUMPTIONS
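The two forms above can be sketched in code. This is a minimal illustration using NumPy's least-squares routines on a small hypothetical data set (the numbers are made up for demonstration, not taken from the slides):

```python
import numpy as np

# Hypothetical data: one explanatory variable X and a response y.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

# Univariate LR: estimate intercept a and slope b by ordinary least squares.
b, a = np.polyfit(X, y, deg=1)  # degree-1 polynomial fit: y ≈ a + b*X

# Multiple LR: prepend a column of ones so lstsq also estimates the intercept.
# With more explanatory variables, each would be an extra column here.
design = np.column_stack([np.ones_like(X), X])      # design matrix [1, X1]
coef, *_ = np.linalg.lstsq(design, y, rcond=None)   # coef = [a, b1]

# The error term ϵ for each observation is the residual y - (a + b*X).
residuals = y - (a + b * X)
print(f"a = {a:.2f}, b = {b:.2f}")
```

With an intercept in the model, the residuals from ordinary least squares always sum to (numerically) zero, which is why the assumption of zero-mean errors is stated in the next section.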
• The relationship between the X and y variables should be linear
• Errors (residuals) should be independent of each other
• Errors (residuals) should be normally distributed with a mean of 0
• Errors (residuals) should have equal variance

LINEAR REGRESSION (LR) OUTPUTS
Correlation coefficient (r)
• The correlation coefficient (r) describes the linear relationship between the X and y variables. r can range from -1 to 1.
• r > 0 indicates a positive linear relationship between the X and y variables: as one of the variables increases, the other also increases. r = 1 is a perfect positive linear relationship.
• Similarly, r < 0 indicates a negative linear relationship: as one of the variables increases, the other decreases, and vice versa. r = -1 is a perfect negative linear relationship.
• r = 0 indicates there is no linear relationship between the X and y variables.

Coefficient of determination (R-Squared or r-Squared)
• R-Squared (R²) is the square of the correlation coefficient (r) and is usually expressed as a percentage.
• R-Squared measures the proportion of variation in the y variable that is explained by the independent variables in the fitted regression.
• The multiple correlation coefficient (R), which is the square root of R-Squared, is used to assess the prediction quality of the y variable in multiple regression analysis. Its value ranges from 0 to 1.
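The relationships between r, R-Squared, and R described above can be verified numerically. A small sketch with invented data (for a simple regression, where R equals |r|):

```python
import numpy as np

# Hypothetical data with a strong positive linear relationship.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

r = np.corrcoef(X, y)[0, 1]  # correlation coefficient, -1 <= r <= 1
r_squared = r ** 2           # coefficient of determination, 0 <= R² <= 1
R = np.sqrt(r_squared)       # multiple correlation coefficient, 0 <= R <= 1

# Note: r carries the sign of the relationship; R and R² do not.
print(f"r = {r:.4f}, R² = {r_squared:.4f}, R = {R:.4f}")
```

Because squaring discards the sign, R cannot distinguish a positive from a negative linear relationship, which is why r is reported alongside R-Squared.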
• R-Squared can range from 0 to 1 (0% to 100%). R-Squared = 1 (100%) indicates that the fitted regression line explains all of the variability of the y variable around its mean.

EXAMPLE: OFFICE SIZE VS RENTAL PRICE
• A scatter plot of office SIZE against RENTAL PRICE shows a strong linear relationship between these two features: as SIZE increases, so too does RENTAL PRICE, by a similar amount.
• If we could capture this relationship in a model, we would be able to do two important things. First, we would be able to understand how office size affects office rental price. Second, we would be able to fill in the gaps in the dataset and predict office rental prices for office sizes that we have never actually seen in the historical data.
• The equation of a line can be written as y = mx + c, where m is the slope of the line and c is the y-intercept of the line (i.e., the position at which the line meets the vertical axis when the value of x is set to zero).
• The equation of a line predicts a y value for every x value given the slope and the y-intercept, and we can use this simple model to capture the relationship between two features such as SIZE and RENTAL PRICE.
• Fitting a simple linear model to the scatter plot of office sizes and office rental prices gives: Rental price = 6.47 + (0.62 × Size)
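The fitted line Rental price = 6.47 + (0.62 × Size) from the text can be used directly as a prediction function. A minimal sketch (the input size of 730 is an arbitrary illustrative value, not from the slides):

```python
def rental_price(size):
    """Predict office rental price from office size using the fitted line
    given in the text: Rental price = 6.47 + 0.62 * Size."""
    c = 6.47  # y-intercept: predicted price when size is 0
    m = 0.62  # slope: price increase per unit increase in size
    return c + m * size

# Predict the rental price for an office size that may not appear
# anywhere in the historical data.
print(rental_price(730))
```

This is exactly the "fill in the gaps" use of the model described above: once m and c are estimated, the line yields a prediction for any size, including sizes never observed during fitting.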