Linear Regression-Part 2
Continue
• When implementing simple linear regression, you typically start with a given set of input-output (𝑥-𝑦) pairs.
• These pairs are your observations, shown as green circles in the figure.
• For example, the leftmost observation has the input 𝑥 = 5 and the actual output, or response, 𝑦 = 5. The next one has 𝑥 = 15 and 𝑦 = 20, and so
on.
• The estimated regression function, represented by the black line, has the equation
𝑓(𝑥) = 𝑏₀ + 𝑏₁𝑥.
• Your goal is to calculate the optimal values of the predicted weights 𝑏₀ and 𝑏₁ that minimize SSR and determine the estimated regression
function.
• The value of 𝑏₀, also called the intercept, shows the point where the estimated regression line crosses the 𝑦 axis.
• It’s the value of the estimated response 𝑓(𝑥) for 𝑥 = 0.
• The value of 𝑏₁ determines the slope of the estimated regression line.
• The predicted responses, shown as red squares, are the points on the regression line that correspond to the input
values.
• For example, for the input 𝑥 = 5, the predicted response is 𝑓(5) = 8.33, which the leftmost red square represents.
• The vertical dashed grey lines represent the residuals, which can be calculated as
𝑦ᵢ - 𝑓(𝑥ᵢ) = 𝑦ᵢ - 𝑏₀ - 𝑏₁𝑥ᵢ for 𝑖 = 1, …, 𝑛.
• They’re the distances between the green circles and red squares.
• When you implement linear regression, you’re actually trying to minimize these distances and make the red squares as close to the
predefined green circles as possible, as the sketch below illustrates.
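A minimal sketch of this fit in Python, assuming NumPy and scikit-learn are available. Only the first two (𝑥, 𝑦) pairs come from the example above; the remaining points are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Observations (x, y): the first two pairs appear in the text above,
# the rest are hypothetical values added for illustration.
x = np.array([5, 15, 25, 35, 45, 55]).reshape(-1, 1)
y = np.array([5, 20, 14, 32, 22, 38])

model = LinearRegression().fit(x, y)   # finds b0 (intercept_) and b1 (coef_) by minimizing SSR
print("b0 (intercept):", model.intercept_)
print("b1 (slope):", model.coef_[0])

y_pred = model.predict(x)              # predicted responses: points on the fitted line
residuals = y - y_pred                 # residuals: y_i - f(x_i), the vertical distances
print("predicted responses:", y_pred)
print("residuals:", residuals)
```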
Linear Regression Equation
Example
Line of Best Fit
• The linear regression model has to find the line of best fit.
• We know the equation of a line is y = mx + c.
• There are infinitely many possible values of m and c; which ones should we choose?
• Out of all possible lines, how do we find the best-fit line?
• The line of best fit is calculated by using the cost function: the least sum of squared errors.
• The line of best fit will have the least sum of squared errors.
Cost Function
• For all possible lines, calculate the sum of squares of errors. The line with the least sum of squared errors is the best-fit line (see the sketch below).
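A rough illustration of the idea; the data and the candidate (m, c) pairs are hypothetical. Each candidate line gets a cost, and the best fit is the one with the lowest cost.

```python
import numpy as np

def sum_squared_errors(m, c, x, y):
    """Cost of the line y = m*x + c: sum of squared vertical errors."""
    return np.sum((y - (m * x + c)) ** 2)

# Illustrative data (not from the slides)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])

# Evaluate a few candidate lines; the best-fit line has the lowest cost.
for m, c in [(1.5, 0.5), (2.0, 0.0), (2.0, 0.2)]:
    print(f"m={m}, c={c}, SSE={sum_squared_errors(m, c, x, y):.3f}")
```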
The Least Squares Regression Line
• Definition
• Given a collection of pairs (x,y) of numbers (in which not all the x-values are the same), there
is a line that best fits the data in the sense of minimizing the sum of the
squared errors.
• It is called the least squares regression line.
• Its slope 𝑏₁ and 𝑦-intercept 𝑏₀ are computed using the formulas
𝑏₁ = Σ(𝑥ᵢ − x̄)(𝑦ᵢ − ȳ) / Σ(𝑥ᵢ − x̄)² and 𝑏₀ = ȳ − 𝑏₁x̄,
where x̄ and ȳ denote the means of the 𝑥-values and 𝑦-values.
Example
• Find the least squares regression line for the five-point data set:
X 2 2 6 8 10
Y 0 1 2 3 3
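A small sketch, assuming NumPy, that applies the slope and intercept formulas above to this data set:

```python
import numpy as np

# Five-point data set from the example above
x = np.array([2, 2, 6, 8, 10], dtype=float)
y = np.array([0, 1, 2, 3, 3], dtype=float)

x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope
b0 = y_bar - b1 * x_bar                                            # y-intercept

print(f"slope b1 = {b1}")        # 0.34375
print(f"intercept b0 = {b0}")    # -0.125
```

For this data set the formulas give 𝑏₁ = 0.34375 and 𝑏₀ = −0.125, so the least squares regression line is ŷ = 0.34375x − 0.125.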
• The standard deviation (σ) is a measure of how dispersed the data are in relation to the mean.
• A low standard deviation means the data are clustered around the mean.
• A high standard deviation indicates the data are more spread out.
The coefficient of determination or R-squared
• The coefficient of determination, or R-squared, represents the proportion of the variance in the dependent variable that is
explained by the linear regression model.
• It is a scale-free score, i.e. irrespective of whether the values are small or large, the value of R-squared never exceeds one.
Continue
• Lower values of MAE, MSE, and RMSE imply higher accuracy of a regression model.
• However, a higher value of R-squared is considered desirable.
• Both RMSE and R-squared quantify how well a linear regression model fits a dataset.
• The RMSE tells how well a regression model can predict the value of the response variable in absolute terms.
• R-squared tells how well the predictor variables can explain the variation in the response variable (see the sketch below).
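A hedged sketch of how these metrics might be computed with scikit-learn; the y_true and y_pred arrays here are hypothetical values chosen only to show the calls.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values of the response variable
y_true = np.array([3.0, 5.0, 7.5, 9.0, 11.0])
y_pred = np.array([2.8, 5.4, 7.0, 9.3, 10.5])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                    # same units as the response (absolute error)
r2 = r2_score(y_true, y_pred)          # proportion of variance explained, at most 1

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")
```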
Polynomial Regression
• You can regard polynomial regression as a generalized case of linear regression. You assume the polynomial
dependence between the output and inputs and, consequently, the polynomial estimated regression function.
• In other words, in addition to linear terms like 𝑏₁ 𝑥₁, your regression function 𝑓 can include nonlinear terms such as
𝑏₂𝑥₁², 𝑏₃𝑥₁³, or even 𝑏₄𝑥₁𝑥₂, 𝑏₅𝑥₁²𝑥₂.
• The simplest example of polynomial regression has a single independent variable, and the estimated regression
function is a polynomial of degree two: 𝑓(𝑥) = 𝑏₀ + 𝑏₁ 𝑥 + 𝑏₂ 𝑥².
• Now, remember that you want to calculate 𝑏₀, 𝑏₁, and 𝑏₂ to minimize SSR. These are your unknowns!
• Keeping this in mind, compare the previous regression function with the function 𝑓( 𝑥₁, 𝑥₂) = 𝑏₀ + 𝑏₁ 𝑥₁ + 𝑏₂ 𝑥₂, used for
linear regression. They look very similar and are both linear functions of the unknowns 𝑏₀, 𝑏₁, and 𝑏₂. This is why
you can solve the polynomial regression problem as a linear problem with the term 𝑥² regarded as an input
variable.
• In the case of two variables and the polynomial of degree two, the regression function has this form: 𝑓( 𝑥₁, 𝑥₂) = 𝑏₀
+ 𝑏₁𝑥₁ + 𝑏₂𝑥₂ + 𝑏₃𝑥₁² + 𝑏₄𝑥₁𝑥₂ + 𝑏₅𝑥₂².
• The procedure for solving the problem is identical to the previous case. You apply linear regression for five inputs:
𝑥₁, 𝑥₂, 𝑥₁², 𝑥₁𝑥₂, and 𝑥₂². As a result of the regression, you get the values of six weights that minimize SSR: 𝑏₀, 𝑏₁, 𝑏₂,
𝑏₃, 𝑏₄, and 𝑏₅ (see the sketch below).
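A minimal sketch of this procedure, assuming scikit-learn; the data are hypothetical, and PolynomialFeatures is used to generate the five inputs before fitting an ordinary linear regression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical two-variable data
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]], dtype=float)
y = np.array([4.0, 3.5, 14.0, 13.0, 30.0, 28.0])

# Degree-2 expansion adds x1^2, x1*x2, x2^2 as extra input columns
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)              # columns: x1, x2, x1^2, x1*x2, x2^2

model = LinearRegression().fit(X_poly, y)   # ordinary linear regression on five inputs
print("b0:", model.intercept_)
print("b1..b5:", model.coef_)
```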
Polynomial Regression
Polynomial regression is needed when no straight line fits the relationship among the variables. So instead of
looking like a line, the fitted function looks like a nonlinear curve.
Linear vs Polynomial
Example: weight loss.
• Underfitting
• occurs when a model can’t accurately capture the dependencies among data, usually as a
consequence of its own simplicity.
• It often yields a low 𝑅² with known data and bad generalization capabilities when applied
to new data.
• Overfitting
• A model learns the existing data too well.
• Complex models, which have many features or terms, are often prone to overfitting.
• When applied to known data, such models usually yield high 𝑅².
• However, they often don’t generalize well and have significantly lower 𝑅² when used with
new data.
• The left plot shows a linear regression fit.
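A hedged sketch of under- and overfitting on synthetic data; all names and numbers here are illustrative. A degree-1 fit is typically too simple for this curved data, while a very high-degree fit tends to score a high 𝑅² on the training data but a much lower one on held-out data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 40).reshape(-1, 1)
y = 1.5 * x.ravel() ** 2 - 2.0 * x.ravel() + rng.normal(scale=2.0, size=40)  # noisy quadratic

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=0)

for degree in (1, 2, 15):   # likely too simple, about right, too complex
    features = PolynomialFeatures(degree=degree, include_bias=False)
    model = LinearRegression().fit(features.fit_transform(x_train), y_train)
    r2_train = model.score(features.transform(x_train), y_train)  # R^2 on known data
    r2_test = model.score(features.transform(x_test), y_test)     # R^2 on new data
    print(f"degree={degree:>2}  R^2 train={r2_train:.3f}  R^2 test={r2_test:.3f}")
```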