Simple Linear Regression
2. Linearity
From the regression plots above, we can see that the residuals of the dummy data are spread evenly around the regression line, as they should be to meet the linearity assumption, unlike the residuals of the energy efficiency dataset, which lie noticeably farther from the regression line.
3. Normality of the residual distribution
The energy efficiency dataset violates this assumption, as its residuals are clearly not normally distributed, while the dummy dataset has normally distributed residuals with both the mean and median at 0.
4. Independence of the observations
In multiple linear regression, where there are several predictors, it is assumed that these variables are independent of each other, with no strong correlation between them.
The energy efficiency dataset shows strong correlations between relative compactness and surface area, relative compactness and overall height, and surface area and roof area, while the variables in the dummy dataset appear to be independent of each other.
Overall, before inferences are drawn from a linear regression model, all of the assumptions discussed above must be met.
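As an illustration of how these checks could be carried out, the sketch below computes the mean and median of the residuals and the correlation matrix of a few predictors in Python. The data, variable names, and column names are stand-ins chosen for the example, not the exact objects used above.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Stand-in for the dummy dataset: y is a linear function of x plus Gaussian noise.
x = rng.uniform(0, 10, 200)
y = 2.0 + 3.0 * x + rng.normal(0, 1, 200)

# Fit a simple linear regression by least squares and compute the residuals.
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)

# Normality check: the residuals should be roughly symmetric around zero,
# with the mean and median both close to 0.
print("residual mean:  ", resid.mean())
print("residual median:", np.median(resid))

# Independence check: inspect the correlation matrix of the predictors.
# Column names mirror the energy efficiency dataset, but the values are synthetic.
predictors = pd.DataFrame({
    "relative_compactness": rng.uniform(0.6, 1.0, 200),
    "surface_area": rng.uniform(500, 810, 200),
    "overall_height": rng.choice([3.5, 7.0], 200),
})
print(predictors.corr())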
Residual sum of squares and minimizing the cost function
A cost function is a measure of the performance of a model, i.e. how far the predicted values are from the real values. The objective is to minimise the cost function so that the model continuously learns and obtains better results. In linear regression, the cost function can be defined as the sum of squared errors over the training set. The residuals are squared to penalise errors farther from the line of best fit more heavily than those closer to the line, and so obtain the best parameter values.
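As a minimal sketch of this cost, assuming a simple linear model with an intercept b0 and a slope b1, the residual sum of squares can be computed as follows (the data here are illustrative):

import numpy as np

def rss(b0, b1, x, y):
    # Residual sum of squares for the simple linear model y_hat = b0 + b1 * x.
    residuals = y - (b0 + b1 * x)
    return np.sum(residuals ** 2)

# Illustrative data: a noisy line with intercept 2 and slope 3.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 2.0 + 3.0 * x + rng.normal(0, 1, 100)

print(rss(2.0, 3.0, x, y))  # near the true parameters the cost is small
print(rss(0.0, 0.0, x, y))  # far from them the cost is much larger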
Gradient descent and coordinate descent algorithm
Gradient descent is an optimization algorithm that minimizes a cost function by specifying the direction to move in to reach a local or global minimum. It starts with random parameter values and then iteratively updates them until the minimum cost is obtained. A learning rate is usually chosen to determine the step size taken at each iteration. It is important to select this parameter carefully: if the step is too small, convergence to the minimum cost takes a long time, while if it is too large, the update can overshoot and surpass the location of the minimum.
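A minimal sketch of batch gradient descent for the simple linear regression cost described above, assuming illustrative data and a hand-picked learning rate; the function and variable names are chosen for the example:

import numpy as np

def gradient_descent(x, y, lr=0.01, n_iters=2000):
    # Batch gradient descent on the mean squared error of y_hat = b0 + b1 * x.
    b0, b1 = 0.0, 0.0  # start from arbitrary initial values
    n = len(x)
    for _ in range(n_iters):
        error = (b0 + b1 * x) - y
        # Gradients of the MSE cost with respect to b0 and b1.
        grad_b0 = (2.0 / n) * np.sum(error)
        grad_b1 = (2.0 / n) * np.sum(error * x)
        # Step against the gradient, scaled by the learning rate (step size).
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 100)
y = 2.0 + 3.0 * x + rng.normal(0, 1, 100)
print(gradient_descent(x, y))  # should end up close to (2.0, 3.0)

With a learning rate that is too large for these data (for example 0.1), the same loop diverges, which illustrates the overshooting described above.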
Penalization Methods
Regulating over- and under-fitting
Regularization is a method used to make complex models simpler by penalising coefficients to reduce their magnitude and the variance on the training set, and in turn reduce overfitting in the model. Regularization works by shrinking the coefficients of the model towards zero, so that the complexity term added to the loss produces a larger loss for models with higher complexity. Two common regularised regression techniques are Ridge and Lasso regression.
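The sketch below shows both techniques side by side, assuming scikit-learn is available and using synthetic data in place of the datasets above; alpha is the penalty strength and its values here are chosen only for the example:

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
# Only the first two features actually drive the response; the rest are noise.
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, 200)

# alpha controls the strength of the penalty term in both models.
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("ridge coefficients:", ridge.coef_)  # shrunk towards zero, but not exactly zero
print("lasso coefficients:", lasso.coef_)  # irrelevant coefficients typically set to zero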
Ridge Regression
Also known as L2 Regularisation, this is a technique that uses a penalty term to shrink the magnitude of the coefficients towards zero without eliminating any of them. The shrinkage prevents overfitting caused by the complexity of the model or by collinearity. It adds the squared magnitude of the coefficients to the loss function as the penalty term. If the error is defined as the sum of the squared residuals, then when an L2 regularization term is added, the result is the equation below.