
Dr. Kapil K. Misal
Unit III - Machine Learning

Unit 3: Regression and Generalization


Regression: Assessing performance of Regression – Error measures, Overfitting and
Underfitting, Catalysts for Overfitting, VC Dimensions. Linear Models: Least Square
method, Univariate Regression, Multivariate Linear Regression, Regularized Regression –
Ridge Regression and Lasso. Theory of Generalization: Bias and Variance Dilemma, Training
and Testing Curves, Case Study of Polynomial Curve Fitting.

1. What is Regression:
Regression is a method for understanding the relationship between independent
variables or features and a dependent variable or outcome. Outcomes can then be
predicted once the relationship between independent and dependent variables has
been estimated. Regression is a field of study in statistics which forms a key part of
forecast models in machine learning. It’s used as an approach to predict continuous
outcomes in predictive modelling, so has utility in forecasting and predicting
outcomes from data. Machine learning regression generally involves plotting a line
of best fit through the data points. The distance between each point and the line is
minimised to achieve the best fit line.
Regression analysis is a statistical method to model the
relationship between a dependent (target) variable and one or more independent
(predictor) variables. More specifically, regression analysis helps
us to understand how the value of the dependent variable changes corresponding
to an independent variable when the other independent variables are held fixed. It
predicts continuous/real values such as temperature, age, salary, price, etc.

2. Why do we use Regression Analysis?


As mentioned above, regression analysis helps in the prediction of a continuous
variable. There are various real-world scenarios where we need future predictions,
such as weather conditions, sales, or marketing trends, and for such cases we need a
technique that can make predictions accurately. Regression analysis is that technique:
a statistical method used in machine learning and data science. Below are some other
reasons for using regression analysis:
 Regression estimates the relationship between the target and the independent
variable.
 It is used to find the trends in data.
 It helps to predict real/continuous values.
 By performing regression, we can confidently determine the most important
factor, the least important factor, and how each factor affects the other
factors.

3. Types of Regression

There are various types of regression which are used in data science and machine
learning. Each type has its own importance in different scenarios, but at the core, all
regression methods analyse the effect of the independent variables on the dependent variable.
Here we discuss some important types of regression, which are given below:

 Linear Regression
 Logistic Regression
 Polynomial Regression
 Support Vector Regression
 Decision Tree Regression
 Random Forest Regression
 Ridge Regression
 Lasso Regression

4. Assessing performance of Regression – Error measures


An error measure summarizes, on average, how close
predictions are to their expected (actual) values.
There are three error metrics that are commonly used for evaluating and
reporting the performance of a regression model; they are:
 Mean Squared Error (MSE)
 Root Mean Squared Error (RMSE)
 Mean Absolute Error (MAE)

Mean Absolute Error(MAE)


MAE is a very simple metric which calculates the absolute difference
between actual and predicted values.
To understand it better, suppose you have input data and output data and
use linear regression, which draws a best-fit line.
Now you have to find the MAE of your model, which is the mistake
made by the model, known as the error. Take the difference between each
actual value and its predicted value to get the absolute error, then sum all
the errors and divide by the total number of observations to get the mean
absolute error of the complete dataset. This is the MAE, and we aim for a
minimum MAE because it is a loss.
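In formula form (the standard definition, where $y_i$ is the actual value, $\hat{y}_i$ the predicted value, and $n$ the number of observations):

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\,y_i - \hat{y}_i\,\right|$$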

Mean Squared Error(MSE)


MSE is one of the most used and very simple metrics, with only a small change from
mean absolute error: instead of the absolute difference, it takes the squared difference
between the actual and predicted values.
So, above we were finding the absolute difference, and here we are finding the
squared difference.
What does the MSE actually represent? It represents the squared distance
between actual and predicted values. We square the differences to avoid the
cancellation of negative terms, which is the benefit of MSE.
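The standard formula, in the same notation as above:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$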

Root Mean Squared Error(RMSE)


As the name suggests, RMSE is simply the square root of the mean
squared error.
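In formula form:

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$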

R2 Score
The R2 score (pronounced R-squared score) is a statistical measure that tells
us how well our model is making its predictions, typically on a scale of zero to one.
As mentioned above, a model cannot be expected to predict the actual values exactly in
a regression problem (as opposed to a classification problem, which has
discrete levels of value).
But we can use the R2 score to determine the accuracy of our model in terms
of distance, or residuals. You can calculate the R2 score using the formula
below:
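$$R^2 = 1 - \frac{\mathrm{SS}_{\mathrm{res}}}{\mathrm{SS}_{\mathrm{tot}}} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $\bar{y}$ is the mean of the actual values. As a minimal sketch (with made-up numbers, purely for illustration), all four metrics can be computed with scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values, for illustration only
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

mae  = mean_absolute_error(y_true, y_pred)   # mean of |y - y_hat|
mse  = mean_squared_error(y_true, y_pred)    # mean of (y - y_hat)^2
rmse = np.sqrt(mse)                          # square root of MSE
r2   = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```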

Linear Regression
Linear regression is a type of supervised machine learning algorithm that
computes the linear relationship between a dependent variable and one or
more independent features. When there is only one independent feature, it is
known as univariate linear regression; in the case of more than one feature,
it is known as multivariate linear regression. The goal of the
algorithm is to find the best linear equation that can predict the value of the
dependent variable based on the independent variables. The equation
provides a straight line that represents the relationship between the
dependent and independent variables. The slope of the line indicates how
much the dependent variable changes for a unit change in the independent
variable(s).
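As a minimal sketch (with made-up data), a univariate linear regression can be fitted with scikit-learn's LinearRegression, which uses the least-squares method:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical univariate data: years of experience vs. salary (in thousands)
X = np.array([[1], [2], [3], [4], [5]])   # independent feature, shape (n_samples, 1)
y = np.array([30, 35, 42, 48, 55])        # dependent variable

model = LinearRegression().fit(X, y)      # least-squares fit of y = w*x + b
print("slope (w):", model.coef_[0])       # change in y per unit change in x
print("intercept (b):", model.intercept_)
print("prediction for x = 6:", model.predict([[6]])[0])
```

The same API extends to multivariate regression by giving X more than one column.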

Assumption for Linear Regression Model


Linear regression is a powerful tool for understanding and predicting the
behaviour of a variable; however, it needs to meet a few conditions in order
to give accurate and dependable solutions.

 Linearity: The independent and dependent variables have a linear
relationship with one another. This implies that changes in the
dependent variable follow those in the independent variable(s) in a
linear fashion.
 Independence: The observations in the dataset are independent of
each other. This means that the value of the dependent variable for
one observation does not depend on the value of the dependent
variable for another observation.
 Homoscedasticity: Across all levels of the independent variable(s), the
variance of the errors is constant. This indicates that the amount of
the independent variable(s) has no impact on the variance of the
errors.
 Normality: The errors in the model are normally distributed.
 No multicollinearity: There is no high correlation between the
independent variables, i.e. the independent variables show little or no
correlation with one another.

Overfitting and Underfitting


Overfitting and Underfitting are the two main problems that occur in
machine learning and degrade the performance of the machine learning
models.
The main goal of every machine learning model is to generalize well. Here,
generalization means the ability of an ML model to provide a suitable output
when adapting to a given set of unseen inputs. In other words, after being
trained on the dataset, it can produce reliable and accurate output. Hence,
underfitting and overfitting are the two terms that need to be checked to know
whether the model is performing well and generalizing well or not.
Before understanding overfitting and underfitting, let's understand some
basic terms that will help to understand this topic well:

Signal: It refers to the true underlying pattern of the data that helps the
machine learning model to learn from the data.
Noise: Noise is unnecessary and irrelevant data that reduces the performance
of the model.
Bias: Bias is a prediction error that is introduced in the model due to
oversimplifying the machine learning algorithms. Or it is the difference
between the predicted values and the actual values.
Variance: If the machine learning model performs well with the training
dataset, but does not perform well with the test dataset, then variance
occurs.

Overfitting
Overfitting occurs when our machine learning model tries to cover all the
data points, or more than the required data points, present in the given
dataset. Because of this, the model starts capturing noise and inaccurate values
present in the dataset, and all these factors reduce the efficiency and
accuracy of the model. The overfitted model has low bias and high variance.
The chances of overfitting increase the more we train our model: the more
training we provide, the higher the chance of producing an overfitted model.
Overfitting is the main problem that occurs in supervised learning.
Example: The concept of overfitting can be understood from the graph of the
linear regression output below:

As we can see from the above graph, the model tries to cover all the data points
present in the scatter plot. It may look efficient, but in reality it is not. The goal
of the regression model is to find the best-fit line; here we have not got a true
best fit, so the model will generate prediction errors.
How to avoid Overfitting in the Model
Both overfitting and underfitting degrade the performance of the machine
learning model, but the main problem here is overfitting, and there are some
ways by which we can reduce its occurrence in our model (a brief sketch of one
of them follows the list):
 Cross-Validation
 Training with more data
 Removing features
 Early stopping the training
 Regularization
 Ensembling
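As a minimal sketch of one of these mitigations, assuming synthetic data and Ridge as the regularized model, k-fold cross-validation estimates how well a model generalizes to data it was not trained on:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                        # hypothetical features
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=100)   # signal plus noise

# 5-fold cross-validation: train on 4 folds, score on the held-out fold,
# and repeat, so a model that merely memorizes the training data is exposed.
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2")
print("R2 per fold:", np.round(scores, 3))
print("mean R2:", scores.mean())
```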

Underfitting
Underfitting occurs when our machine learning model is not able to capture
the underlying trend of the data. To avoid overfitting, the feeding of
training data can be stopped at an early stage, due to which the model
may not learn enough from the training data. As a result, it may fail to find
the best fit of the dominant trend in the data. In the case of underfitting, the
model is not able to learn enough from the training data, and hence it has
reduced accuracy and produces unreliable predictions.
An underfitted model has high bias and low variance.
Example: We can understand underfitting using the output of the
linear regression model below:

As we can see from the above diagram, the model is unable to
capture the data points present in the plot.
How to avoid underfitting:
 Increase model complexity.
 Increase the number of features, performing feature engineering.
 Remove noise from the data.
 Increase the number of epochs or increase the duration of training to
get better results.

Univariate, Bivariate and Multivariate Data and their Analysis
1. Univariate data –
This type of data consists of only one variable. The analysis of univariate data
is thus the simplest form of analysis, since the information deals with only one
quantity that changes. It does not deal with causes or relationships, and the
main purpose of the analysis is to describe the data and find patterns that
exist within it. An example of univariate data is height.

Suppose that the heights of seven students of a class are recorded (figure);
there is only one variable, height, and it does not deal with any cause or
relationship. The description of patterns found in this type of data can be
made by drawing conclusions using measures of central tendency (mean,
median and mode), dispersion or spread of data (range, minimum, maximum,
quartiles, variance and standard deviation), and by using frequency
distribution tables, histograms, pie charts, frequency polygons and bar charts.
2. Bivariate data
This type of data involves two different variables. The analysis of this type of
data deals with causes and relationships, and the analysis is done to find
the relationship between the two variables. An example of bivariate data is
temperature and ice cream sales in the summer season.

Suppose the temperature and ice cream sales are the two variables of a
bivariate dataset (figure). Here, the relationship is visible from the table:
temperature and sales are directly proportional to each other, and thus
related, because as the temperature increases the sales also increase. Thus
bivariate data analysis involves comparisons, relationships, causes and
explanations. These variables are often plotted on the X and Y axes of a graph
for a better understanding of the data, and one of these variables is independent
while the other is dependent.
3. Multivariate data
When the data involves three or more variables, it is categorized as
multivariate. An example of this type of data: suppose an advertiser wants to
compare the popularity of four advertisements on a website; their click
rates could be measured for both men and women, and relationships
between the variables can then be examined. It is similar to bivariate analysis but
contains more than one dependent variable. The way to perform analysis on
this data depends on the goals to be achieved. Some of the techniques are
regression analysis, path analysis, factor analysis and multivariate analysis of
variance (MANOVA).
There are lots of different tools, techniques and methods that can be used
to conduct the analysis, including software libraries, visualization tools
and statistical testing methods. Here, we compare univariate, bivariate and
multivariate analysis.

| Univariate | Bivariate | Multivariate |
| --- | --- | --- |
| It only summarizes a single variable at a time. | It only summarizes two variables. | It summarizes more than two variables. |
| It does not deal with causes and relationships. | It does deal with causes and relationships, and analysis is done. | It does not deal with causes and relationships, and analysis is done. |
| It does not contain any dependent variable. | It contains only one dependent variable. | It is similar to bivariate but contains more than two variables. |
| The main purpose is to describe. | The main purpose is to explain. | The main purpose is to study the relationship among them. |
| Example: height. | Example: temperature and ice cream sales in the summer season. | Example: an advertiser wants to compare the popularity of four advertisements on a website; click rates could be measured for both men and women, and relationships between the variables can be examined. |

What is Regularization?
Regularization is one of the most important concepts of machine learning. It
is a technique to prevent the model from overfitting by adding extra
information to it.
Sometimes the machine learning model performs well with the training data
but does not perform well with the test data. It means the model is not able
to predict the output when it deals with unseen data, because noise has been
introduced into the output, and hence the model is called overfitted. This
problem can be dealt with using a regularization technique.
This technique can be used in such a way that it allows all the variables or
features to be kept in the model while reducing the magnitude of the
variables. Hence, it maintains accuracy as well as the generalization of the
model.
It mainly regularizes or reduces the coefficients of features toward zero. In
simple words, "in the regularization technique, we reduce the magnitude of the
features by keeping the same number of features."
Ridge Regression: Ridge regression is one of the types of linear regression in
which a small amount of bias is introduced so that we can get better long-
term predictions. Ridge regression is a regularization technique which is used
to reduce the complexity of the model. It is also called L2 regularization. In
this technique, the cost function is altered by adding a penalty term to it.
The amount of bias added to the model is called the ridge regression penalty.
It is calculated by multiplying lambda by the squared weight of each
individual feature.
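Written out in the standard form consistent with this description (where $\lambda$ is the regularization strength and $w_j$ are the feature weights):

$$J(w) = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{m} w_j^2$$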
Lasso Regression: Lasso regression is another regularization technique to
reduce the complexity of the model. It stands for Least Absolute Shrinkage and
Selection Operator. It is similar to ridge regression except that the
penalty term contains only the absolute weights instead of the square of the
weights. Since it takes absolute values, it can shrink a slope exactly to 0,
whereas ridge regression can only shrink it close to 0. It is also called L1
regularization.
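The corresponding cost function, in the same notation as above:

$$J(w) = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{m} \left|w_j\right|$$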

Key Difference between Ridge Regression and Lasso Regression

Ridge regression is mostly used to reduce overfitting in the model, and it
keeps all the features present in the model. It reduces the complexity of
the model by shrinking the coefficients.
Lasso regression helps to reduce overfitting in the model as well as performing
feature selection.
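A minimal sketch (synthetic data, arbitrarily chosen alpha values) illustrating this difference with scikit-learn: ridge shrinks all coefficients, while lasso can drive the coefficients of irrelevant features exactly to zero, effectively selecting features:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
# Only the first two features actually matter in this made-up data
y = 4.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients toward 0
lasso = Lasso(alpha=0.5).fit(X, y)   # L1 penalty: can set coefficients exactly to 0

print("Ridge coefficients:", np.round(ridge.coef_, 3))  # all non-zero, just smaller
print("Lasso coefficients:", np.round(lasso.coef_, 3))  # irrelevant features driven to 0
```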

Bias-variance dilemma
The bias-variance dilemma is relevant to supervised machine learning. It is a
way to diagnose an algorithm's performance by breaking down its prediction
error. There are three types of prediction errors: bias, variance, and
irreducible error. Machine learning is a branch of Artificial Intelligence which
allows machines to perform data analysis and make predictions. However, if
the machine learning model is not accurate, it can make prediction errors,
and these prediction errors are usually known as bias and variance. In
machine learning, these errors will always be present, as there is always a
slight difference between the model's predictions and the actual values. The
main aim of ML/data science analysts is to reduce these errors in order to get
more accurate results.
Errors in Machine Learning?
In machine learning, an error is a measure of how accurately an algorithm
can make predictions for the previously unknown dataset. On the basis of
these errors, the machine learning model is selected that can perform best
on the particular dataset. There are mainly two types of errors in machine
learning, which are:

Reducible errors: These errors can be reduced to improve the model's
accuracy. Such errors can further be classified into bias and variance.

Irreducible errors: These errors will always be present in the model
regardless of which algorithm has been used. The cause of these errors is
unknown variables whose effect can't be reduced.
What is Bias?
In general, a machine learning model analyses the data, finds patterns in it and
makes predictions. While training, the model learns these patterns in the
dataset and applies them to test data for prediction. While making
predictions, a difference occurs between the values predicted by the
model and the actual/expected values, and this difference is known as
bias error, or error due to bias. It can be defined as the inability of machine
learning algorithms such as linear regression to capture the true relationship
between the data points. Each algorithm begins with some amount of bias,
because bias arises from assumptions in the model which make the target
function simpler to learn.
Low Bias: A low-bias model makes fewer assumptions about the form of
the target function.
High Bias: A model with high bias makes more assumptions, and the model
becomes unable to capture the important features of our dataset. A high-bias
model also cannot perform well on new data.
Generally, a linear algorithm has high bias, as this is what makes it learn fast. The
simpler the algorithm, the more bias it is likely to introduce, whereas a
nonlinear algorithm often has low bias.
Some examples of machine learning algorithms with low bias are Decision
Trees, k-Nearest Neighbours and Support Vector Machines. Algorithms with
high bias include Linear Regression, Linear Discriminant Analysis and Logistic
Regression.
Ways to reduce High Bias:
High bias mainly occurs due to an overly simple model. Below are some ways
to reduce high bias:
 Increase the input features as the model is underfitted.
 Decrease the regularization term.
 Use more complex models, such as including some polynomial
features.

What is a Variance Error?


Variance specifies the amount of variation in the prediction if
different training data were used. In simple words, variance tells us how
much a random variable differs from its expected value. Ideally, a model
should not vary too much from one training dataset to another, which means
the algorithm should be good at understanding the hidden mapping between
input and output variables. Variance errors are either low variance or high
variance.
Low variance means there is a small variation in the prediction of the target
function with changes in the training dataset. High variance shows a large
variation in the prediction of the target function with changes in the training
dataset.
A model that shows high variance learns a lot and performs well with the
training dataset, but does not generalize well to the unseen dataset. As a
result, such a model gives good results with the training dataset but shows
high error rates on the test dataset.
Since, with high variance, the model learns too much from the dataset, it
leads to overfitting of the model. A model with high variance has the following
problems:
 A high variance model leads to overfitting.
 It increases model complexity.
Usually, nonlinear algorithms have a lot of flexibility in fitting the model, and
hence have high variance.

Some examples of machine learning algorithms with low variance are Linear
Regression, Logistic Regression, and Linear Discriminant Analysis. Algorithms
with high variance include Decision Trees, Support Vector Machines, and
k-Nearest Neighbours.
Ways to Reduce High Variance:
 Reduce the input features or number of parameters as a model is
overfitted.
 Do not use an overly complex model.
 Increase the training data.
 Increase the Regularization term.
Different Combinations of Bias-Variance
There are four possible combinations of bias and variances, which are
represented by the below diagram:

 Low-Bias, Low-Variance: The combination of low bias and low variance gives
an ideal machine learning model. However, it is not practically possible.
 Low-Bias, High-Variance: With low bias and high variance, model
predictions are inconsistent but accurate on average. This case occurs
when the model learns a large number of parameters and hence
leads to overfitting.
 High-Bias, Low-Variance: With high bias and low variance, predictions
are consistent but inaccurate on average. This case occurs when a
model does not learn well from the training dataset or uses a small
number of parameters. It leads to underfitting problems in the
model.
 High-Bias, High-Variance: With high bias and high variance, predictions are
inconsistent and also inaccurate on average.

Bias-Variance Trade-Off

 While building a machine learning model, it is really important to
take care of bias and variance in order to avoid overfitting and
underfitting in the model. If the model is very simple, with fewer
parameters, it may have low variance and high bias. Whereas, if the
model has a large number of parameters, it will have high variance
and low bias. So, it is necessary to strike a balance between bias and
variance errors, and this balance between the bias error and the variance
error is known as the Bias-Variance trade-off.

 For an accurate prediction of the model, algorithms need low
variance and low bias. But this is not entirely possible, because bias and
variance are related to each other:
 If we decrease the variance, it will increase the bias.
 If we decrease the bias, it will increase the variance.
The bias-variance trade-off is a central issue in supervised learning.
Ideally, we need a model that accurately captures the regularities in the
training data and simultaneously generalizes well to the unseen
dataset. Unfortunately, it is not possible to do both simultaneously:
a high-variance algorithm may perform well on training
data but may overfit to noisy data, whereas a high-bias
algorithm generates a much simpler model that may not even capture
important regularities in the data. So, we need to find a sweet spot
between bias and variance to build an optimal model.
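The polynomial curve fitting case study from the syllabus illustrates this trade-off well. A minimal sketch (synthetic noisy sine data, arbitrarily chosen degrees): a low-degree polynomial underfits (high bias), while a very high degree fits the training points closely but generalizes poorly (high variance):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=30)  # noisy sine curve

X_test = np.linspace(0, 1, 100).reshape(-1, 1)
y_test = np.sin(2 * np.pi * X_test).ravel()          # noise-free ground truth

for degree in (1, 3, 12):   # underfit, reasonable fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
    train_err = mean_squared_error(y, model.predict(X))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}: train MSE = {train_err:.3f}   test MSE = {test_err:.3f}")
```

Typically the training error keeps falling as the degree rises, while the test error falls and then rises again; the minimum of the test curve marks the sweet spot between bias and variance.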
