Ensemble Techniques and Random Forest
Prerequisites-
- Linear algebra.
- Basics of machine learning.
Objectives-
Introduction-
In today's world, where machine learning keeps getting better and better, the predictive model is the basic building block of machine learning, and good model accuracy is the ultimate goal of a machine learning engineer. This note describes one technique for increasing the accuracy of a model, known as the ensemble technique. For example, when you plan to go on a vacation, it is highly unlikely that you will pick a place at random. You will probably check some websites, ask some of your friends about their favorite places, and then make your decision on the basis of other people's opinions.
Ensemble techniques work in the same way as described in the above example: they combine multiple models to increase overall accuracy or performance. Models can be ensembled in various ways.
To better understand ensemble learning, consider an example. Suppose you are a cook in a restaurant and have made a new dish, and you want some feedback before you launch it in your restaurant. Some of the approaches you could take are:
i. Ask one of your friends for a rating.
ii. Ask 10 of your colleagues to rate the dish.
iii. Ask 100 people to do the rating.
In the first case, your friend might not want to break your heart and may give you a good rating even if he does not like the dish. In the second case, your 10 colleagues might not have knowledge of that type of dish; for example, they may be good at making vegetarian food
Proprietary content. © Great Learning. All Rights Reserved. This file is meant for personal use by ashishmahajan231191@gmail.com only. Unauthorized use or distribution prohibited. Sharing or publishing the contents in part or full is liable for legal action.
but not good at making non-vegetarian food. In the third case, out of 100 people, some may be your friends, some may be your relatives, and some may be complete strangers. The ratings in this case would be more diversified and generalized, since the raters have a variety of skill sets.
With this example, we can understand that a diverse group of people has a higher probability of making better decisions than an individual. The same analogy applies to creating a group of diverse models for predictions rather than relying on just one. This type of modelling is achieved through a technique known as ensemble learning. Some simple ensemble learning techniques are:
- Averaging
- Max voting
- Weighted Average
In averaging, we take the prediction from each model, and the average of these predictions is the final prediction. It is used for making predictions in regression problems; in classification problems it is used for averaging predicted probabilities.
Model   Model 1  Model 2  Model 3  Model 4  Model 5  Final rating
Rating  5        4        5        4        4        4.4
In the above example each model predicts a movie rating, and the average of all the predictions is taken to get a final rating of 4.4.
Model   Model 1  Model 2  Model 3  Model 4  Model 5  Final rating
Rating  5        4        5        4        4        4
In max voting, each model predicts a movie rating and the mode of all the predictions is taken as the final prediction, which in the above case is 4.
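These two techniques can be sketched in a few lines of Python. This is a minimal illustration using the toy ratings from the examples above:

```python
from statistics import mean, mode

# Toy movie ratings predicted by five hypothetical models
predictions = [5, 4, 5, 4, 4]

# Averaging: the mean of all predictions is the final prediction
avg_rating = mean(predictions)
print(avg_rating)    # 4.4

# Max voting: the mode (most frequent prediction) is the final prediction
voted_rating = mode(predictions)
print(voted_rating)  # 4
```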
Model   Model 1  Model 2  Model 3  Model 4  Final rating
Rating  4        4        5        4        4.2
In the above weighted-average example, model 1 and model 2 are given more importance than model 3 and model 4, and the final rating is calculated as the weighted average of the individual ratings.
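A weighted average can be sketched similarly. The weights below (0.3 each for models 1 and 2, 0.2 each for models 3 and 4) are one assumed choice that reproduces the 4.2 in the table above:

```python
ratings = [4, 4, 5, 4]
weights = [0.3, 0.3, 0.2, 0.2]  # models 1 and 2 get more importance; weights sum to 1

# Weighted average: each rating is scaled by its model's importance
weighted_rating = sum(r * w for r, w in zip(ratings, weights))
print(round(weighted_rating, 1))  # 4.2
```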
- Bagging
- Boosting
Bagging is a technique of merging the outputs of various models to get a final result. However, if all the models are trained on the same input data, there is a high probability that they will generate the same results. This problem can be mitigated by a technique known as bootstrapping.
Bootstrapping-
Bootstrapping means sampling with replacement. For example, after drawing a red ball from a box, we put the red ball back in the box, so that the chance of obtaining another red ball on the next draw remains the same.
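Sampling with replacement is a one-liner in Python. The box contents below are made up for illustration:

```python
import random

random.seed(0)
box = ["red", "blue", "green", "red", "yellow"]

# Sampling WITH replacement: each draw is put back, so the chance of
# drawing "red" stays 2/5 on every draw.
bootstrap_sample = [random.choice(box) for _ in range(len(box))]
print(bootstrap_sample)  # same size as the box; items may repeat
```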
Stratified Sampling
Random sampling is a good method for sampling homogeneous data points, but in the case of heterogeneous data we need a different sampling method to increase the precision of the estimator. This method is known as stratified sampling. The basic idea of stratified sampling is to divide the heterogeneous data points into smaller subgroups such that each subgroup is homogeneous with respect to the data points. These smaller samples or subgroups are known as strata.
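A minimal sketch of stratified sampling, assuming toy data labelled with two hypothetical strata ("veg" and "non-veg"):

```python
import random

random.seed(1)
# Heterogeneous data: each point is labelled with its subgroup (stratum)
data = [("veg", i) for i in range(8)] + [("non-veg", i) for i in range(4)]

def stratified_sample(points, fraction):
    """Draw the same fraction from every stratum, without replacement."""
    strata = {}
    for label, value in points:
        strata.setdefault(label, []).append((label, value))
    sample = []
    for stratum in strata.values():
        k = max(1, round(len(stratum) * fraction))
        sample.extend(random.sample(stratum, k))
    return sample

sample = stratified_sample(data, 0.5)
print(len(sample))  # 6 -> 4 points from "veg", 2 from "non-veg"
```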
Bagging uses these sampling techniques to get an unbiased data distribution. The size of each subset is often smaller than the original dataset. A base model is trained on each subset; the models run in parallel and generate individual predictions, independently of each other. The final prediction is made by merging the predictions from all the individual models. [Image source: https://towardsdatascience.com/ensemble-methods-in-machine-learning-what-are-they-and-why-use-them-68ec3f9fef5f]
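The bagging procedure can be sketched with a deliberately simple base learner, a one-dimensional decision stump. The data and the stump learner below are illustrative assumptions, not part of the original notes:

```python
import random
from statistics import mode

random.seed(42)

# Toy 1-D data: class 0 clusters low, class 1 clusters high
X = [0.5, 1.0, 1.5, 2.0, 6.0, 6.5, 7.0, 7.5]
y = [0,   0,   0,   0,   1,   1,   1,   1]

def fit_stump(xs, ys):
    """Return the threshold t (predict 1 when x > t) with fewest errors."""
    vals = sorted(set(xs))
    candidates = [vals[0] - 1.0] + [(a + b) / 2 for a, b in zip(vals, vals[1:])]
    return min(candidates,
               key=lambda t: sum((x > t) != bool(lab) for x, lab in zip(xs, ys)))

# Bagging: train one stump per bootstrap sample (sampling with replacement),
# then merge the individual predictions by majority vote.
stumps = []
for _ in range(5):
    idx = [random.randrange(len(X)) for _ in range(len(X))]
    stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

def bagged_predict(x):
    return mode(int(x > t) for t in stumps)

print([bagged_predict(x) for x in (1.2, 7.2)])  # [0, 1]
```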
Boosting-
Suppose, for example, that the final prediction is wrong in two of the models and a correct prediction was made in only one of them. In this case, simply combining the models will not give an accurate result. This situation is taken care of by another method called boosting.
Boosting is a sequential process in which each upcoming model tries to minimize the errors made by the previous model in its predictions. This method differs from bagging in that each succeeding model depends on the previous one. Let us understand it with the help of an example:
1. Various subsets are created from the original dataset. At first, every point is given equal weight; a first model is trained on a subset of the data, and this model is then used to make predictions on the whole dataset.
2. The actual values and the predicted values are used to calculate the prediction error. The incorrectly predicted data points are given higher weight, and a second model is trained on the dataset with the updated weights. The second model thus tries to reduce the errors made by the previous model.
3. Sequentially, further models are trained, each minimizing the errors of the previous one. The final model is a strong learner, which is the weighted mean of all the weak learners.
Thus, boosting algorithms merge various weak learners to get one strong learner. The weak learners generally do not perform well on the whole data but are effective on subsets of it. That is why, when we merge them together, the ensemble as a whole gives better predictions.
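The three steps above follow the pattern of AdaBoost. Below is a minimal AdaBoost-style sketch on toy one-dimensional data; the stump learner and the data are assumed for illustration:

```python
import math

# Toy 1-D data; labels are -1/+1, as is conventional for AdaBoost-style boosting
X = [0.5, 1.0, 1.5, 2.0, 6.0, 6.5, 7.0, 7.5]
y = [-1, -1, -1, -1, 1, 1, 1, 1]

def fit_weighted_stump(xs, ys, w):
    """Step 2: find the threshold minimizing the WEIGHTED error."""
    vals = sorted(set(xs))
    candidates = [vals[0] - 1.0] + [(a + b) / 2 for a, b in zip(vals, vals[1:])]
    def weighted_error(t):
        return sum(wi for xi, yi, wi in zip(xs, ys, w)
                   if (1 if xi > t else -1) != yi)
    return min(candidates, key=weighted_error)

n = len(X)
weights = [1.0 / n] * n           # step 1: all points start with equal weight
ensemble = []                     # list of (stump threshold, model weight alpha)

for _ in range(3):                # step 3: train models sequentially
    t = fit_weighted_stump(X, y, weights)
    preds = [1 if x > t else -1 for x in X]
    err = sum(w for w, p, lab in zip(weights, preds, y) if p != lab)
    err = max(err, 1e-10)         # avoid division by zero on a perfect stump
    alpha = 0.5 * math.log((1 - err) / err)
    ensemble.append((t, alpha))
    # step 2: misclassified points get HIGHER weight for the next model
    weights = [w * math.exp(-alpha * lab * p)
               for w, lab, p in zip(weights, y, preds)]
    total = sum(weights)
    weights = [w / total for w in weights]

def boosted_predict(x):
    """Final strong learner: vote of the weak learners, weighted by alpha."""
    score = sum(alpha * (1 if x > t else -1) for t, alpha in ensemble)
    return 1 if score > 0 else -1

print([boosted_predict(x) for x in (1.2, 7.2)])  # [-1, 1]
```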
Random Forest-
In layman's terms, suppose you want to watch a movie. You can search online, read reviews on blogs, and you can also ask some friends to help you.
Suppose you ask your friends about their favorite movies. You will get some recommendations from every friend. With the help of these recommendations, you can make a list of recommended movies and ask your friends to vote on the list you have made. The movie with the highest number of votes will be the final choice to watch.
In the above example you can clearly see that there are two basic steps. The first is asking your friends about their favorite movies and getting one recommendation from each, out of the many movies they have watched; this part is just like a decision tree algorithm. The second part is the voting: selecting the best movie from the list of recommended movies. The whole process of getting recommendations from friends and voting on them to select the best one is analogous to the random forest algorithm.
Random forest is basically an ensemble technique based on the divide-and-conquer philosophy. It generates small decision trees from random subsamples of the dataset, and the collection of generated decision trees is called a forest. Every individual tree is grown using an attribute-selection measure such as the Gini index, entropy, or information gain. In classification problems each tree votes and the most-voted class is taken as the final result, whereas in regression the average of the trees' outputs is used to get the final outcome.
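A minimal example using scikit-learn's RandomForestClassifier on the built-in Iris dataset, assuming scikit-learn is available; the hyperparameters are illustrative choices, not prescribed by these notes:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# A forest of trees, each grown on a bootstrap sample of the training
# data; "gini" is the attribute-selection measure
forest = RandomForestClassifier(n_estimators=100, criterion="gini",
                                random_state=0)
forest.fit(X_train, y_train)

# Classification: each tree votes and the majority class wins
print(forest.score(X_test, y_test))
```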
Basic Steps Involved in the Algorithm
Advantages
Disadvantages
- The random forest algorithm is very slow compared to others because it calculates a prediction from each decision tree for every subsample and then votes on them to select the best one, which is time consuming.
- It is difficult to explain the model compared to a single decision tree, where you can easily trace the decision by following the path through the tree.
*********