Ensemble Learning
Ensemble learning improves machine learning results by combining several models to achieve
better predictive accuracy than any single model. The primary concept behind ensemble models
is to combine weak learners to form a strong learner. Bagging and Boosting are two common
types of ensemble learning.
What is Boosting?
Boosting is an ensemble modeling technique that attempts to build a strong
classifier from a number of weak classifiers.
This is done by building a model from weak models in series.
First, a model is built from the training data.
Then a second model is built that tries to correct the errors present in the
first model.
This procedure continues, and models are added, until either the complete
training data set is predicted correctly or the maximum number of models is
added.
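As a quick illustration (not part of the original text), the sketch below uses scikit-learn's AdaBoostClassifier, whose default weak learner is a depth-1 decision tree; the dataset and hyperparameters are placeholder choices.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Toy dataset standing in for real training data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# AdaBoost adds weak models (depth-1 decision trees by default) in series,
# each new model focusing on the examples the previous ones got wrong.
booster = AdaBoostClassifier(n_estimators=50, random_state=42)
booster.fit(X_train, y_train)
print("Test accuracy:", booster.score(X_test, y_test))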
Advantages of Boosting
Improved Accuracy – by combining many weak learners, boosting usually achieves higher
predictive accuracy than any single weak learner on its own.
Robustness to Overfitting – because each added learner is simple, the combined model can
generalise better than a single large, complex model.
Better handling of imbalanced data – misclassified examples receive higher weights, so the
model pays more attention to hard (often minority-class) cases.
Better Interpretability – the decision process is broken into a sequence of simple models
that can be inspected individually.
Training of Boosting Model
Generally, boosting works as follows:
Create the initial weak learner.
Use the weak learner to make predictions on the entire dataset.
Compute the prediction errors.
Incorrect predictions are assigned more weight.
Build another weak learner aimed at fixing the errors of the previous
learner.
Make predictions on the whole dataset using the new learner.
Repeat this process until the optimal results are obtained.
The final model is obtained as the weighted mean (or weighted vote) of all weak learners.
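The sketch below is a rough, AdaBoost-style version of this loop, assuming binary labels encoded as -1/+1 and depth-1 decision trees as the weak learners; the function names and details are illustrative choices, not a prescribed implementation.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, y, n_rounds=10):
    """Minimal AdaBoost-style loop; y must contain labels -1 and +1."""
    n = len(y)
    weights = np.full(n, 1.0 / n)            # start with uniform sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=weights)
        pred = stump.predict(X)
        err = np.sum(weights * (pred != y)) / np.sum(weights)
        if err >= 0.5:                        # no better than chance: stop adding models
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-10))
        weights *= np.exp(-alpha * y * pred)  # misclassified points get more weight
        weights /= weights.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def predict(learners, alphas, X):
    # Final model: weighted vote (sign of the weighted sum) of all weak learners.
    scores = sum(a * l.predict(X) for l, a in zip(learners, alphas))
    return np.sign(scores)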
How would you classify an email as SPAM or not?
Our initial approach would be to identify ‘SPAM’ and ‘NOT SPAM’ emails
using the following criteria. If:
1. The email has only one image file (a promotional image): it's SPAM.
2. The email has only link(s): it's SPAM.
3. The email body contains sentences like “You won a prize money of $ xxxxxx”: it's SPAM.
4. The email is from our official domain “www.knowledgehut.com”: not SPAM.
5. The email is from a known source: not SPAM.
Individually, these rules are not powerful enough to classify an email as
‘SPAM’ or ‘NOT SPAM’. Therefore, these rules are called weak learners.
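The sketch below (purely illustrative; the email fields, thresholds, and known-sender list are invented) turns these five rules into simple weak learners and combines them with a plain majority vote.

KNOWN_SENDERS = {"colleague@example.com"}   # hypothetical list of known sources

def rule_single_image(email):    # rule 1: only a promotional image
    return email.get("n_images", 0) == 1 and not email.get("body")

def rule_only_links(email):      # rule 2: only link(s), no body text
    return email.get("n_links", 0) > 0 and not email.get("body")

def rule_prize_text(email):      # rule 3: "you won a prize" wording
    return "you won a prize" in email.get("body", "").lower()

def rule_not_official(email):    # rule 4 (inverted): not from the official domain
    return "knowledgehut.com" not in email.get("sender", "")

def rule_unknown_sender(email):  # rule 5 (inverted): sender is not a known source
    return email.get("sender") not in KNOWN_SENDERS

WEAK_RULES = [rule_single_image, rule_only_links, rule_prize_text,
              rule_not_official, rule_unknown_sender]

def classify(email):
    # Each rule alone is weak; here we simply take a majority vote.
    # A boosting algorithm would instead weight each rule by its accuracy.
    spam_votes = sum(bool(rule(email)) for rule in WEAK_RULES)
    return "SPAM" if spam_votes > len(WEAK_RULES) / 2 else "NOT SPAM"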
Bagging
Bagging is a type of ensemble method that combines the predictions of several
different models in order to create a more accurate result.
The bagging technique works by creating a number of models, each trained on a different
randomly selected sample, or “bag”, of the data drawn with replacement. This means that
individual data points can be selected more than once. The predictions made by the
individual models are then averaged to produce a more accurate prediction with lower
variance. Ensemble methods of this kind have been reported to improve the accuracy of a
machine learning model by up to 10%. Bagging is also known as bootstrap aggregation.
The diagram below represents the bagging ensemble method.
An example application of the bagging method would be a model that predicts whether or not
a customer will churn. This could be done by training several different models on different
bootstrap samples of the data and then averaging their predictions, which reduces the
variance of the predictions and yields a more accurate result.
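A minimal sketch of this idea using scikit-learn's BaggingClassifier is shown below; the simulated dataset stands in for real churn data, and the hyperparameters are illustrative.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

# Simulated stand-in for a customer-churn dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# 100 "bags": each model (a decision tree by default) is trained on a bootstrap
# sample, i.e. rows drawn with replacement from the training data, and the
# individual predictions are aggregated into the final prediction.
bagger = BaggingClassifier(n_estimators=100, bootstrap=True, random_state=0)
print("CV accuracy:", cross_val_score(bagger, X, y, cv=5).mean())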
Stacking
Stacking is one of the popular ensemble modeling techniques in machine learning.
Several weak (base) learners are trained in parallel and combined with a meta-learner,
which learns how to merge their outputs into better predictions for future data.
Architecture of Stacking
The architecture of the stacking model consists of two or more base (learner) models and a
meta-model that combines the predictions of the base models. The base models are called
level-0 models, and the meta-model is known as the level-1 model. The stacking ensemble
method therefore involves the original (training) data, the primary-level models, the
primary-level predictions, the secondary-level model, and the final prediction. The basic
architecture of stacking can be represented as shown in the image below.
o Original data: The original data is divided into n folds and serves as the training and
test data.
o Base models: These models are also referred to as level-0 models. They are trained on the
training data, and their outputs are the compiled level-0 predictions.
o Level-0 predictions: Each base model is trained on part of the training data and produces
its own predictions, known as level-0 predictions.
o Meta model: The stacking architecture contains one meta-model, which learns how to best
combine the predictions of the base models. The meta-model is also known as the level-1
model.
o Level-1 prediction: The meta-model is trained on predictions that the base models make on
data they were not trained on; these predictions, together with the expected outputs, form
the input/output pairs of the training dataset used to fit the meta-model, and its output is
the final (level-1) prediction.
o Split the training data into n folds, for example with RepeatedStratifiedKFold, which is a
common way to prepare training data for meta-models.
o Fit a base model on n-1 of the folds and make predictions for the remaining (held-out)
fold.
o Add the predictions made in the above step to the x1_train list.
o Repeat steps 2 and 3 for the remaining folds, so that x1_train contains an out-of-fold
prediction for every training sample.
o Now train the base model on all n folds and make predictions for the test data.
o Add these predictions to the y1_test list.
o In the same way, find x2_train, y2_test, x3_train, and y3_test by repeating the procedure
with Models 2 and 3, respectively.
o Now train the meta-model on these out-of-fold (level-0) predictions, which are used as its
input features.
o Finally, the meta-learner can be used to make predictions on test data in the
stacking model.
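The sketch below shows this workflow with scikit-learn's StackingClassifier; the choice of base models (random forest and SVM) and of a logistic-regression meta-model are illustrative assumptions, not prescribed by the text.

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Level-0 (base) models, trained in parallel.
level0 = [("rf", RandomForestClassifier(random_state=1)),
          ("svm", SVC(random_state=1))]

# The cv splitter produces out-of-fold level-0 predictions, which become the
# training features for the level-1 (meta) model.
stack = StackingClassifier(
    estimators=level0,
    final_estimator=LogisticRegression(),
    cv=RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1),
)
stack.fit(X_train, y_train)
print("Test accuracy:", stack.score(X_test, y_test))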
Voting
The voting ensemble method is a type of ensemble method that combines the predictions of
multiple models by voting. It can produce more accurate predictions than any single model by
pooling the knowledge of multiple "experts". The idea is that, by aggregating the predictions
of multiple models, you can reduce the variance and avoid overfitting. The voting ensemble
method is typically used when there are multiple models built with different configurations
or algorithms. The following represents a classifier ensemble created from models trained
with different machine learning algorithms, such as logistic regression, SVM, and random
forest.
In either case, the voting ensemble method can help produce a more accurate prediction
by aggregating the information from multiple sources. The ensemble classifier above
aggregates the predictions of each classifier and predicts the class that gets the most
votes. This majority-vote classifier is called a hard voting classifier. The picture below
represents hard voting used to make the final prediction:
Here is another picture that can help to illustrate the voting technique.
In the picture below, C represents the classification models and P represents their
predictions. The training data set is used to train the different classification models
C1, C2, …, Cm. Then, new data is passed to each of the classification models to get its
prediction. Finally, majority voting is used for the final prediction.
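A minimal sketch of hard voting with scikit-learn's VotingClassifier is shown below; the three underlying algorithms mirror those mentioned above, and the dataset is a simulated placeholder.

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

voter = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("svm", SVC(random_state=2)),
                ("rf", RandomForestClassifier(random_state=2))],
    voting="hard",   # each model casts one vote; the majority class wins
)
voter.fit(X_train, y_train)
print("Test accuracy:", voter.score(X_test, y_test))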
Boosting vs Bagging
Boosting: The main aim of boosting is to decrease bias, not variance.
Bagging: The main aim of bagging is to decrease variance, not bias.
Stacking, by contrast, works by feeding the combined predictions of multiple weak learners
into a meta-learner so that a better final prediction model can be achieved.