
Ensemble Learning


This method improves machine learning results by combining several models to achieve better
predictive accuracy than any single model could. The primary concept behind ensemble models is to
combine weak learners to form strong learners. Bagging and Boosting are two types of ensemble learning.
What is Boosting?
Boosting is an ensemble modeling technique that attempts to build a strong
classifier from a number of weak classifiers.
It is done by building a model from a series of weak models.
Firstly, a model is built from the training data.
Then a second model is built which tries to correct the errors present in the
first model.
This procedure is continued and models are added until either the complete
training data set is predicted correctly or the maximum number of models is
added.

Advantages of Boosting

• Improved Accuracy – combining the predictions of many weak learners usually
yields higher accuracy than any single weak model.
• Robustness to Overfitting – re-weighting the misclassified inputs at each step
can reduce the risk of overfitting.
• Better handling of imbalanced data – boosting focuses more on the data points
that are misclassified, which often belong to the minority class.
• Better Interpretability – the decision process is broken into a sequence of
simple weak learners that are easier to inspect individually.
Training of Boosting Model
Generally, boosting works as follows:
• Create the initial weak learner.
• Use the weak learner to make predictions on the entire dataset.
• Compute the prediction errors.
• Assign more weight to the incorrectly predicted data points.
• Build another weak learner aimed at fixing the errors of the previous
learner.
• Make predictions on the whole dataset using the new learner.
• Repeat this process until the optimal results are obtained.
• The final model is a weighted combination (weighted vote or weighted average)
of all the weak learners.
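The loop above can be sketched in a few lines of Python. This is a minimal, illustrative
implementation of the boosting idea (in the style of AdaBoost) using decision stumps from
scikit-learn as the weak learners; the dataset, the number of rounds, and the choice of weak
learner are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic dataset; labels are mapped to {-1, +1} for the weighted vote
X, y = make_classification(n_samples=500, random_state=0)
y_signed = np.where(y == 1, 1, -1)

weights = np.full(len(X), 1 / len(X))   # start with equal weights on every point
learners, alphas = [], []

for _ in range(10):                      # 10 boosting rounds (arbitrary choice)
    stump = DecisionTreeClassifier(max_depth=1)       # a weak learner
    stump.fit(X, y_signed, sample_weight=weights)
    pred = stump.predict(X)

    # Weighted error of this learner and its vote weight (alpha)
    err = np.sum(weights * (pred != y_signed)) / np.sum(weights)
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))

    # Incorrectly predicted points get more weight for the next round
    weights *= np.exp(-alpha * y_signed * pred)
    weights /= weights.sum()

    learners.append(stump)
    alphas.append(alpha)

# Final model: weighted combination of all the weak learners
score = sum(a * m.predict(X) for a, m in zip(alphas, learners))
print("training accuracy:", np.mean(np.sign(score) == y_signed))
```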
How would you classify an email as SPAM or not?

Our initial approach would be to identify 'SPAM' and 'NOT SPAM' emails
using the following criteria. If:

1. The email has only one image file (a promotional image), it's SPAM.
2. The email has only link(s), it's SPAM.
3. The email body consists of sentences like "You won a prize money of $ xxxxxx", it's
SPAM.
4. The email is from our official domain "www.knowledgehut.com", it's not SPAM.
5. The email is from a known source, it's not SPAM.
Individually, these rules are not powerful enough to classify an email as
'SPAM' or 'NOT SPAM'. Therefore, these rules are called weak learners.

To convert weak learners into a strong learner, we'll combine the predictions of
each weak learner using methods like:

• Using the average / weighted average
• Considering the prediction that gets the higher (majority) vote

Example: Above, we have defined 5 weak learners. Out of these 5, 3 vote
'SPAM' and 2 vote 'NOT SPAM'. In this case, by default, we'll consider the
email as SPAM because we have the higher vote count (3) for 'SPAM'. This idea
of combining weak rules into one strong rule is exactly what is exploited in
boosting.
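As a toy illustration, the majority vote over the five weak spam rules can be written as a
short Python sketch. The rule functions, the email fields, and the sample email below are
all hypothetical and only serve to show the voting mechanics.

```python
# Each rule casts one vote: True means the rule votes "SPAM"
def rule_votes(email):
    return [
        email["only_image"],                         # rule 1: only a promotional image
        email["only_links"],                         # rule 2: only links in the email
        "you won a prize" in email["body"].lower(),  # rule 3: prize-money wording
        not email["from_official_domain"],           # rule 4: official domain => not SPAM
        not email["from_known_source"],              # rule 5: known source => not SPAM
    ]

def classify(email):
    votes = rule_votes(email)
    # Majority vote: SPAM if more than half of the weak rules say SPAM
    return "SPAM" if sum(votes) > len(votes) / 2 else "NOT SPAM"

email = {
    "only_image": True,
    "only_links": True,
    "body": "You won a prize money of $ 100000",
    "from_official_domain": True,
    "from_known_source": True,
}
print(classify(email))  # 3 of the 5 rules vote SPAM (as in the example above) -> "SPAM"
```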

Training a boosting model

Explanation of training the boosting model:


The above diagram explains the AdaBoost algorithm in a very simple way. Let’s
try to understand it in a stepwise process:
• B1 consists of 10 data points of two types, plus(+) and minus(-): 5 of them are
plus(+) and the other 5 are minus(-), and each one is initially assigned an equal
weight. The first model tries to classify the data points and generates a vertical
separator line, but it wrongly classifies 3 plus(+) points as minus(-).
• B2 consists of the 10 data points from the previous model, in which the 3
wrongly classified plus(+) points are weighted more so that the current model tries
harder to classify these pluses(+) correctly. This model generates a vertical
separator line that correctly classifies the previously misclassified pluses(+),
but in this attempt it wrongly classifies three minuses(-).
• B3 consists of the 10 data points from the previous model, in which the 3
wrongly classified minus(-) points are weighted more so that the current model tries
harder to classify these minuses(-) correctly. This model generates a
horizontal separator line that correctly classifies the previously misclassified
minuses(-).
• B4 combines B1, B2, and B3 in order to build a strong prediction
model which is much better than any individual model used.
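The same AdaBoost procedure is available off the shelf in scikit-learn. The snippet below is
a brief sketch on an illustrative synthetic dataset; the number of estimators and the random
seeds are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic binary-classification data
X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 50 sequential weak learners (decision stumps by default); each round
# re-weights the examples that earlier learners got wrong
clf = AdaBoostClassifier(n_estimators=50, random_state=42)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```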

Bagging
Bagging is a type of ensemble method that combines the predictions of several
different models in order to create a more accurate result.

The bagging technique works by creating a number of models, called "bags", each of which is
trained on a different randomly selected sample of the data, drawn with replacement. This
means that individual data points can be selected more than once. The predictions made by
the bags are then averaged (or, for classification, combined by voting) to produce the final
prediction. This helps to reduce the variance of the predictions and produces a more accurate
result. Ensemble methods such as bagging have been reported to improve the accuracy of a
machine learning model by up to 10%. Bagging is also known as bootstrap aggregation. The
diagram below represents the bagging ensemble method.

An example application of the bagging method would be a model that predicts
whether or not a customer will churn. This could be done by training several different
models on different bootstrap samples of the data and then averaging their predictions.
This helps to reduce the variance of the predictions and produce a more accurate result.
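A minimal sketch of bagging with scikit-learn's BaggingClassifier is shown below. The
churn-style data is replaced by a synthetic stand-in, and the number of estimators is an
arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Pretend these features describe customers and y marks churn (illustrative only)
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 "bags": each base model (a decision tree by default) is fitted on a bootstrap
# sample drawn with replacement, and their predictions are combined by voting
bag = BaggingClassifier(n_estimators=100, bootstrap=True, random_state=0)
bag.fit(X_train, y_train)
print("test accuracy:", bag.score(X_test, y_test))
```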
Stacking
Stacking is one of the popular ensemble modeling techniques in machine learning.
Various weak learners are ensembled in a parallel manner in such a way that, by
combining them with a meta learner, we can produce better predictions for the
future.

Architecture of Stacking
The architecture of the stacking model is designed in such a way that it consists of two
or more base/learner models and a meta-model that combines the predictions of the
base models. The base models are called level-0 models, and the meta-model is
known as the level-1 model. So, the stacking ensemble method includes the original
(training) data, the primary-level models, the primary-level predictions, the secondary-level
model, and the final prediction. The basic architecture of stacking can be represented as
shown in the image below.

o Original data: This data is divided into n folds and serves as the training and test
data.
o Base models: These models are also referred to as level-0 models. They are fitted
on the training data, and their compiled predictions (level-0 predictions) are the output.
o Level-0 Predictions: Each base model is trained on part of the training data and
makes its own predictions, which are known as level-0 predictions.
o Meta Model: The architecture of the stacking model includes one meta-model,
which learns how to best combine the predictions of the base models. The
meta-model is also known as the level-1 model.
o Level-1 Prediction: The meta-model is trained on the predictions made by the
individual base models, i.e., data not used to train the base models is fed to them,
predictions are made, and these predictions, along with the expected outputs,
provide the input-output pairs of the dataset used to fit the meta-model. The
meta-model's output is the final (level-1) prediction.
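This architecture maps directly onto scikit-learn's StackingClassifier. The sketch below is
only illustrative: the choice of a random forest and an SVM as level-0 models and logistic
regression as the level-1 meta-model is an assumption, as is the synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=0)

# Level-0 (base) models
base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
]

# Level-1 meta-model: learns how to combine the base models' out-of-fold predictions
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    cv=5,  # level-0 predictions for the meta-model come from 5-fold cross-validation
)
stack.fit(X, y)
print("training accuracy:", stack.score(X, y))
```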

Steps to implement Stacking models:


There are some important steps to implementing stacking models in machine learning.
These are as follows (a code sketch follows the list):

o Split the training data set into n folds using RepeatedStratifiedKFold, as this is
a common approach to preparing training data for meta-models.
o Fit a base model on n-1 folds and make predictions for the nth (held-out) fold.
o Add the predictions made in the above step to the x1_train list.
o Repeat steps 2 & 3 for the remaining n-1 folds, which gives an x1_train array of
size n (one out-of-fold prediction per training sample).
o Now the model is trained on all n parts and makes predictions for the sample
(test) data.
o Add these predictions to the y1_test list.
o In the same way, find x2_train, y2_test, x3_train, and y3_test by using
Models 2 and 3 for training, respectively, to get their level-0 predictions.
o Now train the meta-model on the level-0 predictions, where these predictions are
used as features for the model.
o Finally, the meta learner can be used to make predictions on test data in the
stacking model.
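The sketch below follows these steps manually: out-of-fold (level-0) predictions from each
base model become the features on which the meta-model is trained. Variable names such as
x1_train mirror the steps above, but the base models, meta-model, and data are illustrative
assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=1, random_state=0)

def out_of_fold_predictions(model, X, y, cv):
    """Fit the model on n-1 folds and predict the held-out fold, for every fold."""
    oof = np.zeros(len(X))
    for train_idx, test_idx in cv.split(X, y):
        model.fit(X[train_idx], y[train_idx])
        oof[test_idx] = model.predict_proba(X[test_idx])[:, 1]
    return oof

# Level-0 predictions from two base models (x1_train and x2_train in the steps above)
x1_train = out_of_fold_predictions(DecisionTreeClassifier(random_state=0), X, y, cv)
x2_train = out_of_fold_predictions(KNeighborsClassifier(), X, y, cv)

# The meta-model is trained on the level-0 predictions used as features
meta_features = np.column_stack([x1_train, x2_train])
meta_model = LogisticRegression().fit(meta_features, y)
print("meta-model training accuracy:", meta_model.score(meta_features, y))
```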

Voting
The voting ensemble method is a type of ensemble method that combines the predictions of
multiple models by voting. It can be used to make more accurate predictions than any single
model by pooling the knowledge of multiple models, which reduces the variance and helps
avoid overfitting. The voting ensemble method is typically used when there are
multiple models with different configurations or algorithms. The following represents a
classifier ensemble created from models trained with different machine learning
algorithms such as logistic regression, SVM, random forest, and others.

In either case, the voting ensemble method can help to produce a more accurate prediction
by aggregating the information from multiple sources. The above ensemble classifier
aggregates the predictions of each classifier and predicts the class that gets the most votes.
This majority-vote classifier is called a hard voting classifier. The picture below represents
hard voting used to make the final prediction:

Here is another picture that can help explain the voting technique better.
In the picture below, C represents the classification models and P represents the predictions.
The training data set is used to train different classification models C1, C2, …, Cm. Then, new
data is passed to each of the classification models to get their predictions. Finally,
majority voting is used for the final prediction.
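In scikit-learn this corresponds to the VotingClassifier with voting="hard". The models and
data in the sketch below are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=0)

# Models trained with different algorithms, combined by majority (hard) voting
voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="hard",  # each model casts one vote; the class with the most votes wins
)
voting.fit(X, y)
print("training accuracy:", voting.score(X, y))
```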

Boosting vs Bagging

• In boosting, we combine predictions that belong to different types of models; in
bagging, predictions of the same type of model are combined.
• The main aim of boosting is to decrease bias, not variance; the main aim of bagging
is to decrease variance, not bias.
• In boosting, at every successive stage models are weighted according to their
performance; in bagging, all the models have the same weightage.
• In boosting, new models are influenced by the accuracy of previous models; in
bagging, all the models are independent of each other.

Disadvantages of Boosting Algorithms

Boosting algorithms also have some disadvantages; these are:


• Boosting algorithms are vulnerable to outliers.
• It is difficult to use boosting algorithms for real-time applications.
• They are computationally expensive for large datasets.
Boosting helps in training a series of low-performing algorithms, called weak
learners, simply by adjusting the error metric over time. Weak learners are
considered to be those algorithms whose error rate is slightly under 50%, as
illustrated below:

This ensemble technique works by feeding the combined predictions of multiple weak
learners into a meta learner so that a better output prediction model can be achieved.

In stacking, an algorithm takes the outputs of sub-models as input and attempts to
learn how to best combine the input predictions to make a better output prediction.

Stacking is also known as stacked generalization and is an extended form of the
Model Averaging Ensemble technique, in which all sub-models participate according to
their performance weights to build a new model with better predictions. This new
model is stacked on top of the others, which is why it is named stacking.
