Random Forest Algorithm
INTRODUCTION
Random Forest is a popular machine learning algorithm that
belongs to the supervised learning family. It can be used for
both classification and regression problems in ML.
A Random Forest is a collection of decision trees that
work together to make predictions.
Random Forest is based on ensemble learning: ensemble
learning combines the predictions of multiple models
(called "weak learners" or "base models") to make a
stronger, more reliable prediction.
A key characteristic is its ability to determine which features are
most important in making predictions, aiding in data analysis and
model interpretation.
The goal is to reduce prediction error and variance compared with a single decision tree.
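Since ensemble learning is the core idea behind Random Forest, a minimal majority-vote sketch shows how combining models works. The three "weak learner" prediction lists here are hypothetical, purely for illustration:

```python
from collections import Counter

# Hypothetical predictions from three weak learners on four samples
preds = [
    ["spam", "ham",  "spam", "ham"],  # model 1
    ["spam", "spam", "spam", "ham"],  # model 2
    ["ham",  "ham",  "spam", "ham"],  # model 3
]

def majority_vote(per_model_preds):
    """Combine per-model predictions column-wise by majority vote."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*per_model_preds)]

print(majority_vote(preds))  # → ['spam', 'ham', 'spam', 'ham']
```

Even though model 1 and model 3 each get one sample "wrong" relative to the others, the combined vote follows the majority, which is exactly how a Random Forest aggregates its trees for classification.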
ALGORITHM:
1. Random Forest builds multiple decision trees
using random samples of the data. Each tree is
trained on a different bootstrap sample (rows drawn
with replacement), which makes each tree unique.
2. When creating each tree, the algorithm randomly
selects a subset of features to consider at each
split, rather than using all available features.
This adds diversity to the trees.
3. Each decision tree in the forest makes a prediction
based on the data it was trained on.
4. When making the final prediction, Random Forest
combines the results from all the trees.
• For classification tasks, the final prediction is decided by
a majority vote: the category predicted by the most
trees is the final prediction.
• For regression tasks, the final prediction is the average
of the predictions from all the trees.
5. The randomness in data sampling and feature
selection helps prevent the model from overfitting,
making the predictions more accurate and reliable.
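The steps above can be sketched from scratch in Python. To keep it short, each "tree" is a one-split stump (a single threshold on one feature) rather than a full decision tree, and the data, function names, and parameters are all illustrative assumptions:

```python
import random
from statistics import mean

# Toy regression data (illustrative): y depends mostly on feature 0
X = [[1, 5], [2, 3], [3, 8], [4, 1], [5, 9], [6, 2]]
y = [2, 4, 6, 8, 10, 12]

def train_stump(X, y, feature):
    """One-split 'tree': split at the feature's median, predict leaf means."""
    thresh = sorted(x[feature] for x in X)[len(X) // 2]
    left  = [t for x, t in zip(X, y) if x[feature] <  thresh]
    right = [t for x, t in zip(X, y) if x[feature] >= thresh]
    return {
        "feature": feature,
        "thresh": thresh,
        "left":  mean(left)  if left  else mean(y),
        "right": mean(right) if right else mean(y),
    }

def predict_stump(stump, x):
    side = "left" if x[stump["feature"]] < stump["thresh"] else "right"
    return stump[side]

def train_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample -- draw rows with replacement
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        # Step 2: random feature selection -- each stump sees one feature
        feature = rng.randrange(len(X[0]))
        forest.append(train_stump(Xb, yb, feature))
    return forest

def predict_forest(forest, x):
    # Step 4 (regression): average the per-tree predictions
    return mean(predict_stump(s, x) for s in forest)

forest = train_forest(X, y)
print(predict_forest(forest, [3.5, 4]))
```

Because every stump's leaf value is a mean of training targets, the forest's averaged prediction always lands inside the range of `y`, which illustrates the stability-through-averaging idea from step 5.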
ADVANTAGES:
• It can be used for both classification and
regression problems.
• It reduces overfitting because the output is based
on majority voting or averaging.
• It performs well even if the data contains missing
values (depending on the implementation).
• Each decision tree is trained independently of the others;
thus, training parallelizes naturally.
• It maintains high stability by averaging the
answers from a large number of trees.
DISADVANTAGES
• Random Forest is harder to interpret than a single
decision tree, where you can trace a decision by
following the path of the tree.
• Training time is longer than for simpler models due
to its complexity, and whenever it makes a prediction,
every decision tree has to generate an output for the
given input.
• Increased computational cost and memory usage
make it slower for predictions and less suitable
for real-time applications.
APPLICATIONS
• Finance:
  • Credit Risk Assessment: predicting the likelihood of
  loan defaults.
  • Fraud Detection: identifying fraudulent transactions.
  • Stock Price Prediction: forecasting stock market
  trends.
• Healthcare:
  • Disease Prediction: predicting patient outcomes and
  modeling disease progression.
  • Drug Discovery: identifying potential drug targets
  and predicting drug efficacy.