Flood Prediction Using Ensemble Machine Learning Model

Tanvir Rahman, Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, rtanvir@udel.edu
Miah Mohammad Asif Syeed, Department of Computer Science and Engineering, Brac University, Dhaka, Bangladesh, miah.mohammad.asif.syeed@g.bracu.ac.bd
Maisha Farzana, Department of Computer Science and Engineering, Brac University, Dhaka, Bangladesh, maisha.farzana1@g.bracu.ac.bd
Ishadie Namir, Department of Computer Science and Engineering, BRAC University, Dhaka, Bangladesh, ishadie.namir@g.bracu.ac.bd
Ipshita Ishrar, Department of Computer Science and Engineering, BRAC University, Dhaka, Bangladesh, ipshita.ishrar@g.bracu.ac.bd
Meherin Hossain Nushra, Department of Computer Science and Engineering, BRAC University, Dhaka, Bangladesh, meherin.hossain.nushra@g.bracu.ac.bd
Bhoktear Mahbub Khan, Department of Geography & Spatial Sciences, University of Delaware, Newark, Delaware, bhoktear@udel.edu

2023 5th International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA) | 979-8-3503-3752-5/23/$31.00 ©2023 IEEE | DOI: 10.1109/HORA58378.2023.10156673

Abstract— India experiences recurrent natural disasters in the form of floods, which result in substantial destruction of both human life and property. Accurately predicting the onset and progression of floods in real time is crucial for minimizing their impact. This research paper presents a comparative study of various machine learning models for flood prediction in India. The evaluated models include K-Nearest Neighbor (KNN), Support Vector Classifier (SVC), Decision Tree Classifier, Binary Logistic Regression, and Stacked Generalization (Stacking). We used a rainfall dataset to train and test the models. Our results indicate that the stacked generalization model outperforms the other models, achieving an accuracy of 93.3% and a standard deviation of 0.098. Our findings suggest that machine learning models can provide accurate and timely flood predictions, enabling disaster management authorities to take appropriate measures to minimize damage and save lives.

Keywords—Flood Prediction, Ensemble Machine Learning, Rainfall, Support Vector Classifier (SVC), K-Nearest Neighbor (KNN), Decision Tree Classifier (DTC), Binary Logistic Regression, Stacked Generalization.

I. OVERVIEW OF THE STUDY

Floods are among the most destructive and frequent natural disasters in the world. A flood occurs when water overflows and covers typically dry land, for example when water from a lake, river, or ocean overflows and submerges nearby land. The water levels in lakes and rivers fluctuate significantly with variations in the input water volume and evaporation rate, which can occur seasonally. While floods often take several hours or days to develop, giving residents adequate time to prepare or evacuate, they are not always caused by excessive rainfall. In some cases, flooding can be triggered by a storm surge from a tropical cyclone, a tsunami, or a high tide, which can inundate coastal regions, especially when rivers are already at high levels. Additionally, dam failures, caused by events such as earthquakes, can flood typically dry areas [1].

Floods have the potential to cause massive devastation. Their impact is evident from events such as the severe flooding in Bangladesh in July 2007, which destroyed over a million homes [2]. As they recede, floodwaters can cause extensive damage beyond structural damage. According to a survey, approximately 4.84 million people in India, 3.84 million in Bangladesh, and 3.28 million in China are at risk of flooding each year [3]. Urban areas in several other countries are also vulnerable to flooding. Regions with elevations less than 10 meters above sea level, such as the Netherlands, Monaco, and Bahrain, are also at risk of flooding.

The World Resources Institute predicts that by 2030, floods will affect more than 147 million people worldwide, with annual damages to urban property that could range from $174 billion to $712 billion [4]. Despite the consistency in the effects of natural disasters like floods year after year [5], recovery procedures cannot be taken for granted. Forecasting river water levels after heavy rainfall is essential to ensure public safety, address environmental concerns, and manage water resources effectively. Various mathematical models based on physical considerations or statistical analysis have been developed for this purpose. However, both approaches are

Authorized licensed use limited to: VIT University- Chennai Campus. Downloaded on October 04,2024 at 15:36:17 UTC from IEEE Xplore. Restrictions apply.
complex, and there is a need to continually revise and improve forecasting methods to make them more efficient. The contributions of our paper are as follows:
● We propose a system that can predict the occurrence of floods with greater accuracy.
● The prediction is made with machine learning algorithms, which require less time to predict the occurrence of a flood.
● This provides evacuation time for a specific region and allows warning signals to be sent to nearby at-risk areas.
● As a result, more lives can be saved in a flood disaster, and people can be helped to retrieve their valuable possessions before the flood.

The paper is organized as follows. Section II describes the proposed system, and Section III presents a flowchart outlining the system's workflow. The dataset and its preprocessing are described in Section IV. In Section V, we discuss the machine learning models applied to the dataset, including the ensemble model used for performance comparison. Section VI evaluates the performance of our proposed system and identifies the model with the most accurate result. Lastly, Section VII concludes the paper by discussing potential areas for future development of the proposed system to enhance its efficiency.

II. PROPOSED SYSTEM

The main objective of this study is to enhance the accuracy of flood prediction through the use of Ensemble Machine Learning techniques. The dataset used in this research contains monthly rainfall data for Kerala, a state located in southwestern India, spanning from 1907 to 2017. This dataset is publicly available under the National Data Sharing and Accessibility Policy (NDSAP). We chose this dataset because the climate conditions of Kerala are comparable to those of Bangladesh. The following steps are carried out to obtain a prediction:
1. Input data: The initial phase of the project involves organizing and formatting the dataset, which comprises the rainfall records of Kerala from 1907 to 2017, to prepare it for preprocessing before it is fed into the system.
2. Preprocessing: This step involves categorizing the dataset based on multiple parameters and applying feature encoding and engineering techniques before further analysis.
3. Dividing Dataset: The dataset is partitioned into training and testing sets in an 80:20 ratio, a typical split.
4. Model Application: Once the dataset is divided into training and testing sets, we apply various machine learning models, including K-Nearest Neighbor (KNN), Support Vector Classifier (SVC), Decision Tree Classifier, and Binary Logistic Regression. In addition, to improve the accuracy of the predictions, we also use the Stacked Generalization (Stacking) technique, which is a form of Ensemble Machine Learning.
5. Decision Function: The predictions of the models are compared to determine which model gives the most accurate predictions of flood occurrence.

III. DIAGRAMMATIC REPRESENTATION OF FLOOD PREDICTION SYSTEM

Our proposed system is illustrated in Figure 1, which includes a flowchart diagram.

Fig 1. Flowchart of the proposed system

The following section (Section IV) describes the dataset and the steps carried out to preprocess it.

IV. DATASET DESCRIPTION AND PREPROCESSING

The Kerala dataset, which comprises monthly and annual rainfall indices from 1907 to 2017, is a commonly used rainfall dataset. It also records whether a flood occurred in each year. Kerala is a state situated in the southern region of India. Figure 2 shows the raw dataset.

Fig 2. Raw dataset

The rainfall index is based on weather data collected and maintained by the India Meteorological Department and the Ministry of Earth Sciences. The rainfall index in the Kerala

dataset represents the amount of precipitation received in a particular area during a specific time frame, relative to the long-term average. The dataset also includes the categorical values 'True' and 'False' to indicate whether a flood occurred in a given year. Rainfall within ±19% of the average of 2039.6 mm (from June to September) is considered a normal monsoon for the state, and rainfall that exceeds this range is treated as a potential indicator of flood occurrence. The dataset preprocessing phase comprises three main stages, as follows:

Normalization:
In this stage, feature encoding is applied to prepare the dataset for analysis. Two attributes of the dataset contain string-type data, 'sub-division' and 'flood', and both are encoded. Since 'flood' has only two unique values ('True' and 'False'), binary encoding replaces 'True' with 1 and 'False' with 0. Binary encoding is likewise applied to 'sub-division'; because it has only one unique value ('Kerala'), the entire feature is replaced with 1. Figure 3 shows the dataset after feature encoding.

Fig 3. Dataset after feature encoding

Feature Selection:
From here on, the sub-division attribute is dropped: it has the same value in every row, so it makes no direct contribution to the prediction of floods. According to [6], June to October is the monsoon season in Kerala. The study first identified the rainy-season months in Kerala and then applied Logistic Regression, SVC, KNN, and Decision Tree to the selected features. To provide a baseline for comparison, the models were also trained on the entire feature set of the dataset. Comparing the two feature sets allows a more comprehensive analysis of the effect of feature selection on model accuracy. Figure 4 shows the dataset after feature selection.

Fig 4. Dataset after feature selection

Feature Engineering:
The dataset was divided into training and testing sets in an 80:20 ratio, and the features were standardized with the Standard Scaler technique, which centers each feature around its mean and scales it to unit variance. This puts all features on the same scale, so that features with large numeric ranges do not dominate distance-based models such as KNN and SVC.

V. DESCRIPTION OF MACHINE LEARNING MODELS USED IN THIS STUDY

Stacked Generalization has been used to make predictions over the training set. A Stacked Generalization, or Stacking, model is a form of ensemble machine learning. It is an architecture consisting of two or more base models and a meta-model for the final classification. In this research, the following base models and meta-model were used:
● Level-0 models, also referred to as base models, were trained on the dataset and generate predictions using individual algorithms. The study used Support Vector Classifier (SVC), K-Nearest Neighbor (KNN), and Decision Tree Classifier (DTC) as the base models.
● A meta-model, also known as a level-1 model, was used to combine the predictions of the base models and determine the final classification. Binary Logistic Regression was employed as the meta-model, and the overall accuracy was evaluated on its output.

Fig 5. Diagram of Stacked Generalization [15]

Support Vector Classifier:
According to [17], SVC is a classification algorithm that constructs a binary, non-probabilistic linear classifier. It separates the classes with a boundary that leaves the widest possible gap (margin) between them and predicts the class of new data according to the side of the boundary on which it falls. This process approximates the Structural Risk Minimization principle.
The accuracy of predictions made by the SVC algorithm has garnered significant attention from researchers, as stated in [8]. This algorithm is based on the concept of using linear

regression to identify a decision function for a given sample x, which is given by:

$$f(x) = \sum_{i \in SV} y_i \alpha_i K(x_i, x) + b$$

The dual coefficients in the SVC algorithm are denoted $\alpha_i$, and they are bounded above by a value $C$. The kernel $K(x_i, x)$ measures the similarity between the input vector $x$ and a training example $x_i$, and the independent term $b$ is estimated during training [8]. Only the support vectors ($i \in SV$) contribute to the sum, and the sign of $f(x)$ gives the predicted class.

K-Nearest Neighbor:
The KNN method is a non-parametric algorithm that predicts the class of a new data point (or its value, in regression problems) based on feature similarity. According to [16], the algorithm assumes that a new data point is similar to nearby existing data and assigns it to the category most common among its nearest neighbors, which lets it quickly classify new data into a well-defined category. To find those neighbors, it calculates the distance between the new data point and every point in the training set using a distance function such as the Euclidean distance:

$$d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$$

Decision Tree Classifier:
Decision trees generate models that predict target variables by learning simple decision rules from the features of the dataset [9]. They provide a graphical representation of the decisions made by the predictive model: internal nodes correspond to tests on the features, branches indicate the outcomes of those tests, and leaf nodes represent the final decisions. The key idea is to create a series of splits, each dividing the data into the two most homogeneous groups possible. To measure group homogeneity, decision trees calculate the entropy of each group. The entropy of a group with C classes is defined as:

$$\text{Entropy} = -\sum_{i=1}^{C} p_i \log_2 p_i$$

where $p_i$ is the probability of selecting an element of class $i$ at random. Information gain is the statistic used in decision trees to measure the reduction in entropy produced by a split: it is the entropy of the dataset before splitting on a given feature value minus the size-weighted entropy of the resulting groups after the split. For two classes, entropy ranges between 0 and 1, where 1 represents maximum group impurity and 0 indicates a fully pure group.

Binary Logistic Regression:
According to [11], logistic regression is a type of generalized linear model used when the dependent variable is binary. It estimates the probability that the dependent variable is 1 based on the values of the independent variables. Unlike linear regression, logistic regression uses the logit function to relate the independent and dependent variables. The logit is the natural logarithm of the odds, where the odds are the probability of success divided by the probability of failure. The logistic regression equation can be expressed in general form as:

$$\operatorname{logit}(p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

Here, $p$ is the probability of the dependent variable being 1, $x_1, x_2, \dots, x_n$ are the independent variables, and $\beta_0, \beta_1, \dots, \beta_n$ are the coefficients of the model. A comparison between linear regression and logistic regression is illustrated graphically in Figure 6.

Fig 6. Distinguishing Between Linear and Logistic Regression [11]

The base equation for the logistic regression model is derived as follows. The probability $p$ is limited to the range between 0 and 1. The odds are obtained by dividing $p$ by $1 - p$, and taking the logarithm of this ratio gives the log odds, or logit, $\log\left(\frac{p}{1-p}\right)$. Applying the logistic function to the logit gives the final equation:

$$p = \frac{1}{1 + e^{-\log\left(\frac{p}{1-p}\right)}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}$$

This equation gives the probability of a binary outcome (0 or 1) from a set of predictor variables, so logistic regression models estimate the relationship between the predictor variables and the probability of the binary outcome. In the given dataset, the amount of annual rainfall serves as the independent variable, while the dependent variable indicates whether a flood occurs.

A meta-model produces the final output by combining the predictions of multiple base models. It is trained on the predictions that the base models make on held-out training data, that is, data not used to train the base models themselves. After training, the meta-model takes the base models' predictions on new data as its inputs and produces the final prediction.
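The stacking procedure described above (level-0 models trained on one part of the training data, and a binary logistic regression meta-model trained on their predictions for the held-out part) can be sketched without any libraries. This is a minimal illustration under stated assumptions, not the authors' implementation: to keep it short, only two base models are used, a KNN classifier with Euclidean distance and k = 3, and a depth-1 decision tree (a "stump") chosen by information gain, while SVC is omitted; the meta-model is fitted by plain gradient descent; and the 50/50 internal split, the function names, and the toy data are hypothetical. In practice a library implementation such as scikit-learn's `StackingClassifier`, which builds the meta-model's inputs from cross-validated predictions, would normally be preferred.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2), as used by KNN."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_X, train_y, x, k=3):
    """Level-0 model: majority vote among the k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: euclidean(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def entropy(labels):
    """-sum_i p_i log2 p_i over the classes present; 0.0 for a pure group."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def fit_stump(X, y):
    """Level-0 model: a depth-1 decision tree whose single split maximizes
    information gain = entropy before the split - weighted entropy after it."""
    majority = Counter(y).most_common(1)[0][0]
    best = (-1.0, 0, math.inf, majority, majority)  # (gain, feature, threshold, left, right)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            after = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            gain = entropy(y) - after
            if gain > best[0]:
                best = (gain, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best[1:]


def stump_predict(stump, x):
    feature, threshold, left_label, right_label = stump
    return left_label if x[feature] <= threshold else right_label


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logreg(Z, y, lr=0.5, epochs=2000):
    """Level-1 model: binary logistic regression fitted by batch gradient
    descent on the log-loss. w[0] is beta_0; w[1:] weight the base predictions."""
    w = [0.0] * (len(Z[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, target in zip(Z, y):
            err = sigmoid(w[0] + sum(wj * zj for wj, zj in zip(w[1:], z))) - target
            grad[0] += err
            for j, zj in enumerate(z, start=1):
                grad[j] += err * zj
        w = [wj - lr * g / len(Z) for wj, g in zip(w, grad)]
    return w


def fit_stacking(X, y, holdout=0.5, seed=0):
    """Fit base models on one part of the data and the meta-model on their
    predictions for the held-out part; return a predict(x) function."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - holdout))
    base_X = [X[i] for i in idx[:cut]]
    base_y = [y[i] for i in idx[:cut]]
    stump = fit_stump(base_X, base_y)

    def base_predictions(x):
        return [knn_predict(base_X, base_y, x), stump_predict(stump, x)]

    w = fit_logreg([base_predictions(X[i]) for i in idx[cut:]], [y[i] for i in idx[cut:]])

    def predict(x):
        z = base_predictions(x)
        return int(sigmoid(w[0] + sum(wj * zj for wj, zj in zip(w[1:], z))) >= 0.5)

    return predict


if __name__ == "__main__":
    # Hypothetical toy data: one feature, two well-separated classes.
    X = [[v] for v in range(20)] + [[v] for v in range(30, 50)]
    y = [0] * 20 + [1] * 20
    model = fit_stacking(X, y)
    print([model([v]) for v in (0, 10, 35, 49)])
```

Training the meta-model only on held-out predictions matters: predictions that the base models make on their own training points look more reliable than they really are, and a meta-model fitted to them would learn to over-trust the base models.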

To improve the accuracy of the final model, the predictions made by multiple base models are combined using a meta-model in a process known as stacking. First, the base models are trained on a portion of the training data. Then, the predictions made by the base models on the remaining training data are used to train the meta-model, which learns to combine the predictions in a way that generates accurate predictions on new data. Stacking is most effective when the base models' predictions have low correlation with one another, because the meta-model can then combine their complementary strengths to improve the overall accuracy of the final model.

VI. PERFORMANCE EVALUATION

The rainfall data analyzed in this project are collected in a CSV file. The rainfall was monitored from 1901 to 2017 in Kerala, India.
Predictive Model Classification:
Predictions were made with four different classifiers, K-Nearest Neighbors (KNN), Support Vector Classifier (SVC), Decision Tree Classifier (DTC), and Binary Logistic Regression, as well as with a stacked model. The base models of the stacked model were KNN, SVC, and DTC, with Binary Logistic Regression as the meta-model.

1. The models were first applied to the rainfall data of the monsoon period in Kerala, which lasts five months: June, July, August, September, and October [12]. Therefore, only the rainfall data from these months were used for this analysis. The table below presents the outcomes of the models.

Predictive Models                  Accuracy   Standard Deviation
K-Nearest Neighbors (KNN)          83.4%      0.158
Support Vector Classifier (SVC)    86.4%      0.146
Decision Tree Classifier (DTC)     78.3%      0.217
Binary Logistic Regression         83.6%      0.164
Stacked Generalization             84.8%      0.159

Fig 7. Whisker box plot for Standalone Model Accuracies on the Monsoon Rainfall Data

2. The next table shows the results of applying the same set of models to the rainfall data from all months of the year, to test whether the additional months improve accuracy.

Predictive Models                  Accuracy   Standard Deviation
K-Nearest Neighbors (KNN)          74.6%      0.172
Support Vector Classifier (SVC)    90.6%      0.111
Decision Tree Classifier (DTC)     77.2%      0.772
Binary Logistic Regression         93.0%      0.103
Stacked Generalization             93.3%      0.098

Fig 8. Whisker box plot for Standalone Model Accuracies on the 12 months Rainfall Data

Comparing the two tables, SVC, Binary Logistic Regression, and Stacked Generalization achieved higher accuracy on the data for all 12 months than on the monsoon months alone, whereas KNN and the Decision Tree Classifier (DTC) did not. This suggests that floods depend not only on the rainfall during the monsoon but also on the precipitation over the whole year, along with other factors. On the 12-month data, Binary Logistic Regression achieved the highest accuracy among the single models, and Stacked Generalization achieved a higher accuracy with a lower standard deviation than all of the single models. This indicates that Stacked Generalization is more effective for flood prediction than the single-algorithm models.
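The tables report each model's accuracy together with a standard deviation, and Figures 7 and 8 show the spread of accuracies as box plots, but the paper does not say how these numbers were computed. One common way to obtain them is k-fold cross-validation: accuracy is measured once per held-out fold and then summarized by its mean and standard deviation. The sketch below assumes that setup; the fold count (k = 5), the use of the sample standard deviation, the helper names, and the toy rainfall values are all hypothetical. To keep it self-contained, the model being evaluated is a fixed-threshold rule built from the normal-monsoon band described in Section IV (2039.6 mm + 19%), not one of the paper's classifiers.

```python
import random
import statistics

# Hypothetical stand-in classifier: flag a flood when June-September rainfall
# exceeds the upper bound of the normal-monsoon band (2039.6 mm + 19%).
NORMAL_MONSOON_MM = 2039.6
UPPER_BOUND_MM = NORMAL_MONSOON_MM * 1.19  # about 2427.1 mm


def fit_threshold(train_X, train_y):
    """Fixed rule: nothing is learned from the training folds."""
    return UPPER_BOUND_MM


def predict_threshold(threshold, x):
    return int(x[0] > threshold)


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Return (mean accuracy, sample standard deviation) over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        model = fit(train_X, train_y)
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    # Hypothetical June-September rainfall totals (mm) and flood labels; the
    # 2450 mm year is deliberately labelled "no flood" to create one error.
    X = [[v] for v in (1900, 2000, 2100, 2200, 2300, 2450, 2500, 2600, 2800, 3000)]
    y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
    mean_acc, sd = cross_validate(X, y, fit_threshold, predict_threshold)
    print(f"accuracy {mean_acc:.1%}, standard deviation {sd:.3f}")
```

With these toy values, exactly one of the five folds contains the mislabelled 2450 mm year, so that fold scores 0.5 and the other four score 1.0, giving a mean accuracy of 0.9 and a sample standard deviation of about 0.224. Any of the classifiers discussed above could be evaluated the same way by passing its own fit and predict functions.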

VII. CONCLUDING REMARKS AND FUTURE PROSPECTS

This study focused on predicting floods from meteorological data by developing an ensemble machine learning model. Comparing the performance of KNN, SVC, decision trees, and logistic regression, we found that the ensemble model achieved higher accuracy and more consistent results (a lower standard deviation) than the individual models on the 12-month rainfall data. Additionally, the ensemble model outperformed previous studies in flood prediction using machine learning models.

There are several avenues for future research to enhance the accuracy of the proposed flood prediction model. First, additional data sources such as soil moisture and land use will be incorporated. Second, the model's performance will be evaluated at different temporal and spatial scales. Third, other ensemble techniques, such as bagging and boosting, will be explored to further improve the model's accuracy. Finally, an online flood prediction system based on the proposed model is planned to provide real-time flood warnings to local communities and authorities.

REFERENCES
[1] Floods: Occurrence and Distribution. (n.d.). Department of Geology, Aligarh Muslim University. Retrieved May 29, 2021, from http://www.geol-amu.org/notes/be1a-3-1.htm
[2] Flood. (n.d.). National Geographic. Retrieved June 2, 2021, from https://www.nationalgeographic.org/encyclopedia/flood/
[3] Jongman, B., Ward, P. J., & Aerts, J. C. J. H. (2012). Global exposure to river and coastal flooding: Long term trends and changes. Global Environmental Change, 22(4), 823–835. https://doi.org/10.1016/j.gloenvcha.2012.07.004
[4] Flooding will affect double the number of people worldwide by 2030. (2019). The Guardian. Retrieved May 29, 2021, from https://www.theguardian.com/environment/2020/apr/23/flooding-double-number-people-worldwide-2030
[5] Bangladesh – Floods Affect Over 1 Million People in 13 Districts. (2020, June 8). FloodList. Retrieved June 2, 2021, from http://floodlist.com/asia/bangladesh-floods-update-july-2020
[6] Rainfall Index. (n.d.). USDA. Retrieved September 25, 2021, from https://www.rma.usda.gov/en/Policy-and-Procedure/Insurance-Plans/Rainfall-Index
[7] Between Normalization vs. Standardization. (n.d.). Analytics Vidhya. Retrieved September 30, 2021, from https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/
[8] Liong, S. Y., & Sivapragasam, C. (2002). Flood stage forecasting with support vector machines. Journal of the American Water Resources Association, 38(1), 173–186. https://doi.org/10.1111/j.1752-1688.2002.tb01544.x
[9] Decision Trees. (n.d.). Scikit-Learn. Retrieved September 25, 2021, from https://scikit-learn.org/stable/modules/tree.html
[10] Jha, V. (2018, June 17). Decision Tree Algorithm for a Predictive Model. TechLeer. https://www.techleer.com/articles/120-decision-tree-algorithm-for-a-predictive-model/
[11] Fernandes, A. A. T., Figueiredo Filho, D. B., Rocha, E. C. D., & Nascimento, W. D. S. (2020). Read this paper if you want to learn logistic regression. Revista de Sociologia e Política, 28(74), 2–3. https://doi.org/10.1590/1678-987320287406en
[12] Chaturvedi, A. (2021, May 30). Monsoon likely to arrive in Kerala on May 31, says IMD; heavy rainfall predicted in Karnataka from June 1. Hindustan Times. Retrieved September 30, 2021, from https://www.hindustantimes.com/india-news/monsoon-likely-to-arrive-in-kerala-on-may-31-says-imd-101622366991176.html
[13] Saini, A. (2021, August 3). Conceptual Understanding of Logistic Regression for Data Science Beginners. Analytics Vidhya. Retrieved September 25, 2021, from https://www.analyticsvidhya.com/blog/2021/08/conceptual-understanding-of-logistic-regression-for-data-science-beginners/
Khatun, F. (2021, May 31). Living with floods and reducing vulnerability in Bangladesh. The Daily Star. https://www.thedailystar.net/opinion/macro-mirror/news/living-floods-and-reducing-vulnerability-bangladesh-1950277
[14] Sankaranarayanan, S., Prabhakar, M., Satish, S., Jain, P., Ramprasad, A., & Krishnan, A. (2019). Flood prediction based on weather parameters using deep learning. Journal of Water and Climate Change, 11(4), 1766–1783. https://doi.org/10.2166/wcc.2019.321
[15] Ensemble Stacking for Machine Learning and Deep Learning. (2021, August). Analytics Vidhya. https://www.analyticsvidhya.com/blog/2021/08/ensemble-stacking-for-machine-learning-and-deep-learning/
[16] Zhang, Z. (2016, June). Introduction to machine learning: k-nearest neighbors. Annals of Translational Medicine, 4(11), 218. https://atm.amegroups.com/article/view/10170/11310
