
14 (2022) 200077
Resources, Conservation & Recycling Advances

journal homepage: www.sciencedirect.com/journal/Resources-Conservation-and-Recycling-Advances
Deforestation probability assessment using integrated machine learning algorithms of Eastern Himalayan foothills (India)
Soumik Saha a, ϯ, Sumana Bhattacharjee b, Pravat Kumar Shit c, Nairita Sengupta d,
Biswajit Bera a, *, ϯ
a Department of Geography, Sidho-Kanho-Birsha University, Ranchi Road, P.O. Purulia Sainik School, Purulia, 723104, India
b Department of Geography, Jogesh Chandra Chaudhuri College (University of Calcutta), 30, Prince Anwar Shah Road, Kolkata 700 033, India
c PG Department of Geography, Raja Narendralal Khan Women’s College (Autonomous), Vidyasagar University, Midnapore, 721102, India
d Department of Geography, Diamond Harbour Women’s University, Sarisha, 743368, India
ARTICLE INFO

Keywords:
Jaldapara National Park
Deforestation probability
Machine learning algorithms
AUC value
Support vector machine (SVM)

ABSTRACT

The biodiversity-rich Jaldapara National Park is situated in the Terai-Dooars region of the Eastern Himalayan foothills. This study attempts to identify the zones of probable deforestation in Jaldapara National Park and its surroundings by applying five machine learning algorithms (SVM, NB, RF, DT and ANN). The results show that the northern and middle sections face a high rate of deforestation due to large-scale human encroachment, poaching and timber trafficking. The results also show that the support vector machine (SVM) is more accurate than the other models. The deforestation probability models are validated through the receiver operating characteristic (ROC) curve and through efficiency, sensitivity and specificity measurements. The area under the curve (AUC) values of the SVM, NB, RF, DT and ANN models are 0.907, 0.885, 0.825, 0.846 and 0.876 respectively. The novelty of this research is that such high-precision machine learning methods have not previously been applied to examine deforestation probability in this region of the Himalayan foothills.
1. Introduction

The expansion of economic projects, together with the decline of forest resources, has been a primary concern of researchers and environmentalists over the last few decades. Forests play important roles such as carbon storage, biodiversity conservation, ecosystem services, soil formation and conservation, air purification, water cycle continuation and oxygen production (Gibson et al., 2011). In simple terms, deforestation can be defined as the removal of vegetation cover from the land surface. The American Forest Society defines deforestation as the process of removing trees due to the effects of agricultural activities, climatic conditions, grazing, disease, forest fire etc. (Yanai et al., 2012). The Food and Agriculture Organization (FAO) defines deforestation as the qualitative and quantitative reduction and degradation of forest as well as forest health (Deacon, 1994). The literature identifies several leading deforestation-related factors, such as climatic factors (rainfall, solar radiation, temperature etc.), socio-economic factors (settlement, roads, infrastructure, population etc.), bio-physical factors (soil, geology, geomorphology etc.) and various biotic and abiotic disturbances (air pollution, soil pollution, pests etc.) (Bax and Francesconi, 2018).

The present high rate of deforestation is of great concern to researchers and environmentalists because the earth was covered by 60 million square km of forest before human civilization, whereas less than 40 million square km of forest land now remains (FAO, 2015). Globally, between 2000 and 2012, around 2.3 million km² of forest was eliminated or cut down, at a rate of about 2 × 10⁵ km²/year. The south-east Asian countries faced severe forest loss between 2000 and 2005 due to large-scale timbering, unscientific development planning and agricultural expansion (Stibig et al., 2014). The Asia-Pacific region has seen a slight decline in the rate of forest loss over the last five years compared with the early 1990s (GFRA, 2015). National and international investment in forest regions, driven by the increasing demand for forest resources, is the central
* Corresponding author at: Biswajit Bera, Department of Geography, Sidho-Kanho-Birsha University, Ranchi Road, P.O. Purulia Sainik School, Purulia, 723104,
India.
E-mail addresses: soumiksaha577@gmail.com (S. Saha), sumana.aarohi@gmail.com (S. Bhattacharjee), shitpravat2013@gmail.com (P.K. Shit),
nairitasengupta2@gmail.com (N. Sengupta), biswajitbera007@gmail.com (B. Bera).
ϯ These authors contributed equally to this study.
https://doi.org/10.1016/j.rcradv.2022.200077
Available online 31 March 2022

2667-3789/© 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Fig. 1. Geographical location of the study area
mechanism of forest loss, particularly in Latin America, Sub-Saharan Africa and South-East Asia (Dell’Angelo et al., 2017; De Schutter, 2011; Chamling and Bera, 2020b). In the south Asian countries and islands, forest cover is being reduced tremendously by large-scale plantation farming. Indonesia and Malaysia contributed around 53% and 34% of global palm oil production respectively in 2013. Indonesia overtook Brazil in the clearance of natural forest for logging between 2000 and 2012 (Margono et al., 2014). Contemporary scientific studies emphasize that forests of the tropical biome have played an important role as a carbon sink, but in recent years large-scale deforestation has diminished their carbon storage capacity (Hansen et al., 2013). Over the last few decades, large-scale land acquisition has occurred in the global south (the developing countries) by foreign and domestic investors who aim to obtain more forest resources and agricultural commodities, and the governments of these countries readily accept such investors because of their potential to bring in foreign technology and capital and to promote job creation and development activities (Chung, 2019). Protection of the global forest region is crucial for climate change mitigation, local livelihood protection, biodiversity conservation etc., but the world's forests are now embedded in a complicated network built by international actors and policy makers for commercial trade (Verburg et al., 2013; Liu et al., 2015; Chamling and Bera, 2020a).

Illegal human intervention within forest pockets, poaching, cultivation in the vicinity of forests and the extension of tea plantations are highly responsible for deforestation, biodiversity loss and the fragmentation of forest habitat in the tropical and sub-tropical moist or dry deciduous forest regions of southern Asia (Bera et al., 2021a). The forest cover of the Himalayan foothill zone is gradually deteriorating due to the huge expansion of agricultural activities, timber trafficking and infrastructural development (Bera et al., 2021b). Tropical deforestation is a significant factor in global climate change and is of great concern to environmentalists. It is also associated with the modification of regional hydrological inputs, regional climatic systems, global bio-chemical cycles and biodiversity loss (Puig, 2000; Fontan, 1994). The northern part of West Bengal was once characterized by dense forest cover. This region now contains several national parks and sanctuaries, such as Buxa Tiger Reserve, Jaldapara, Gorumara, Neora Valley, Chapramari, Jorepokhri and Mahananda (West Bengal Forest Department, 2016), which are connected with biodiversity conservation and sustainable forest resource management. However, the forests of this region face an alarming condition due to human expansion, intensification of agricultural land use and the infiltration of human activities into the forest region and its ecology (Dey, 1991).

Machine learning algorithms for predicting deforestation probability assist researchers and policy makers in planning for areas with a high probability of deforestation. Currently, high-accuracy remote sensing satellite data, together with statistical and high-accuracy machine learning models, are widely used all over the world to generate accurate deforestation probability zones. Over the years various kinds of techniques have been used for deforestation probability assessment (statistical approaches,
spatial approaches and machine learning approaches) (Mayfield, 2015). Common parametric models are extensively used in various studies, but machine learning models are able to generalize from a huge set of data with precise representation (Dlamini, 2016). Machine learning approaches provide a powerful and efficient way to deal with large volumes of data that are mainly non-linear and high-dimensional and that involve complicated interactions and missing values (Bhattacharya, 2013). They can significantly improve model accuracy and are also used in hazard assessment studies such as deforestation and forest management (Rogan et al., 2008). Machine learning models offer various advantages over traditional statistical methods (Liu et al., 2018). Machine learning has become a popular branch of artificial intelligence and is frequently used in hazard prediction studies. Its main mechanism is to express the relationship between the target variable and the predictors by applying computer algorithms to a training dataset (Chen et al., 2017). Since the 1990s, machine learning approaches have been used extensively in environmental studies (Hsieh, 2009), and these methods have now become popular in forest ecosystem and degradation research (Bhattacharya, 2013). Random forest (RF) is a powerful and widely used machine learning technique that can predict target variables with a high accuracy rate (Devasena, 2014). Support vector machine (SVM) is another important machine learning algorithm, which has become accepted and useful with the development of artificial intelligence and RS-GIS techniques (Huang and Zhao, 2018). Artificial neural networks (ANN) are also widely used in medicine and molecular biology, and they came into wide use in ecology and environmental sciences at the beginning of the 1990s. Soft computing models such as artificial neural networks, neuro-fuzzy logic, decision trees, support vector machines (SVM) and the maximum entropy model have been widely used by researchers all over the world to compute and predict physical phenomena such as landslide susceptibility, forest fire susceptibility, deforestation susceptibility, groundwater potentiality etc. (Xu et al., 2012; Pradhan, 2013; Wu et al., 2014; Saha et al., 2020; Bera et al., 2020a). After the rapid development of artificial intelligence, the application of machine learning in remote sensing studies has become very popular (Mountrakis, 2011). Algorithms such as support vector machine, decision tree, random forest and artificial neural network have been applied to land cover classification and change detection, prediction of forest biomass and analysis of deforestation susceptibility (Grinand et al., 2013; Dlamini, 2016). Machine learning algorithms inevitably require a significant amount of data for training the model (Mountrakis, 2011), and the capability of ML models depends on the rigorous use of training and testing datasets. However, the lack of sufficient data is a major bottleneck that prevents the widespread application of machine learning models, particularly in forest research and forest ecosystem analysis (Liu et al., 2018). Remote sensing data combined with Geographical Information Systems (GIS) can predict and help solve many major earth-surface problems with high accuracy in a short time, and it performs better when coupled with other well-accepted statistical and machine learning models (Saha et al., 2020). The main objective of this study is to detect the zones of probable deforestation, using various machine learning algorithms, in the famous Jaldapara wildlife sanctuary (Eastern India) and its surrounding areas in the Himalayan foothills.

2. Study area

Jaldapara National Park is situated in the Terai-Dooars region of the Eastern Himalayan foothills in West Bengal and covers an area of 216.51 km². The park and its surrounding areas are covered by riverine tropical forest, and it was declared a sanctuary in 1941 for its great variety of floral and faunal diversity. Jaldapara is mainly famous for the conservation of the Indian one-horned rhinoceros. The Government of India declared Jaldapara a National Park in 2012 by combining the sanctuaries, which are home to various species such as leopard, elephant, Indian gaur, different types of birds, snakes etc. (Ghosh et al., 2013). The Jaldapara region extends from 26°31′ to 26°45′ N and from 89°14′ to 89°24′ E (Fig. 1), and Jaldapara National Park lies within the recently delimited Alipurduar district of northern West Bengal. The whole Jaldapara range is divided into two parts, the wildlife sanctuary and the reserve forest (Deb et al., 2018). The River Torsa divides the sanctuary region into two parts: the eastern part, known as Chilapata forest (Bhattacharyya and Padhy, 2013), and the western part, known as Jaldapara. These two forests were previously connected, but since the colonial period the forest region has been disconnected by corridors of severe deforestation. For administrative purposes, the whole Jaldapara forest region is currently divided into 100 forest compartments, 30 beats and 9 ranges. The forest region has two main perennial rivers, the Malangi and the Torsa. The Malangi is mainly a rain-fed river whereas the Torsa is a glacier-fed river. Bhabar, Terai and alluvial formations are the notable geological formations within the area (Shukla et al., 2017). The dominant floral community has been classified into 36 species and 25 genera, along with 114 tree species and 75 planted species in various parts of the forest region (Table 1). Besides this great natural diversity, the region is home to various tribal and ethnic groups such as the Garo, Toto, Megh, Chakma and Munda, and its rich traditional culture is a primary resource of the area (Ghosh et al., 2021).

Table 1
Different important existing species in the study area (Jaldapara forest and its adjacent region)
Name of the family | Scientific name | Area specific vernacular term
Actinidiaceae | Saurauia napaulensis | Gagun
 | Saurauia roxburghii | Gagun
 | Choerospondias axillaris | Labsi
 | Lannea coromandelica | Jia
 | Mangifera indica | Aam
 | Mangifera sylvatica Roxb | Jangli Aam
 | Spondias pinnata | Amaro
Arecaceae | Areca catechu | Supari
 | Caryota urens | Rangbhang
 | Roystonea regia | Bottle palm
Bignoniaceae | Markhamia lutea | NK
 | Stereospermum chelonoides | Parari
 | Oroxylum indicum | Totola
 | Stereospermum tetragonum | Parari
Combretaceae | Terminalia bellirica | Bahera
 | Terminalia elliptica | Pakasaj
Fabaceae | Dalbergia sissoo Roxb | Sissoo
 | Leucaena leucocephala | Sobabul
 | Samanea saman | Siris
 | Saraca asoca | Asok
 | Vachellia nilotica | Babul
Lauraceae | Litsea monopetala | Kutmero
 | Machilus gamblei King | Kawla
Meliaceae | Swietenia mahagoni | Mehagini
 | Walsura tubulata Hiern | Phalame
 | Artocarpus heterophyllus Lam | Kanthal
 | Toona hexandra | Toon
Sapindaceae | Lepisanthes rubiginosa | Reetha
 | Sapindus mukorossi Gaertn | Ritha
 | Litchi chinensis Sonn | Litchu
Fig. 2. Different thematic layers for deforestation prediction zone analysis, a. settlement density b. distance from settlement c. agricultural density d. distance from
road e. LULC.
Fig. 3. Different thematic layers for deforestation prediction zone analysis, a. elevation b. NDVI c. slope d. aspect e. Distance from river f. forest density
3. Material and Methods

3.1. Database

Eleven parameters (Fig. 2 & 3) have been selected in this research as controllers of deforestation: distance from river, agricultural density, altitude, settlement density, forest density, distance from settlement, distance from road, slope, aspect, Normalized Difference Vegetation Index (NDVI) and Land Use Land Cover (LULC). Remote sensing based satellite data, the Shuttle Radar Topography Mission (SRTM) digital elevation model and Global Forest Watch data have been used to prepare the thematic layers related to deforestation. The satellite image and the SRTM digital elevation model (DEM) have been obtained from the USGS website (https://earthexplorer.usgs.gov/). The altitude, aspect, slope and distance from river layers have been generated from the DEM, whereas the cultural layers such as distance from road, settlement density, distance from settlement, LULC and agricultural density have been prepared from satellite image classification (Table 2). The deforested zones have been extracted from the Global Forest Watch website (https://www.globalforestwatch.org/). A 30-year temporal scale (1990-2020) has been considered for demarcating the deforestation zones, and the deforestation points as well as the zones have been identified over this time span. Another important step in deforestation susceptibility analysis is generating the deforestation inventory points. Here, around 250 deforested points have been identified within the study region, and the same number of points has been generated randomly as non-deforested points. From a machine learning point of view, deforestation versus non-deforestation is a binary classification problem: deforestation points are coded as ‘1’ and non-deforestation points as ‘0’. All points are randomly selected and divided into two groups, training (70% of the data) and testing (30% of the data). The training dataset has been used to train the applied models and the testing dataset to validate them.

Table 2
Data source and characteristics of the used thematic layers
Thematic map layer | Data format | Source of the data
Altitude, slope, distance from river, aspect | Raster grid | Digital elevation model from USGS (https://earthexplorer.usgs.gov/)
Distance from road | Polyline | Google Earth Pro
Forest density, LULC, agricultural density, settlement density, distance from settlement, NDVI | Raster grid | Landsat 8 (OLI) image from USGS (https://earthexplorer.usgs.gov/)
Deforestation area layer | Raster grid | Global Forest Watch website (https://www.globalforestwatch.org/)

3.2. Multi-collinearity test

In this research a multi-collinearity test has been applied to avoid collinearity problems between the conditioning (explanatory) factors. The tolerance value and the variance inflation factor (VIF) have been used to quantify the severity of multi-collinearity (Table 3). A VIF value greater than 10 or a tolerance value less than 0.1 indicates a multi-collinearity problem (Johnston et al., 2018). The tolerance and VIF values are as follows,

$$ \mathrm{Tolerance} = 1 - R_J^2 \tag{1} $$

$$ \mathrm{VIF} = \frac{1}{\mathrm{Tolerance}} \tag{2} $$

where $R_J^2$ is the coefficient of determination of the regression of explanatory factor J on the other factors.

Table 3
Collinearity statistics of the selected explanatory factors
Variables | Tolerance | VIF
NDVI | 0.555 | 1.802
Slope | 0.911 | 1.097
Settlement density | 0.229 | 4.366
Forest density | 0.602 | 1.661
Distance from river | 0.799 | 1.252
Distance from settlement | 0.492 | 2.031
Distance from road | 0.683 | 1.465
Agricultural density | 0.248 | 4.038
Altitude | 0.588 | 1.700
LULC | 0.65 | 2.4
Aspect | 0.82 | 3.017

3.3. Application of different machine learning methods

3.3.1. Support vector machine (SVM)

Support vector machine (SVM) is a widely used machine learning algorithm based on the risk minimization principle proposed by Vapnik (Vapnik, 1995). The algorithm separates the classes with a surface (the optimal hyper-plane) and clearly defines the margin between the classes in the dataset (Abe, 2010). The training points nearest to the hyper-plane are called the support vectors, and the aim of the hyper-plane is to distinguish the different classes (Pradhan, 2013). The aim of SVM is to find the n-dimensional hyper-plane that differentiates the dataset, which is expressed as minimizing

$$ \frac{1}{2}\lVert w \rVert^2 \tag{3} $$

subject to the constraint $y_i((w \cdot x_i) + b) \ge 1$, where $\lVert w \rVert$ is the norm of the normal of the hyper-plane and $b$ is a scalar base.

The cost function follows the formula,

$$ L = \frac{1}{2}\lVert w \rVert^2 - \sum_{i=1}^{n} \lambda_i \left( y_i((w \cdot x_i) + b) - 1 \right) \tag{4} $$

where $\lambda_i$ is the Lagrangian multiplier, and $w$ and $b$ are obtained through the standard procedure. In non-separable cases the constraints are modified by slack variables $\xi_i$, giving

$$ y_i((w \cdot x_i) + b) \ge 1 - \xi_i \tag{5} $$

The support vector machine is one of the most popular and widely used supervised machine learning classifiers (Dhingra and Kumar, 2019). A recent study analysed deforestation zones through land use land cover classification and reported an overall accuracy of 93.74% and a kappa coefficient of 0.92 for the SVM model (Babu and Sudha, 2018). Another study shows that the SVM model can reliably identify forest disturbance regions using 238 points in three datasets, i.e. all forest (AUC = 90.14), temperate forest (AUC = 91.9) and tropical dry forest (AUC = 79.63) (Solórzano and Gao, 2022).

3.3.2. Random Forest model (RF)

Random forest (RF) is a widely used ensemble-learning method proposed by Breiman (Breiman, 2001). The RF algorithm creates many classification trees during the training period and generates the final model by averaging the outputs of all the classification trees. The two main parameters of the RF algorithm are i) the number of factors used at each split, taken as the square root of the number of factors, and ii) the number of trees used to run the model. RF algorithms use the technique of bootstrap aggregating. The RF algorithm also uses the Gini index as the splitting criterion, which follows,

$$ I_T(p) = \sum_{i=1}^{j} \sum_{k \neq i} p_i p_k = 1 - \sum_{i=1}^{j} p_i^2 \tag{6} $$

Here, T signifies the training dataset, j denotes the number of classes and $p_i$ is the proportion of class i in T.
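As a hedged illustration of the Gini index in Eq. (6), the following minimal sketch computes the index for a set of class labels and the size-weighted index of a candidate split. The function names `gini_index` and `weighted_split_gini` and the sample labels are illustrative assumptions and are not taken from the study's own implementation.

```python
from collections import Counter


def gini_index(labels):
    """Gini index 1 - sum(p_i^2) of a sequence of class labels (Eq. 6)."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention assumed here for an empty node
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_split_gini(left, right):
    """Size-weighted Gini index of the two child nodes of a candidate split."""
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    return (len(left) / n) * gini_index(left) + (len(right) / n) * gini_index(right)


# Hypothetical labels: 1 = deforested point, 0 = non-deforested point.
parent = [1, 1, 1, 0, 0, 0]
print(gini_index(parent))                          # balanced two-class node
print(weighted_split_gini([1, 1, 1], [0, 0, 0]))   # perfectly pure split
```

A tree-growing procedure would typically prefer the split with the lowest weighted index, since a lower value means purer child nodes.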
Table 4
Result of ROC and comparison of the different machine learning models
Models | AUC | Std. Error | Asymptotic Sig. | Lower Bound | Upper Bound | TPR | FPR | TNR | FNR | Efficiency
SVM | 0.907 | 0.025 | 0.000 | 0.858 | 0.956 | 0.892 | 0.086 | 0.914 | 0.108 | 0.814
NB | 0.885 | 0.032 | 0.000 | 0.821 | 0.948 | 0.854 | 0.241 | 0.759 | 0.146 | 0.785
ANN | 0.876 | 0.034 | 0.000 | 0.800 | 0.932 | 0.823 | 0.103 | 0.897 | 0.177 | 0.769
DT | 0.846 | 0.039 | 0.000 | 0.740 | 0.892 | 0.816 | 0.166 | 0.834 | 0.184 | 0.764
RF | 0.825 | 0.041 | 0.000 | 0.705 | 0.866 | 0.790 | 0.187 | 0.813 | 0.210 | 0.735
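The rate columns of Table 4 (TPR, FPR, TNR, FNR and Efficiency) are all functions of the four confusion-matrix counts, and the AUC can be approximated from a set of ROC points with the trapezoidal rule. The sketch below shows both calculations under stated assumptions: the function names `confusion_rates` and `trapezoid_auc` are hypothetical, and the counts and ROC points are invented for illustration, so the output does not reproduce any row of Table 4.

```python
def confusion_rates(tp, fn, fp, tn):
    """Rates derived from confusion-matrix counts of a binary classifier."""
    return {
        "TPR": tp / (tp + fn),                          # sensitivity
        "FPR": fp / (fp + tn),                          # 1 - specificity
        "TNR": tn / (fp + tn),                          # specificity
        "FNR": fn / (tp + fn),
        "Efficiency": (tp + tn) / (tp + tn + fp + fn),  # overall accuracy
    }


def trapezoid_auc(fpr_points, tpr_points):
    """Area under an ROC curve by the trapezoidal rule.

    The points must be ordered by increasing false positive rate.
    """
    auc = 0.0
    for k in range(len(fpr_points) - 1):
        width = fpr_points[k + 1] - fpr_points[k]
        mean_height = (tpr_points[k] + tpr_points[k + 1]) / 2.0
        auc += width * mean_height
    return auc


# Hypothetical counts for a test set of 100 points (not taken from the study).
print(confusion_rates(tp=40, fn=10, fp=5, tn=45))

# Hypothetical three-point ROC curve: (0, 0) -> (0.1, 0.8) -> (1, 1).
print(trapezoid_auc([0.0, 0.1, 1.0], [0.0, 0.8, 1.0]))
```

Because TPR + FNR = 1 and FPR + TNR = 1 by construction, each pair of columns in a table of this kind can be cross-checked against the other.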
3.3.3. Artificial neural network (ANN)

An artificial neural network is a statistical or mathematical model based on the functioning of biological neurons, which is fundamental to the processes of the human brain. The model was proposed by McCulloch and Pitts in 1943. An artificial neural network can simulate nonlinear relationships among the variables. The most commonly used type of artificial neural network is the multi-layer perceptron (MLP). An MLP is built from three kinds of layers: i) an input layer, ii) one or more hidden layers and iii) an output layer. In artificial neural networks, a hyperbolic tangent or sigmoid function is widely used for mathematical convenience. It follows,

$$ f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{7} $$

The artificial neural network algorithm follows the equation below,

$$ net_j^{l}(t) = \sum_{i=0}^{\rho} \left( y_i^{l-1}(t)\, w_{ji}^{l}(t) \right) \tag{8} $$

Here, t refers to the iteration, l denotes the layer and j denotes the neuron.

The δ factor for neuron j in the output layer is as follows,

$$ \delta_j^{l}(t) = y_j^{l}(t)\left[ 1 - y_j^{l}(t) \right] \sum \delta_j(t)\, w_{kj}^{(l+1)}(t) \tag{9} $$

For neuron j in the hidden layer, the δ factor is used as follows,

$$ w_{ji}^{l}(t+1) = w_{ji}^{l}(t) + \alpha\left[ w_{ji}^{l}(t) - w_{ji}^{l}(t-1) \right] + n\,\delta_j^{(l)}(t)\, y_j^{(l-1)}(t) \tag{10} $$

where α and n refer to the momentum and the learning rate respectively.

3.3.4. Naïve Bayes (NB)

The Naïve Bayes classifier is a collection of algorithms based on Bayes' theorem, all of which share common principles. The Naïve Bayes method is widely used in machine learning because of its simplicity and linear run time, and it is a simple probabilistic method that can accurately predict class membership probabilities (Farid et al., 2014). In this algorithm a covariance matrix is constructed from the mean of each class and Bayes' theorem is then applied for discrimination (Bhargavi and Jyothi, 2009). The Naïve Bayes classifier follows,

$$ y_{NB} = \underset{y_i \in [\text{event},\ \text{non-event}]}{\arg\max}\; P(y_i) \prod_{i=1}^{17} P(x_i \mid y_i) \tag{11} $$

$$ P(x_i \mid y_i) = \frac{1}{\sqrt{2\pi}\,a}\, e^{-\frac{(x_i - n)^2}{2a^2}} \tag{12} $$

where $P(y_i)$ is the prior probability, $P(x_i \mid y_i)$ is the conditional probability, and a and n represent the standard deviation and the mean respectively.

3.3.5. Decision tree (DT)

The decision tree is another widely used algorithm, with tree growth and tree pruning steps (Yeon et al., 2010). A decision tree is a machine learning algorithm which divides the data into different subsets. The tree grows by selecting the attribute with the smallest entropy. Entropy is calculated by the following equation,

$$ \mathrm{Entropy}(N) = -\sum_{j} P(c_j \mid N) \log_2 P(c_j \mid N) \tag{13} $$

where $P(c_j \mid N)$ represents the relative frequency of class $c_j$ in N.

The entropy of the selected attribute A is given by

$$ \mathrm{Entropy}_A(N) = \sum_{j=1}^{k} \frac{|N_j|}{|N|}\, \mathrm{Entropy}(N_j) \tag{14} $$

3.4. Validation

Validation of the implemented models is an important step in any type of research. The Receiver Operating Characteristic (ROC) curve is a useful tool for assessing the goodness of fit of the implemented models (Fig. 6). The curve is generated by plotting sensitivity on the y axis against 1-specificity on the x axis (Fig. 6). The ROC curve is a well-accepted validation method for various predictive models such as landslide, deforestation, groundwater potentiality, forest fire susceptibility etc. (Chen et al., 2017; Rahmati et al., 2017; Gigović et al., 2019; Saha et al., 2020). The area under the ROC curve (AUC) represents the capability of the model. A value near 1 represents high validity and high predictive power of the model, whereas a value near 0 represents low validity or low predictive power (Table 4). The AUC value is categorized into classes with different accuracy levels: (0.9-1) excellent, (0.8-0.9) very good, (0.7-0.8) good, (0.6-0.7) average and (0.5-0.6) poor. The area under the ROC curve follows the equation below,

$$ S_{AUC} = \sum_{k=1}^{n} \left( X_{k+1} - X_k \right) \left( S_{k+1} - \frac{S_{k+1} - S_k}{2} \right) \tag{15} $$

In this equation $S_{AUC}$ signifies the AUC, and $S_k$ and $X_k$ represent the sensitivity and 1-specificity respectively. The other statistical indicators used are TPR, FPR, TNR, FNR, Efficiency etc.

The TPR and FPR are as follows,

$$ TPR = \frac{TP}{TP + FN} \tag{16} $$

$$ FPR = \frac{FP}{FP + TN} \tag{17} $$

where TP represents true positives, FN false negatives, FP false positives and TN true negatives.

Efficiency (E) is another assessment measure that has been used to quantify the accuracy of the model (Fukuda et al., 2013). It has been calculated using the following equation,

$$ E = \frac{TP + TN}{TP + TN + FP + FN} \tag{18} $$

The performance of the machine learning models on the training and testing data has been evaluated using different error measurement methods, i.e. root mean square error (RMSE), coefficient of determination (R²) and mean absolute error (MAE). These statistical
Fig. 4. Different deforestation probable zones by different machine learning models a. SVM model b. NB model c. RF model d. DT model and e. ANN model
indicators compare the outcomes of the applied models. In statistical modelling, the difference between an observed value and the associated computed value is termed the error (Chai & Draxler, 2014). These statistical indicators have been computed using R programming software.

4. Result

4.1. Multi-collinearity analysis

The collinearity test indicates that there is no multi-collinearity
Table 5
Measurement of accuracy of the used machine learning models through various error measurement techniques
Error measures | SVM Training | SVM Testing | NB Training | NB Testing | RF Training | RF Testing | DT Training | DT Testing | ANN Training | ANN Testing
RMSE | 0.178 | 0.159 | 0.214 | 0.196 | 0.278 | 0.326 | 0.297 | 0.341 | 0.287 | 0.257
MAE | 0.092 | 0.079 | 0.124 | 0.105 | 0.184 | 0.214 | 0.194 | 0.216 | 0.176 | 0.187
R2 | 0.894 | 0.905 | 0.846 | 0.867 | 0.716 | 0.674 | 0.697 | 0.629 | 0.706 | 0.742
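The three error measures reported in Table 5 can be sketched as follows. This is a minimal illustration rather than the study's own R code: the function names and the sample values are assumptions, and the coefficient of determination is computed here as 1 − SS_res/SS_tot, which is one common definition and may differ from the formula the authors used.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: observed classes (1 = deforested, 0 = non-deforested)
# and model probabilities for four inventory points.
observed = [1, 0, 1, 0]
predicted = [0.9, 0.2, 0.8, 0.1]
print(rmse(observed, predicted), mae(observed, predicted), r_squared(observed, predicted))
```

Lower RMSE and MAE and a higher R² indicate a closer match between predicted probabilities and the observed classes, which is how the columns of Table 5 can be compared across models.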
problem among the explanatory factors. All parameters have a VIF value of less than 10 and a tolerance value greater than 0.1, which means that the variables are sufficiently independent to be used in the implemented predictive models (SVM, NB, RF, DT and ANN) (Table 3).

4.2. Deforestation probability analysis by SVM model

The support vector machine (SVM) model has been applied here to demarcate the deforestation probability zones in the Jaldapara forest region. The result of this prediction model has been categorized into five classes (Fig. 4(a)): very low (17%), low (16.90%), moderate (14.62%), high (20.64%) and very high (30.84%) (Fig. 7). This classification is useful both for prediction and for assessing the possibility of deforestation. The raster output of the probability maps has been classified using the natural breaks method in the ArcGIS environment. The natural breaks method is a widely used and reliable raster classification method. It divides the raster data into natural categories that minimize the variance within classes and maximize the variance between classes (Jenks, 1967). The natural breaks method classified the deforestation probability zones into five classes based on the threshold values very high (0.78-1), high (0.58-0.78), moderate (0.40-0.58), low (0.22-0.40) and very low (0-0.22), and the same classification method has been applied to all of the models. High and very high deforestation probability pockets have been identified in the northern and middle parts of the study area, mainly in Uttar khairabari, Uttar madarihat, Nutanpara, Sidhabari, Suripara, Salkumarhat etc., whereas low and very low deforestation probability areas have been observed in the eastern and south-western parts of the study area, particularly in the localities of Uttar mandabari, Dakshinmandabari, Kumarpara, Lachhmandabri etc.

4.3. Deforestation probability analysis by NB model

The Naïve Bayes (NB) classifier has been applied here to demarcate the deforestation probability zones, and the output has been classified into five classes (Fig. 4(b)): very low (18.6%), low (15.84%), moderate (19.47%), high (19.53%) and very high (26.53%), using the natural breaks classifier (Fig. 7). The Naïve Bayes model predicts that the northern and middle sections of the Jaldapara forest region face high and very high deforestation probability, particularly in Uttar khairabari, Uttar madarihat, Nutanpara, Sidhabari etc., whereas the eastern, western and north-western part have faced low

4.5. Deforestation probability analysis by DT model

The decision tree model has been applied here to demarcate the deforestation probability areas. The outcome of the decision tree model has been classified into five categories (Fig. 4(d)): very low (26.30%), low (16.62%), moderate (16.16%), high (15.91%) and very high (25%), using the natural breaks classifier. High and very high deforestation probability areas are found particularly in the middle section and northern part of the Jaldapara forest region, while the low and very low classes are confined to the eastern and western parts of the study area.

4.6. Deforestation probability analysis by ANN model

The prediction result of the artificial neural network model has been classified into five classes (Fig. 4(e)): very low (15.47%), low (17.83%), moderate (21.13%), high (20.99%) and very high (24.58%), using the natural breaks classifier. High and very high deforestation probability areas have been observed in the northern, middle and some eastern parts of the Jaldapara forest and the surrounding regions, particularly in Jaldapara, Nutanpara, Uttar khairbari, Madhya satali, Uttar simlabari etc., whereas the low and very low classes are found in the eastern, western and some south-western parts of the study area, particularly in Kalaberia, Madhya madarihat, Uttar mandabari, Kumarpara etc.

4.7. Validation Assessment

A single validation method is not sufficient to validate a model properly (Saha et al., 2020). In this research several validation methods have been systematically applied. SVM, NB, RF, DT and ANN have been evaluated using the receiver operating characteristic (ROC) curve, the AUC value of the ROC curve, efficiency (E), TPR and FPR. These measures indicate the prediction capability of the applied machine learning algorithms. The AUC values of SVM, NB, RF, DT and ANN are 0.907, 0.885, 0.825, 0.846 and 0.876 respectively, which indicates that SVM has the highest prediction capability in this research (Table 4). The sensitivity (TPR) of SVM, NB, RF, DT and ANN is 0.892, 0.854, 0.790, 0.816 and 0.823 respectively, and the corresponding false negative rate (FNR) values are 0.108, 0.146, 0.210, 0.184 and 0.177 respectively. This clearly illustrates the good predictive power of the models (Table 4). The efficiency value also shows the robustness of the model
and very low deforestation possibility particularly in Uttar mandabari, and the values of efficiency are 0.814, 0.785, 0.735, 0.764 and 0.769
Dakshinmandabari, Kumarpara, Lachhmandabri etc. respectively. All the validation result portrays that the support vector
machine (SVM) provides us better predictive result which is followed by
4.4. Deforestation probability analysis by RF model naïve bayes (Table 4). For the assessment of performance analysis of
various deforestation models, different error measurement techniques
The deforestation probability using random forest model has been have been applied such as coefficient of determination (R2), mean ab
accomplished using the relative weight of mean decrees accuracy and solute error (MAE) and root mean square error (RMSE). The important
mean decrees accuracy of Gini index of deforestation variables. The findings of these various error measurement techniques illustrate that in
result of RF model has been classified into five different classes (Fig. 4 this research support vector machine gives us more satisfied result than
(c)) such as very low (25.70%), low (16.63%), moderate (19.27%), high the others. The value of R2, MAE and RMSE of the training set in the SVM
(17.42%) and very high (20.98%) (Fig. 7) using natural breaks classifier. model is 0.894, 0.092 and 0.178 respectively whereas the testing phase
RF model predicts that northern part and some middle section of Jal gives the result 0.905, 0.079 and 0.159 respectively (Table 5). Wilcoxon
dapara forest and its adjacent region are facing in high deforestation Signed Rank Test has been applied here for analysis the significant
probability and eastern, western and south-eastern part have been comparison of the applied deforestation susceptibility models. The
experienced low deforestation probability. result of this non-parametric test (Wilcoxon Signed Rank) identifies the
9
Table 6 significant difference between the models based on Z and P value

comparison of various machine learning models for deforestation probability (Table 6). The importance analysis of the controlling factors has also
assessment with the help of Wilcoxon Signed Rank Test been clearly completed (Fig. 5).
Model comparison Za value Significance
SVM-NB -14.146b P<0.05 5. Discussion

SVM-RF 12.429b P<0.05
SVM-DT -12.847c P<0.05 Deforestation is one of the major concerns in environmental research
SVM-ANN -13.548b P<0.05
because it is the main factor of environmental degradation (Kumar et al.,
NB-RF -14.578b P<0.05
NB-DT -14.578c P<0.05 2014). Presently, exact demarcation of deforestation probability zone is
NB-ANN 12.178b P<0.05 an important tool to prevent the deforestation probability. Appropriate
RF-DT -2.547b P>0.05 determination of deforestation probable zones may help the environ
RF-ANN 12.698c P<0.05 mentalists and developers or planners to take proper management pro
DT-ANN -6.478b P<0.05
grammes so it is a concern topic among the researchers all over the
Here world. Statistical with high probabilistic machine learning techniques
a
represents the Wilcoxon Signed Rank Test have been developed and executed all over the world to develop proper
b
and c represent positive and negative rank respectively prediction of deforestation (Bera et al., 2020a; Kumar et al., 2014).
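
The pairwise comparisons summarised in Table 6 can be illustrated with a minimal sketch of the Wilcoxon signed-rank statistic for two paired sets of predicted probabilities. The function names `wilcoxon_signed_rank_z` and `two_sided_p` and the sample values are hypothetical and are not taken from the study; the sketch uses the large-sample normal approximation without tie or continuity corrections, and its sign convention (positive Z when the first model tends to score higher) is an assumption that may differ from the statistical software used to produce Table 6.

```python
import math


def wilcoxon_signed_rank_z(x, y):
    """Return (w_plus, w_minus, z) for paired samples x and y.

    z uses the normal approximation of the signed-rank statistic
    (no tie or continuity correction; an illustrative simplification).
    """
    # Paired differences; zero differences are discarded, as is conventional.
    diffs = [round(a - b, 10) for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    return w_plus, w_minus, z


def two_sided_p(z):
    """Two-sided P value of z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical deforestation probabilities predicted by two models
# for the same eight validation cells (invented values).
model_a = [0.91, 0.15, 0.78, 0.33, 0.62, 0.87, 0.05, 0.49]
model_b = [0.85, 0.22, 0.70, 0.35, 0.55, 0.80, 0.11, 0.40]

w_plus, w_minus, z = wilcoxon_signed_rank_z(model_a, model_b)
print(f"W+ = {w_plus}, W- = {w_minus}, Z = {z:.3f}, P = {two_sided_p(z):.3f}")
```

With a large number of validation cells the normal approximation becomes reasonable, and a pair of models would be reported as significantly different when the resulting P value falls below 0.05, which is the threshold used in Table 6.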
Machine learning algorithms are presently gaining significant attention in environmental modelling because these models can capture the complex relationship between dependent variables and predictors. Various machine learning based prediction models (such as the generalised linear model (GLM), artificial neural network (ANN) and Bayesian network (BN) models) are commonly used, coupled with remote sensing data, to predict deforestation probable zones (Fenton and Neil, 2013; Mayfield et al., 2017).

Fig. 5. Importance of the explanatory factors (predictors) in the applied machine learning models.

Fig. 6. Receiver Operating Characteristic (ROC) curves for the validation of the different machine learning models.

Fig. 7. Bar graph showing the area of the different classes for the different models.

All five implemented machine learning models indicate that the northern and middle parts of the Jaldapara forest and adjacent areas face high deforestation probability (Fig. 4). The northern section of the Jaldapara forest and its surrounding regions are situated in the foothills of the eastern Himalaya. This is basically a piedmont area, and altitude increases towards the Shivalik Himalayas. The eastern and western parts of the study area face low deforestation probability due to high forest density, restricted human movement and strict forest rules and regulations. The River Torsha divides the study area into two parts. Most of the models highlight that forests such as the Torsha forest range, Chilpata forest and the Jaldapara national park area have low deforestation probability, whereas the Salkumar forest area and Dakshinbarajhor forest have high deforestation probability due to significant settlement growth in the vicinity of the forests within the last decade. The middle section of the study area has faced various anthropogenic activities along with tribal settlement, in localities such as Kalaberia, Suripara, Nutanpara, Salkumarhat etc.

Previously, the entire Himalayan foothill belt was covered by dense tropical forest, but since the colonial period the land use pattern has been immensely transformed. As per the Wildlife Conservation Strategy 2002, there should be an eco-fragile zone around the national park to protect the core and buffer regions from different anthropogenic stresses (Deb et al., 2014). The LULC classification of the Jaldapara forest region indicates that there is no buffer zone around the core forest region, which can increase unwanted anthropogenic interaction between people and the forest and leads to forest degradation (Deb et al., 2018). The LULC change pattern of this region has been extracted from multi-temporal satellite images and classification methods for 1978 to 2016, and it has been observed that the dense forest region has changed tremendously due to illegal encroachment and infiltration by tribal and forest fringe dwellers. In 1978 dense forest covered 7.93% of this area; this decreased to 5.42% in 2001 and 5.03% in 2016 (Deb et al., 2018). Anthropogenic intervention is the most important driver of forest conversion in this region. The conversion of forest land into industrial plantation land over the last few decades is another important driver of forest degradation. During the British colonial era many tribal people came from Bihar and permanently settled in this region, and year after year they have been penetrating further into this dense forest region. Recently, local and national media have exposed illegal poaching and timber trafficking activities in this region; such activities increase the rate of forest fragmentation. Most of the Jaldapara forest region lies in Alipurduar district according to the District Census Handbook 2011, and this region is characterised by a large share of tribal people (18.89%) and marginal workers (9.27%), while a significant share of people is also engaged in agricultural activities (37.32%) (District census handbook, 2011). In recent years, many central and state government projects have been executed close to, and in places within, the forest regions. Timber trafficking rackets have penetrated the dense and patch forest areas and carelessly cut down valuable old trees. Finally, timber traffickers sell these products in different national and
international markets at high prices. As a result, patch forest areas have enlarged drastically year after year (Bera et al., 2020a, 2020b; Chamling et al., 2021).

5.1. Management approach of forest resources

Today, prediction and classification based machine learning and deep learning algorithms (Artificial Intelligence) are being applied in different fields for the rapid solution and management of various problems. Here, different machine learning algorithms have been used to identify deforestation probability zones in the Eastern Himalayan biodiversity hotspot. The Eastern Himalayan foothill biodiversity zones provide substantial provisioning, regulating, supporting, cultural and spiritual services directly to regional people as well as to a large number of people globally. Recent studies show that large scale deforestation, wild animal and timber trafficking and forest habitat conversion have increased significantly within the last three or four decades in different pockets of the Eastern Himalayan foothills (Bera et al., 2020a). In this respect, conservation of forest resources along with forest habitat is essential for the health of the whole environment. Thus, relevant holistic forest management techniques should be considered (Fig. 8).

• Different schemes of Joint Forest Management (JFM) should be implemented in different forest pockets of India through proper co-ordination between forest dwellers, forest fringe people and forest officers (Murali et al., 2002; Bera et al., 2021b).
• Large scale use of non-timber forest products (NTFPs) should be extended, particularly for forest dwellers and forest fringe people (Suleiman et al., 2017).
• Capacity building, alternate income generation and enhancement of tribal livelihoods are essential for the people who reside in the proximity of the forest and within the forest pockets (Bera et al., 2021b).
• Financial support should be provided to tribal and non-tribal people residing in the forest zones for the initiation of home stay tourism, plantation and agricultural practice (Chamling et al., 2021).
• Forest protection regulations and acts should be strictly enforced against timber and wild animal traffickers and poachers (Bera et al., 2020a; Bera et al., 2021b).
• Awards and a centre of excellence should be set up for outstanding research on forests and on the protection and management of biodiversity (Bera et al., 2021b).
• All festivals and cultural programmes should be celebrated with new tree plantation (Masiero et al., 2015; Bera et al., 2021b).
• Environmental education and awareness should be spread among students and local people.
• The significance of biodiversity and the different causes of biodiversity loss should be incorporated in school and college syllabuses (Bera et al., 2021b).

Fig. 8. Flow diagram representing the different direct and indirect methods of forest management and biodiversity conservation.

6. Conclusion

We live in an era of deforestation and land degradation, which is a global concern among policy makers, administrators and environmentalists. In this research, it has been observed that the support vector machine (SVM) model provides more accurate and precise results than the other machine learning models, owing to its high sensitivity value along with its high AUC value (0.907). High population growth along with human-forest conflict is a serious worry in our country and, this being a third world country, forest regions are under heavy anthropogenic stress due to people's daily needs for commodities. In the present context, deforestation probability zone analysis along with daily monitoring and proper management strategies can lead to forest sustainability. This study mainly identified the very high deforestation probable zones along with the underlying reasons, so that government can adopt appropriate management strategies. In this regard, creation of an artificial forest buffer zone around the national park can improve the health of the Jaldapara national park, along with alternate livelihoods for forest dwellers and forest fringe people. The Joint Forest
Management (JFM) scheme should be implemented to improve forest health, particularly in different pockets of this study area. The use of non-timber forest products (NTFPs) should be restricted to forest dwellers and forest fringe people in this area. Subsequently, the government together with the forest department should strictly impose rules and regulations on wild animal and timber traffickers. Further research is required to enhance the alternate livelihoods of the forest dwellers and forest fringe people. More financial support should be provided for further research and development, particularly to conserve pristine natural resources. Community based forest management is an important tool in the present
day context where human and nature interaction occurs regularly (Datta and Deb, 2017; Bera et al., 2020b). The results of these machine learning models will assist policymakers in sustainable forest resource management along with wild species and wild habitat conservation.

CRediT authorship contribution statement

Soumik Saha: Conceptualization, Methodology, Formal analysis, Writing – original draft, Writing – review & editing, Visualization. Sumana Bhattacharjee: Conceptualization, Supervision, Formal analysis, Writing – review & editing. Pravat Kumar Shit: Supervision, Formal analysis, Writing – review & editing. Nairita Sengupta: Formal analysis, Writing – review & editing. Biswajit Bera: Conceptualization, Methodology, Formal analysis, Writing – original draft, Writing – review & editing, Visualization.

Declaration of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Abe, S., 2010. Support Vector Machines for Pattern Classification. Springer, New York NY USA.
Babu, J.S., Sudha, D.T., 2018. Analysis and detection of deforestation using novel remote-sensing technologies with satellite images. In: IADS International Conference on Computing. Communications & Data Engineering (CCODE). Available at SSRN. https://ssrn.com/abstract=3187151.
Bax, V., Francesconi, W., 2018. Environmental predictors of forest change: an analysis of natural predisposition to deforestation in the tropical Andes region. Peru. Appl. Geogr. 91, 99–110. https://doi.org/10.1016/j.apgeog.2018.01.002.
Bera, B., Bhattacharjee, S., Sengupta, N., Saha, S., 2021b. Dynamics of deforestation and forest degradation hotspots applying geo-spatial techniques, apalchand forest in terai belt of himalayan foothills: conservation priorities of forest ecosystem. Remote Sens. Appl.: Soc. Environ. 22, 100510. https://doi.org/10.1016/j.rsase.2021.100510.
Bera, B., Saha, S., Bhattacharjee, S., 2020a. Forest cover dynamics (1998 to 2019) and prediction of deforestation probability using binary logistic regression (BLR) model of Silabati Watershed, India. Trees Forests People 2, 100034. https://doi.org/10.1016/j.tfp.2020.100034.
Bera, B., Saha, S., Bhattacharjee, S., 2020b. Estimation of forest canopy cover and forest fragmentation mapping using landsat satellite data of Silabati River Basin (India). KN. J. Cartogr. Geogr. Inf. https://doi.org/10.1007/s42489-020-00060-1.
Bera, B., Shit, P.K., Sengupta, N., Saha, S., Bhattacharjee, S., 2021a. Susceptibility of deforestation hotspots in Terai-Dooars belt of Himalayan Foothills: a comparative analysis of VIKOR and TOPSIS models. J King Saud Univ - Comput Inf Sci. https://doi.org/10.1016/j.jksuci.2021.10.005.
Bhargavi, P., Jyothi, S., 2009. Applying naive bayes data mining technique for classification of agricultural land soils. Int. J. Comput. Sci. Netw. Secur. 9, 117–122.
Bhattacharya, M., 2013. Machine learning for bioclimatic modelling. Int. J. Adv. Comput. Sci. Appl. 4 (2), 1–8. https://doi.org/10.14569/IJACSA.2013.040201.
Bhattacharyya, M.K., Padhy, P.K., 2013. Forest and wildlife scenarios of northern West Bengal, India: a review. Int. Res. J. Biol. Sci. 2, 70–79. http://www.isca.in/IJBS/Archive/v2/i7/15.ISCA-IRJBS-2013-044.pdf.
Breiman, L., 2001. Random forest. Mach. Learn. 45, 5–32.
Chai, T., Draxler, R., 2014. Root mean square error (RMSE) or mean absolute error (MAE)? Geosci. Model Dev 7 (1), 1247–1250. https://doi.org/10.5194/gmdd-7-1525-2014.
Chamling, M., Bera, B., 2020a. Likelihood of elephant death risk applying kernel density estimation model along the railway track within biodiversity hotspot of Bhutan–Bengal Himalayan Foothill. Model. Earth Syst. Environ. 6, 2565–2580. https://doi.org/10.1007/s40808-020-00849-z.
Chamling, M., Bera, B., 2020b. Spatio-temporal patterns of land use/land cover change in the Bhutan–Bengal foothill region between 1987 and 2019: study towards geospatial applications and policy making. Earth Syst. Environ. 4, 117–130. https://doi.org/10.1007/s41748-020-00150-0.
Chamling, M., Bera, B., Sarkar, S., 2021. Geospatial environmental modeling of forest declining trend in eastern Himalayan biodiversity hotspot region. Forest Resources Resilience and Conflicts, pp. 417–433. https://doi.org/10.1016/B978-0-12-822931-6.00030-7.
Chen, W., Xie, X., Wang, J., et al., 2017. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. Catena 151, 147–160. https://doi.org/10.1016/j.catena.2016.11.032.
Chung, Y.B., 2019. The grass beneath: conservation, agro-industrialization, and land–water enclosures in postcolonial Tanzania. Ann. Am. Assoc. Geogr. 109, 1–17. https://doi.org/10.1080/24694452.2018.1484685.
Datta, D., Deb, S., 2017. Forest structure and soil properties of mangrove ecosystems under different management scenarios: experiences from the intensely humanized landscape of Indian Sundarbans. Ocean Coast Manag. 140, 22–33. https://doi.org/10.1016/j.ocecoaman.2017.02.022.
De Schutter, O., 2011. Green rush: the global race for farmland and the rights of land users. Harvard Int. Law J. 52, 503–556.
Deacon, R.T., 1994. Deforestation and the rule of law in a cross-section of countries. Land Econ. 70 (4), 414–430. https://doi.org/10.2307/3146638.
Deb, S., Ahmed, A., Datta, D., 2014. An alternative approach for delineating eco sensitive zones around a wildlife sanctuary applying geospatial techniques. Environ. Monit. Assess. 186, 2641–2651. https://doi.org/10.1007/s10661-013-3567-7.
Deb, S., Debnath, M.K., Chakraborty, S., et al., 2018. Anthropogenic impacts on forest land use and land cover change: modelling future possibilities in the Himalayan Terai. Anthropocene 21, 32–41. https://doi.org/10.1016/j.ancene.2018.01.001.
Dell’Angelo, J., D’Odorico, P., Rulli, M.C., Marchand, P., 2017. The tragedy of the grabbed commons: coercion and dispossession in the global land rush. World Devel 92, 1–12. https://doi.org/10.1016/j.worlddev.2016.11.005.
Devasena, C.L., 2014. Comparative analysis of random forest, REP tree and J48 classifiers for credit risk prediction. Inter. J. Comp. Appl. 30–36.
Dey, S.C., 1991. Depredation by wildlife in the fringe areas of North Bengal forests with special reference to elephant damage. Indian For. 117, 901–908. https://doi.org/10.36808/if/1991/v117i10/8731.
Dhingra, S., Kumar, D., 2019. A review of remotely sensed satellite image classification. Int. J. Electr. Comput. Eng 9 (2088-8708). http://doi.org/10.11591/ijece.v9i3.pp1720-1731.
District Census Handbook Koch Bihar, 2011. Census of India, West Bengal, Series-20 Part XII-B, Village and Town Wise Primary Census Abstract. Directorate of Census Operations, West Bengal.
Dlamini, W.M., 2016. Analysis of deforestation patterns and drivers in Swaziland using efficient Bayesian multivariate classifiers. Model. Earth Syst. Environ. 2 (4), 1–14. https://doi.org/10.1007/s40808-016-0231-6.
FAO (Food and Agriculture Organisation). 2015. http://faostat.fao.org/. (Access date: 12-4-2020).
Farid, D.M., Zhang, L., Rahman, C.M., Hossain, M.A., Strachan, R., 2014. Hybrid decision tree and Bayes classifiers for multi-class classification tasks. Expert. Syst. Appl. Int. J. 41, 1937–1946. https://doi.org/10.1016/j.eswa.2013.08.089.
Fenton, N., Neil, M., 2013. Risk Assessment and Decision Analysis with Bayesian Networks. CRC Press New York.
Fontan, J., 1994. Changements globaux et développement. Nat. Sci. Soc. 2 (2), 143–152. https://doi.org/10.1051/nss/19940202143.
Fukuda, S., Baets, B.D., Waegeman, W., Verwaeren, J., Mouton, A.M., 2013. Habitat prediction and knowledge extraction for spawning European grayling (Thymallus thymallus L.) using a broad range of species distribution models. Environ. Model. Softw. 47, 1–6. https://doi.org/10.1016/j.envsoft.2013.04.005.
GFRA (Global Forest Resources Assessment). 2015. FAO of UN (Retrieved from). http://www.fao.org/forest-resources-assessment/documents/en/.
Ghosh, C., Ghatak, S., Biswas, K., Das, A.P., 2021. Status of tree diversity of the Jaldapara National Park in West Bengal, India. Trees, Forests People 3, 100061. https://doi.org/10.1016/j.tfp.2020.100061.
Ghosh, C., Paul, T.K., Das, A.P., 2013. Rediscovery of Hibiscus fragrans roxburgh (Malvaceae) from Jaldapara National Park in Duars of West Bengal, India. Pleione 7, 531–537.
Gibson, L., Lee, T., Koh, L., et al., 2011. Primary forests are irreplaceable for sustaining tropical biodiversity. Nature 478, 378–381. https://doi.org/10.1038/nature10425.
Gigović, L., Pourghasemi, H.R., Drobnjak, S., Bai, S., 2019. Testing a new ensemble model based on SVM and random forest in forest fire susceptibility assessment and its mapping in Serbia’s Tara National Park. Forests 10 (5), 408. https://doi.org/10.3390/f10050408.
Grinand, C., Rakotomalala, F., Gond, V., Vaudry, R., Bernoux, M., Vieilledent, G., 2013. Estimating deforestation in tropical humid and dry forests in Madagascar from 2000 to 2010 using multi-date Landsat satellite images and the random forests classifier. Remote Sens. Environ. 139, 68–80. https://doi.org/10.1016/j.rse.2013.07.008.
Hansen, M.C., Potapov, P.V., Moore, R., et al., 2013. High-resolution global maps of 21st-century forest cover change. Science 342, 850–853.
Hsieh, W.W., 2009. Machine Learning Methods in the Environmental sciences: Neural Networks and Kernels. Cambridge University Press, Cambridge, UK. https://doi.org/10.1017/CBO9780511627217.
Huang, Y., Zhao, L., 2018. Review on landslide susceptibility mapping using support vector machines. Catena 165, 520–529. https://doi.org/10.1016/j.catena.2018.03.003.
Jenks, G., 1967. The data model concept in statistical mapping. Int. Yearb. Cartogr. 7, 186–190.
Johnston, R., Jones, K., Manley, D., 2018. Confounding and collinearity in regression analysis: a cautionary tale and an alternative procedure, illustrated by studies of British voting behaviour. Qual. Quant. 52 (4), 1957–1976. https://doi.org/10.1007/s11135-017-0584-6.
Kumar, R., Nandy, S., Agarwal, R., Kushwaha, S.P.S., 2014. Forest cover dynamics analysis and prediction modelling using logistic regression model. Ecol. Indic. 45, 444–455. https://doi.org/10.1016/j.ecolind.2014.05.003.
Liu, J., Mooney, J., Hull, V., et al., 2015. Systems integration for global sustainability. Science 347, 1258832. https://doi.org/10.1126/science.1258832.
Liu, Z., Peng, C., Work, T., Candau, J.N., DesRochers, A., Kneeshaw, D., 2018. Application of machine-learning methods in forest ecology: recent progress and future challenges. Environ. Rev. 26 (4), 339–350. https://doi.org/10.1139/er-2018-0034.
Margono, B.A., Potapov, P.V., Turubanova, S., Stolle, F., Hansen, M.C., 2014. Primary forest cover loss in Indonesia over 2000–2012. Nat. Clim. Change 4, 730–735. https://doi.org/10.1038/nclimate2277.
Masiero, M., Secco, L., Pettenella, D., Brotto, L., 2015. Standards and guidelines for forest plantation management: a global comparative study. For. Policy Econ. 53, 29–44. https://doi.org/10.1016/j.forpol.2014.12.008.
Mayfield, H., 2015. Making the Most of Machine Learning and Freely Available Datasets: A Deforestation Case Study. (PhD Thesis). University of Queensland. https://doi.org/10.13140/RG.2.1.2307.8647.
Mayfield, H., Smith, C., Gallagher, M., Hockings, M., 2017. Use of freely available datasets and machine learning methods in predicting deforestation. Environ. Model. Soft. 87, 17–28. https://doi.org/10.1016/j.envsoft.2016.10.006.
Mountrakis, G., Im, J., Ogole, C., 2011. Support vector machines in remote sensing: a review. ISPRS J. Photogramm. Remote Sens. 66, 247–259. https://doi.org/10.1016/j.isprsjprs.2010.11.001.
Murali, K.S., Murthy, I.K., Ravindranath, N.H., 2002. Joint forest management in India and its ecological impacts. Environ. Manag. Health 13 (5), 512–528. https://doi.org/10.1108/09566160210441807.
Pradhan, B., 2013. A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput. Geosci. 51, 350–365. https://doi.org/10.1016/j.cageo.2012.08.023.
Puig, H., 2000. Diversité spécifique et déforestation: l’exemple des forêts tropicales humides du Mexique. Bois. Forets. Des. Tropiques 268, 41–55. https://doi.org/10.19182/bft2001.268.a20102.
Rahmati, O., Tahmasebipour, N., Haghizadeh, A., Pourghasemi, H.R., Feizizadeh, B., 2017. Evaluation of different machine learning models for predicting and mapping the susceptibility of gully erosion. Geomorphology 298, 118–137. https://doi.org/10.1016/j.geomorph.2017.09.006.
Rogan, J., Franklin, J., Stow, D., Miller, J., Woodcock, C., Roberts, D., 2008. Mapping land-cover modifications over large areas: a comparison of machine learning algorithms. Remote Sens. Environ. 112 (5), 2272–2283. https://doi.org/10.1016/j.rse.2007.10.004.
Saha, S., Saha, M., Mukherjee, K., Arabameri, A., Ngo, P.T.T., Paul, G.C., 2020. Predicting the deforestation probability using the binary logistic regression, random forest, ensemble rotational forest and REP Tree: a case study at the Gumani River Basin, India. Sci. Total Environ. 730, 139197. https://doi.org/10.1016/j.scitotenv.2020.139197.
Shukla, G., Pala, N.A., Chakravarty, S., 2017. Quantification of organic carbon and primary nutrients in litter and soil in a foothill forest plantation of eastern Himalaya. J. For. Res. 28, 1195–1202. https://doi.org/10.1007/s11676-017-0394-7.
Solórzano, J.V., Gao, Y., 2022. Forest disturbance detection with seasonal and trend model components and machine learning algorithms. Remote Sens. 14, 803. https://doi.org/10.3390/rs14030803.
Stibig, H.J., Achard, F., Carboni, S., Raši, R., Miettinen, J., 2014. Change in tropical forest cover of Southeast Asia from 1990 to 2010. Biogeosciences 11 (2), 247–258. https://doi.org/10.5194/bg-11-247-2014.
Suleiman, M.S., Wasonga, V.O., Mbau, J.S., et al., 2017. Non-timber forest products and their contribution to households income around Falgore Game Reserve in Kano. Nigeria. Ecol. Process 6, 23. https://doi.org/10.1186/s13717-017-0090-8.
Vapnik, V.N., 1995. The Nature of Statistical Learning Theory. Springer Science & Business Media, Berlin, Germany.
Verburg, P.H., Erb, K.H., Mertz, O., Espindola, G., 2013. Land system science: between global challenges and local realities. Curr. Opin Environ Sustain. 5, 433–437. https://doi.org/10.1016/j.cosust.2013.08.001.
West Bengal Forest Department. 2016. Available from URL: http://www.westbengalforest.gov.in/index.html. (Accessed date: 17-11-2020).
Wu, X., Ren, F., Niu, R., 2014. Landslide susceptibility assessment using object mapping units, decision tree, and support vector machine models in the Three Gorges of China. Environ. Earth Sci. 71, 4725–4738. https://doi.org/10.1007/s12665-013-2863-4.
Xu, C., Xu, X., Dai, F., Saraf, A.K., 2012. Comparison of different models for susceptibility mapping of earthquake triggered landslides related with the 2008 Wenchuan earthquake in China. Comput. Geosci. 46, 317–329. https://doi.org/10.1016/j.cageo.2012.01.002.
Yanai, A.M., Fearnside, P.M., de Alencastro Graça, P.M.L., Nogueira, E.M., 2012. Avoided deforestation in Brazilian amazonia: simulating the effect of the juma sustainable development. Reserve. For. Ecol. Manag. 282, 78–91. https://doi.org/10.1016/j.foreco.2012.06.029.
Yeon, Y.K., Han, J.G., Ryu, K.H., 2010. Landslide susceptibility mapping in Injae, Korea, using a decision tree. Eng. Geol. 116, 274–283. https://doi.org/10.1016/j.enggeo.2010.09.009.


Available online 31 March 2022
