1. Introduction
Land use has been one of the main drivers of landscape transformations over the years, with human activities such as agriculture, urbanization, and mining exerting significant pressure on natural ecosystems. In this context, remote sensing technologies have emerged as crucial tools for mapping and monitoring these changes, providing essential data for the sustainable management of natural resources.
Areas under the influence of mining are regions densely occupied by enterprises or exploited lands, often marked by intense conflicts between economic interests and environmental protection [1]. According to Buczyńska et al. [2], the impacts of mineral extraction from surface or underground mining methods include continuous deformations, large-scale land leveling, groundwater depletion, soil and groundwater contamination, and dust pollution. All these effects can negatively influence the preservation of vegetation within and around the mining area. In this context, robust analysis of social and environmental data at local and regional scales is essential for regulators and mining companies to identify, monitor, mitigate, and sustainably manage the environmental and socioeconomic impacts of mining [3].
Mining has been one of the main landscape transformation agents over the years. Estimates suggest that mining and quarrying activities have altered approximately 0.3 to 0.8 million square kilometers of land worldwide, and this trend continues to grow [4,5]. However, existing research highlights gaps related to understanding the specific impacts of mining in semi-arid areas, where the effects of soil degradation, water contamination, and deforestation are exacerbated by adverse climatic conditions [6,7]. In the case of gold mining near protected areas in South America, significant deforestation has occurred, driven by the rise in gold prices during the global economic crisis of 2008 [8,9,10].
In Northeastern Brazil, mining activities have been identified as a catalyst for environmental degradation, affecting soil quality, water resources, biodiversity, and human health [11,12]. Landscape changes associated with artisanal and small-scale gold mining, such as roads and airstrips, typically include deforestation to access gold deposits and settlements [13]. Unregulated mineral processing also pollutes soil and water bodies with heavy metals, especially through increased mercury levels [14,15,16].
The National Mining Agency (ANM) highlights the importance of the region for mineral production in Brazil, with Pernambuco being a small center for gold exploration [11]. Brazilian legislation, including the National Environmental Policy (Law No. 6.938/81), the Mining Code (Decree-Law No. 227/1967), and Federal Decree No. 97.632/1989, which regulates the Degraded Area Recovery Plan (PRAD), emphasizes the need for responsible mining practices to minimize negative impacts on the environment and local communities [17].
Mining activities impact land cover, vegetation, and soil properties, requiring effective monitoring approaches [18]. The growing interest in monitoring illegal mining activities through remote sensing (RS) is a response to the increasing environmental and socioeconomic impacts these activities pose globally [19]. Camalan et al. [20] presented a socio-environmental approach to unregulated mining in various ecosystems, emphasizing the importance of RS techniques in mitigating these environmental impacts.
The environmental impacts of gold mining, such as water pollution and land degradation, can be readily detected using RS data and techniques. Their use therefore provides substantial benefits for detecting, mapping, and monitoring gold mining activities and their effects, especially those associated with local mining [21].
Images from different satellites with medium and high spatial resolutions have been used to identify various mining activities and their environmental impacts on the Earth’s surface. According to Shikhov et al. [7], optical images are analyzed using multiple approaches such as supervised and unsupervised classification, spectral indices, time-series analysis, and several machine learning algorithms, including k-Nearest Neighbors (kNN), artificial neural networks (ANNs), decision trees (DTs), support vector machines (SVMs), random forests (RFs), and classification and regression trees. According to Lu and Weng (2007), the results of soil surface mapping are influenced not only by the adequacy of the images but also by the correct choice of processing and classification methods.
Based on an extensive literature review, Song et al. [22] presented progress in RS monitoring research on land use and land cover changes in mining areas. The authors focused on the application and perspectives of RS techniques for monitoring biodiversity and the ecological environment, highlighting aspects related to landscape structure, vegetation changes, soil environment, surface conditions, and atmospheric environment in mining areas.
RS technologies are widely used to identify natural features or physical objects on the Earth’s surface, using datasets of varying spatial, temporal, and spectral resolution [23,24,25,26], and serve as effective sources for identifying the environmental impacts of gold mining, such as water contamination and soil degradation [21]. These technologies offer significant advantages for detecting, mapping, and monitoring gold mining activities and their effects [27,28,29].
To overcome the limitations of spatiotemporal frequency in land use and land cover surveys, optical images from Planet Labs’ PlanetScope constellation of nanosatellites became available in 2016. The PlanetScope constellation consists of more than 180 CubeSats in sun-synchronous orbits and captures multispectral images with a resolution ranging from 3.7 to 4.1 m, depending on altitude [30]. Its daily revisit makes it valuable for the prompt detection of land changes as well as for monitoring the expansion or maintenance of existing activities. This network has the unique capability of capturing free images of the entire planet daily, achieving a coverage of up to 200,000,000 km² [30].
The Normalized Difference Vegetation Index (NDVI) [31] and the Normalized Difference Water Index (NDWI) [32] provide information on vegetation and water presence in mining areas, respectively. Several studies over the past five years have demonstrated the potential of the NDVI and NDWI spectral indices for mapping and monitoring land use and land cover in mining areas. Padró et al. [33] used high-resolution multispectral images acquired with an unmanned aerial vehicle (UAV), together with the Soil Adjusted Vegetation Index (SAVI), Modified Soil Adjusted Vegetation Index (MSAVI), NDVI, and NDWI, to evaluate vegetation development in a restored limestone quarry. Nascimento et al. [34] developed a systematic geographic object-based image analysis (GEOBIA) approach to map revegetated areas and quantify land use and land cover changes in open-pit mines in the Carajás region of the Brazilian Amazon, using high spatial resolution satellite images (GeoEye, WorldView-3, and IKONOS) from 2011 to 2015 and the NDVI and NDWI spectral indices. Stančič et al. [35] used Landsat 8 and Sentinel-2 data to monitor the Soča River area in Slovenia using SAM (Spectral Angle Mapper) and fuzzy SSMA (Spectral Signal Mixture Analysis) classification methods, additionally introducing NDVI, NDWI, and other complementary indices into the classification algorithm. McKenna et al. [36] presented an extensive RS literature review, focusing on the ecological rehabilitation of mining sites.
RS data are used to observe three main aspects related to gold mining: deforestation or changes in land cover; water pollution from mining near rivers, including turbidity levels in river channels; and the presence of mercury [37,38,39]. Fonseca et al. [10] analyzed land use and cover changes in gold ore areas in the Brazilian Amazon rainforest using Landsat images and the RF classifier. Shikhov et al. [7] evaluated the extent of soil degradation caused by gold mining in the Magadan region of Russia and its changes in the 21st century, based on Landsat/Sentinel-2 satellite data. Zaki et al. [40] used the kNN machine learning algorithm to estimate mineral resources (predicting the gold grade in the Quartz Ridge area) and analyze the impact of unregulated extraction on land use and land cover.
The integration of RS techniques and machine learning algorithms, such as kNN, combined with spectral sensitivity studies of targets using indices, emerges as an essential tool for monitoring land cover in mining areas, enabling precise and effective spatiotemporal analysis [41,42]. This methodological approach allows for identifying and quantifying changes in vegetation and soil moisture, providing crucial data to assess the environmental impacts resulting from mining activities. Pacheco et al. [43] demonstrated the applicability of this technique in mapping areas affected by forest fires in Portugal, efficiently using kNN to classify Landsat-8, Sentinel-2, and Terra satellite images. Noi and Kappas [44] highlight that, although kNN may be slightly more sensitive to training sample size than other algorithms such as SVM, it still presents high Overall Accuracy, especially when the sample size is sufficient.
In this context, this study proposes a decision support model for sustainable monitoring of mining activities in semi-arid regions of Brazil, offering an integrated analysis of remote sensing and machine learning that uses high-resolution orbital images to spatially analyze environmental variability within and around areas impacted by mining. The approach combines the kNN classifier with spectral indices such as NDVI and NDWI, derived from PlanetScope satellite images, covering the period from 2018 to 2023. This model aims to address existing gaps in the literature by tackling both the socio-environmental impacts and the efficiency of monitoring landscape transformations in semi-arid scenarios, with an emphasis on the Serrita-Cedro region. The methodology proposed in this study can also be implemented to evaluate the spatiotemporal behavior of land cover in other mining regions with arid and/or semi-arid climatic characteristics.
2. Materials and Methods
The methodology adopted to identify changes in the landscape of the semi-arid region of Pernambuco, caused by the presence of mining areas, consisted of the following steps: data acquisition, processing, and results generation (Figure 1).
In the first step, data acquisition was carried out with the selection of scenes, considering factors such as broad coverage, absence of clouds, periods of low rainfall incidence, and availability of data sharing via cloud platforms. Additionally, as the vegetation of the Caatinga biome, present in the study area, is sensitive to rainfall [45], it is necessary to analyze the response of the vegetation spectral indices while considering the effects of precipitation. In this regard, precipitation data for the study region during the image acquisition period were obtained from the Pernambuco Water and Climate Agency (APAC).
Next, the land use and land cover mapping was developed. Initially, the spectral indices were calculated, followed by class training. At this stage, due to factors such as the sensor’s spatial resolution and the diversity of land uses in the area, four classes were chosen: water, vegetation, bare soil, and urban patches (built-up areas). The vegetation class grouped herbaceous and shrub vegetation, while the water class included both watercourses and water bodies.
Finally, the landscape changes caused by mining infrastructure were identified through thematic maps and quantified using class extraction and separability. The classification accuracy relative to the image was established through the following evaluation parameters: Overall Accuracy (OA), Kappa index, Omission Errors (OEs), and Commission Errors (CEs).
2.1. Study Area
The investigated region (Figure 2) is a polygon designated for gold mining activities, covering an area of 459.33 km², with a 6 km buffer zone from the Serrita-Cedro Project, which is part of the National Program for the Study of Mining Districts, conducted by the National Department of Mineral Production (DNPM), in the semi-arid region of the Brazilian state of Pernambuco.
In its first phase (1994–1995), the project focused on the investigation of gold mineralization, encompassing an area of 580 km² [46]. Over the years, the area has been subject to exploration, with increased investment since 2020 by the mining company Trilha Gold Capital (TGC). According to the Brazilian Mining Institute (IBRAM), all studies conducted in the Serrita Project follow the standards of the Australian Joint Ore Reserves Committee (JORC) code, along with environmental regulations and laws from the relevant authorities, ensuring maximum credibility and accuracy in the research activities in the area [47].
However, gold mining is often accompanied by soil and vegetation destruction, landscape fragmentation, and biodiversity loss, as well as the disruption of ecosystem service flows [48]. Additionally, it stands as a significant driver of deforestation, unique in the severity of its impacts, leaving behind a highly altered landscape [36].
The climate of the region is semi-arid and hot, classified as Bshw according to Köppen (https://www.gloh2o.org/koppen/, accessed on 1 April 2024), with a distinct rainy season (from February to May) and dry periods (from June to January). The vegetation cover consists of xerophytic Caatinga, characterized by heterogeneous vegetation, whose vegetative vigor is sensitive to precipitation [45]. The terrain is hilly, with elevations averaging around 480 m above sea level.
Gold mining in the semi-arid region of Pernambuco, Brazil, is predominantly conducted by companies, as in the Serrita-Cedro Project, managed by Trilha Gold Capital. Despite the regulations governing the sector, mining activities continue to be associated with environmental degradation, such as deforestation and significant land use changes. The mining sector faces the challenge of aligning its production processes with environmental and social sustainability requirements. As noted by IBRAM [47], incorporating responsible practices is increasingly essential for maintaining competitiveness in the market, given the growing environmental and social demands from investors and consumers.
In this context, the use of satellite images combined with machine learning models presents a valuable tool for better understanding landscape changes in areas associated with gold mining practices [13].
2.2. Materials
2.2.1. Satellite Data
Two scenes of orbital images from the PlanetScope mission, from 2018 and 2023, were used, both acquired in October, a period of low rainfall in the region [49]. The analysis of the Caatinga’s vegetation cover is more effective during the dry season, as the vegetation is sensitive to minimal moisture, which can cause false positives during rainy periods [45,50]. The PlanetScope constellation, managed by Planet Labs, consists of over 130 satellites that offer daily global coverage [51]. The images provided by PlanetScope (PS) include four spectral bands: blue, green, red, and near-infrared (NIR), with a spatial resolution of 3 m and a radiometric resolution of 16 bits [52]. The images from the PSB.SD and PS2 sensors used in this study, along with their characteristics and acquisition dates (https://www.planet.com/, accessed on 4 March 2024), are presented in Table 1.
Iqbal et al. [53], using the kNN model with PlanetScope images to map native and invasive species distributions in two forest reserves in Pakistan, highlighted the images’ good performance in identifying targets compared to the Sentinel-2 MSI sensor.
2.2.2. Hydrological Data
The Caatinga vegetation has shown sensitivity to available rainfall [45,50,54]. In this context, analyzing the relationship between the Caatinga vegetation cover and precipitation becomes a key element for the accuracy of the results obtained when mapping land use and occupation in the study region. During October 2018, two rainy days were recorded (17 October 2018 and 18 October 2018), with rainfall totaling 69.9 mm for the month and 818.9 mm for the year. In October 2023, there were no rainy days, with a monthly rainfall of 0 mm and an annual rainfall of 733 mm [55].
2.3. Methods
2.3.1. Spectral Indices
After pre-processing the PlanetScope satellite images, the NDVI and NDWI spectral indices were calculated based on their respective operations, and the results were analyzed using the Jeffries–Matusita distance (JMD) [56].
NDVI serves as an effective indicator of active plant biomass, that is, of vegetation vitality. Developed by Rouse et al. [31], the NDVI ranges from −1 to 1 and is obtained using Equation (1):
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}} \qquad (1)$$
where NIR is the reflectance in the near-infrared band, and RED is the reflectance in the visible red band.
This index helps differentiate vegetated areas from other land covers, such as artificial ones, and allows for the assessment of the overall vegetation condition [57]. Additionally, NDVI enables the demarcation and monitoring of vegetation zones, as well as the recognition of any anomalies or changes in the observed area. This indicator is useful for monitoring seasonal variations in vegetation, though its effectiveness depends on surface reflection characteristics [58].
NDWI, using the green and near-infrared bands, is an efficient indicator for monitoring the presence and distribution of water on terrestrial and aquatic surfaces. This index is particularly effective in identifying water bodies and assessing moisture in vegetation, contributing to irrigation and water resource management studies [59]. NDWI is also helpful in detecting flooded areas and analyzing soil water saturation, making it a valuable tool for environmental and agricultural planning [32]. The NDWI ranges from −1 to 1 and is calculated using Equation (2):
$$\mathrm{NDWI} = \frac{\mathrm{Green} - \mathrm{NIR}}{\mathrm{Green} + \mathrm{NIR}} \qquad (2)$$
where NIR is the reflectance in the near-infrared band, and Green is the reflectance in the visible green band.
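Both indices share the same normalized-difference form, so they can be computed with a single helper. The following is a minimal Python sketch for illustration only (the study itself used R); the function names and the reflectance values are hypothetical and are not taken from the study data.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    if total == 0:
        return None  # index undefined for a pixel with no signal in both bands
    return (a - b) / total


def ndvi(nir, red):
    """NDVI: contrast between near-infrared and red reflectance."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """NDWI: contrast between green and near-infrared reflectance."""
    return normalized_difference(green, nir)


# Hypothetical surface reflectances (illustrative values, not study data).
pixels = {
    "vegetation": {"green": 0.08, "red": 0.05, "nir": 0.45},
    "water": {"green": 0.06, "red": 0.04, "nir": 0.02},
}

for name, px in pixels.items():
    print(name,
          "NDVI =", round(ndvi(px["nir"], px["red"]), 3),
          "NDWI =", round(ndwi(px["green"], px["nir"]), 3))
```

In this toy example the vegetation pixel has a positive NDVI and a negative NDWI, while the water pixel shows the opposite signs, which is the contrast the classification relies on.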
2.3.2. Jeffries–Matusita Distance
JMD is a statistical metric used to evaluate the separability between classes in RS data. JMD is particularly useful for quantifying the distinction between the probability distributions of classes, which is critical in multispectral image classification [60]. JMD is based on the Bhattacharyya distance, which measures the overlap between two statistical distributions, and is transformed to the range [0, 2] [56]. The formula for calculating JMD between two classes is given by Equation (3):
$$\mathrm{JMD} = 2\left(1 - e^{-B}\right) \qquad (3)$$
where B is the Bhattacharyya distance, which quantifies the overlap between two probability distributions. This measure is based on the means and covariances of the characteristics of each class and is given by Equation (4):
$$B = \frac{1}{8}\left(\mu_1 - \mu_2\right)^{T} \Sigma^{-1} \left(\mu_1 - \mu_2\right) + \frac{1}{2}\ln\left(\frac{|\Sigma|}{\sqrt{|\Sigma_1|\,|\Sigma_2|}}\right), \quad \Sigma = \frac{\Sigma_1 + \Sigma_2}{2} \qquad (4)$$
where µ_i represents the mean of class i, Σ_i its covariance matrix, Σ the average covariance matrix, and |·| the determinant of a covariance matrix. High JMD values indicate greater separability between classes, while lower values suggest significant overlap, making it difficult to distinguish between them [60].
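As an illustration of how the separability of two classes can be computed for a single variable (for example, one band or one index), the sketch below implements the one-dimensional Gaussian case of the Bhattacharyya distance and the resulting JMD in Python. It assumes each class is normally distributed with non-zero variance; the sample values and function names are hypothetical and are not the study's data or code.

```python
import math
from statistics import fmean, variance


def bhattacharyya_1d(x, y):
    """Bhattacharyya distance between two samples modelled as 1-D Gaussians."""
    mu_x, mu_y = fmean(x), fmean(y)
    var_x, var_y = variance(x), variance(y)  # sample variances, must be > 0
    var_avg = (var_x + var_y) / 2.0
    mean_term = (mu_x - mu_y) ** 2 / (8.0 * var_avg)
    var_term = 0.5 * math.log(var_avg / math.sqrt(var_x * var_y))
    return mean_term + var_term


def jeffries_matusita(x, y):
    """JMD in the range [0, 2]; values near 2 indicate well-separated classes."""
    return 2.0 * (1.0 - math.exp(-bhattacharyya_1d(x, y)))


# Hypothetical NDVI samples per class (illustrative values only).
water = [-0.30, -0.25, -0.35, -0.28, -0.32]
vegetation = [0.55, 0.62, 0.48, 0.60, 0.52]
urban = [0.35, 0.50, 0.42, 0.58, 0.45]

print("water vs vegetation:", round(jeffries_matusita(water, vegetation), 3))
print("urban vs vegetation:", round(jeffries_matusita(urban, vegetation), 3))
```

With these toy samples, the water and vegetation classes barely overlap and their JMD approaches 2, whereas the urban and vegetation samples overlap and yield a clearly lower value.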
2.3.3. k-Nearest Neighbors (kNN) Classification
The kNN classification algorithm is a simple yet powerful method used for classification and regression. Introduced by Cover and Hart [61], kNN operates on the principle that similar samples tend to be close to each other in feature space, as highlighted by James et al. [62]. This algorithm identifies the k nearest neighbors of an unknown sample within the training set, assigning the sample to the most common class (or the average of the responses) among these neighbors.
The distance between samples, fundamental to the operation of kNN, can be calculated in several ways. The most common is the Euclidean distance, given by Equation (5):
$$d(p, q) = \sqrt{\sum_{i=1}^{m} \left(p_i - q_i\right)^2} \qquad (5)$$
Equation (5) calculates the distance between two samples, p and q, each with m features, illustrating how the algorithm navigates the multidimensional space. The choice of k is a critical aspect that directly influences the algorithm’s performance. A very small k may make the model overly sensitive to data noise, while a very large k may cause it to overlook class distribution nuances. It is recommended to experiment with various k values and potentially use validation methods, such as cross-validation, to determine the optimal k.
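The voting rule described above can be sketched in a few lines of Python. This is a minimal illustration of kNN with Euclidean distance, not the implementation used in the study (which was carried out in R); the two-feature training samples (NDVI, NDWI) are hypothetical.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two samples with the same number of features."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_predict(train_x, train_y, sample, k=5):
    """Assign `sample` to the most common class among its k nearest neighbors."""
    distances = sorted(
        (euclidean(x, sample), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    # On a tied vote, the class whose neighbor appears first (nearest) wins.
    return votes.most_common(1)[0][0]


# Hypothetical (NDVI, NDWI) training samples, for illustration only.
train_x = [
    (-0.30, 0.45), (-0.25, 0.50), (-0.35, 0.40),   # water
    (0.60, -0.55), (0.55, -0.50), (0.65, -0.60),   # vegetation
    (0.15, -0.20), (0.12, -0.15), (0.18, -0.22),   # bare soil
]
train_y = ["water"] * 3 + ["vegetation"] * 3 + ["bare soil"] * 3

print(knn_predict(train_x, train_y, (0.58, -0.52), k=3))
print(knn_predict(train_x, train_y, (-0.28, 0.44), k=3))
```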
Cross-validation, as described by Kohavi [63], is a technique used to assess a statistical model’s generalization ability and to tune hyperparameters, such as k in kNN. The most common cross-validation method is k-fold, which divides the dataset into k subsets (here k denotes the number of folds, not the number of neighbors in kNN). The model is trained k times, each time using k − 1 subsets for training and the remaining subset for testing. The model’s performance is then evaluated by averaging the results obtained in each of the k iterations.
kNN, along with cross-validation, offers a robust approach to data classification, combining the algorithm’s simplicity with the efficacy of cross-validation for adjusting hyperparameters and assessing the model’s generalization ability for new data.
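The sketch below illustrates, in Python, how fold-based cross-validation can be used to compare candidate numbers of neighbors. It is a simplified example under stated assumptions: the data are synthetic two-feature clusters generated for illustration, the folds are formed by simple shuffling rather than stratification, and the names `cross_val_accuracy` and `k_neighbors` are hypothetical.

```python
import math
import random
from collections import Counter


def knn_predict(train, sample, k_neighbors):
    """`train` is a list of (features, label) pairs."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], sample))
    votes = Counter(label for _, label in nearest[:k_neighbors])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(data, k_neighbors, n_folds=10, seed=0):
    """Mean kNN accuracy over n_folds folds of a shuffled copy of `data`."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, test_fold in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        hits = sum(knn_predict(train, x, k_neighbors) == y for x, y in test_fold)
        scores.append(hits / len(test_fold))
    return sum(scores) / len(scores)


# Synthetic (NDVI, NDWI) clusters, 40 samples per class; not study data.
rng = random.Random(42)
centers = {"water": (-0.3, 0.45), "vegetation": (0.6, -0.55), "bare soil": (0.15, -0.2)}
data = [
    ((rng.gauss(cx, 0.08), rng.gauss(cy, 0.08)), label)
    for label, (cx, cy) in centers.items()
    for _ in range(40)
]

results = {k: cross_val_accuracy(data, k) for k in (3, 5, 7, 9)}
best_k = max(results, key=results.get)
print(results, "best k:", best_k)
```

Keeping the same seed for every candidate value means all candidates are scored on identical folds, so differences in mean accuracy reflect the number of neighbors rather than the random partition.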
The choice of the kNN algorithm over the previously mentioned models (ANN, DT, RF, and SVM) was based on its simplicity of implementation, high accuracy in scenarios with balanced data, and results previously reported in the literature [61,62,63]. Recent studies [40,64,65,66] have demonstrated that in land use and land cover analyses in mining regions, kNN achieves comparable or superior performance in Overall Accuracy and Kappa when compared to more complex methods, particularly when applied to datasets with a limited number of training samples, as in this study. Furthermore, the interpretable nature of kNN enables a more direct analysis of the impact of spectral distance on results, facilitating the identification of specific challenges, such as separability between classes with similar spectral characteristics, for example, vegetation and urban area.
2.3.4. Samples and Training
The training samples were extracted from both the 2018 and 2023 images, generating two training sets, each representing four classes of interest: water, urban area, vegetation, and bare soil. For this initial process, the open-source software QGIS, version 3.34.2-Prizren [67], was used, where the training preparation involved collecting several sample polygons for each class. The selection was made through the manual analysis of composite images (RGBs). After visual interpretation, the masks were saved in shapefile format, allowing them to be accessed and processed in the subsequent stage.
The definition of land use and land cover classes (water, urban area, vegetation, and bare soil) was based on the main features observed in the study area, which is characterized by mining activities in semi-arid regions [47]. Although other categories, such as croplands, may be common in some semi-arid regions, they did not stand out significantly in the investigated area, as they were often confused with native vegetation or bare soil due to spectral similarity. Moreover, the spatial resolution of the images used (3 m) limited the ability to identify subtle differences between small-scale croplands and the herbaceous or shrub vegetation of the Caatinga.
In this study, the data were partitioned for training and testing the kNN model, with 80% of the data used for training and 20% for testing. This partitioning ratio was selected considering the moderate size of the dataset, enabling the model to effectively learn the class characteristics while ensuring a robust evaluation of its performance. The reported accuracy of 0.99 was obtained with this partition and may be influenced by the size and quality of the data. Nevertheless, this division is considered appropriate for the context of the study.
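An 80/20 partition of labelled samples can be sketched as follows in Python. The text does not state whether the split was stratified by class; stratification is an assumption made here so that each class keeps roughly the same proportion in both subsets. The sample values and the function name `stratified_split` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_fraction=0.8, seed=0):
    """Split (features, label) pairs so each class keeps roughly the same ratio."""
    by_class = defaultdict(list)
    for item in samples:
        by_class[item[1]].append(item)
    rng = random.Random(seed)
    train, test = [], []
    for items in by_class.values():
        rng.shuffle(items)
        cut = round(len(items) * train_fraction)
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test


# Placeholder samples: 50 per class, features are dummy values.
classes = ["water", "urban area", "bare soil", "vegetation"]
samples = [((float(i), float(i)), label) for label in classes for i in range(50)]

train, test = stratified_split(samples)
print(len(train), "training samples,", len(test), "test samples")
```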
Training was conducted using the R programming language, version 4.4.2, through the RStudio software, version 2024.04.7 [68]. The two sets of samples for the four classes were used with cross-validation, where the dataset was randomly divided into 10 subsets (or “folds”), and the model was trained 10 times, each time using 9 of the subsets for training and the remaining subset for testing, for each sample set.
2.3.5. Accuracy Analysis of the Classification
The use of CE and OE metrics is essential for evaluating model errors in RS systems, allowing for a more detailed and precise analysis of the model’s ability to correctly identify positive cases and to avoid incorrectly classifying negative cases as positive. The CE metric is calculated as the proportion of false positives relative to the total number of events classified as positive, while the OE metric is calculated as the proportion of false negatives relative to the total number of actual positive events. OE can be calculated by Equation (6):
$$\mathrm{OE} = \frac{FN}{TP + FN} \qquad (6)$$
Subsequently, CE can be calculated by Equation (7):
$$\mathrm{CE} = \frac{FP}{TP + FP} \qquad (7)$$
where TP, FP, and FN are the numbers of true positives, false positives, and false negatives for the class under evaluation.
These parameters allow monitoring of prediction accuracy and are widely used in RS literature, as highlighted by Sano et al. [69] and Tejado-Ramos et al. [70].
The Overall Accuracy (OA) and the Kappa Index were also used as parameters for analyzing the thematic accuracy of the mapping.
OA is used to measure the model’s prediction accuracy and is the ratio of correctly classified samples to the total number of samples [71]. The OA can be calculated using Equation (8):
$$\mathrm{OA} = \frac{\sum_{i} X_{ii}}{N} \qquad (8)$$
where X_ii represents the number of correctly classified samples along the diagonal of the confusion matrix, summed over the classes, and N is the total number of samples. The higher the OA value, the better the overall prediction accuracy of the model.
The Kappa Coefficient is used to measure classification accuracy and is calculated according to Equation (9):
$$\mathrm{Kappa} = \frac{P_0 - P_c}{P_p - P_c} \qquad (9)$$
where P_0 is the proportion of correctly simulated pixels, P_p is the proportion of correctly predicted pixels in an ideal situation, and P_c is the proportion of correctly predicted pixels in a random situation. The closer the kappa coefficient is to 1, the better the classification result matches the actual situation [71].
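The four accuracy parameters can all be derived from a single confusion matrix, as the Python sketch below illustrates. It assumes an ideal agreement of 1, in which case the Kappa expression reduces to Cohen's kappa, and it estimates chance agreement from the row and column totals. The confusion matrix is hypothetical and does not reproduce the study's results.

```python
def accuracy_metrics(matrix, labels):
    """matrix[i][j] = samples of reference class i that were predicted as class j."""
    size = len(labels)
    n = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(size))
    row_totals = [sum(matrix[i]) for i in range(size)]
    col_totals = [sum(matrix[i][j] for i in range(size)) for j in range(size)]

    overall = diagonal / n
    # Chance agreement estimated from the marginal totals.
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (overall - chance) / (1 - chance)  # assumes ideal agreement = 1

    per_class = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = row_totals[i] - tp   # reference samples of the class that were missed
        fp = col_totals[i] - tp   # samples wrongly assigned to the class
        per_class[label] = {"OE": fn / (tp + fn), "CE": fp / (tp + fp)}
    return overall, kappa, per_class


# Hypothetical confusion matrix (rows: reference, columns: predicted).
labels = ["water", "urban area", "bare soil", "vegetation"]
matrix = [
    [48, 0, 1, 1],
    [0, 45, 4, 1],
    [1, 3, 46, 0],
    [0, 2, 0, 48],
]

overall, kappa, per_class = accuracy_metrics(matrix, labels)
print("OA:", round(overall, 4), "Kappa:", round(kappa, 4))
for label, errors in per_class.items():
    print(label, {name: round(value, 4) for name, value in errors.items()})
```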
2.3.6. Quantitative Analysis of Classified Areas
After the image classification process, each pixel was assigned to one of the four predefined classes, where the value 1 corresponds to water, 2 to urban area, 3 to bare soil, and 4 to vegetation. The quantitative analysis was conducted by summing the pixels of each class for the analyzed years. The PlanetScope satellite images have a spatial resolution of 3 × 3 m, which implies an area of 9 m² per pixel. To facilitate quantitative analysis, the total area of each class was converted from m² to km², allowing for a more accessible comparison of classified areas between the years 2018 and 2023.
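The pixel-count conversion described above amounts to multiplying each class count by the pixel area (9 m²) and dividing by 10⁶ to obtain km². The Python sketch below illustrates this with two tiny hypothetical rasters; the class codes follow the text, while the raster values and function names are invented for the example.

```python
from collections import Counter

PIXEL_SIZE_M = 3.0          # PlanetScope pixel edge length in meters
M2_PER_KM2 = 1_000_000.0
CLASS_NAMES = {1: "water", 2: "urban area", 3: "bare soil", 4: "vegetation"}


def class_areas_km2(classified, pixel_size_m=PIXEL_SIZE_M):
    """Sum the pixels of each class code and convert the total to km²."""
    counts = Counter(code for row in classified for code in row)
    pixel_area_m2 = pixel_size_m ** 2
    return {
        name: counts.get(code, 0) * pixel_area_m2 / M2_PER_KM2
        for code, name in CLASS_NAMES.items()
    }


def percent_change(area_start, area_end):
    """Relative change with respect to the starting area, in percent."""
    return (area_end - area_start) / area_start * 100.0


# Tiny hypothetical classified rasters (3 x 4 pixels), not study data.
raster_2018 = [[4, 4, 3, 3], [4, 4, 3, 1], [4, 2, 3, 1]]
raster_2023 = [[4, 2, 3, 3], [4, 2, 3, 1], [4, 2, 2, 3]]

areas_2018 = class_areas_km2(raster_2018)
areas_2023 = class_areas_km2(raster_2023)
for name in CLASS_NAMES.values():
    change = percent_change(areas_2018[name], areas_2023[name])
    print(name, areas_2018[name], areas_2023[name], round(change, 2))
```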
3. Results
3.1. Monitoring of Land Use and Occupation
Thematic land use and occupation maps (Figure 3) were generated through classification using the kNN algorithm. The classes analyzed were water, urban area, bare soil, and vegetation, allowing the identification of landscape changes between 2018 and 2023.
In Figure 3, there is a noticeable increase in built-up areas accompanied by a reduction in vegetated areas, especially in regions near natural water resources, indicating accelerated loss of riparian forests. On the other hand, bare soil areas did not show significant visible changes. The “urban area” class includes roofs of structures such as houses, warehouses, and sports courts (with ceramic or metallic coverings), as well as paved roads such as asphalt or cobblestone streets. Linear features such as roads were incorporated into the maps by photointerpretation of PlanetScope images, complemented by data from the Brazilian National Department of Transport Infrastructure (DNIT) (https://servicos.dnit.gov.br/vgeo/, accessed on 4 April 2024). This approach allowed for the identification of roads in the study area with high precision, considering the 3 m spatial resolution of the analyzed images. The analysis revealed that several roads are closely related to mining areas, indicating a possible correlation between the expansion of road infrastructure and increased productive activities. It also revealed a significant expansion of the road network during the analyzed period, contributing directly to the growth of the built-up area identified in the maps. The inclusion of these features highlights the essential role of roads and other constructions in shaping the landscape, thereby explaining the changes observed in the urban class from 2018 to 2023.
By comparing the classifications from 2018 and 2023, a reduction in vegetation equivalent to approximately 2.1% of the study area over the five years was identified, as shown in Figure 4.
Data extraction enabled the quantification of land use and occupation classes, as shown in Figure 5.
Between 2018 and 2023, vegetation cover decreased by 9.68 km², which is equivalent to 3.28% of the vegetation area in 2018 and 2.11% of the total area of 459.33 km². Rainfall over the period studied fell considerably, which may have influenced changes in the Caatinga vegetation. This decrease was particularly noticeable in October 2023, when there was no precipitation. The lower availability of water may have reduced the natural regeneration capacity of vegetation, contributing to the decrease in vegetation cover between 2018 and 2023. If the downward trend in precipitation continues, it is possible that the impact on vegetation will intensify, exacerbating the effects of anthropogenic activities, such as mining, and affecting the long-term health of vegetation.
Water areas also decreased by 0.44 km², representing a 22.80% reduction compared to the area in 2018 and 0.09% of the total area. Bare soil areas showed a decrease of 20.31 km², which corresponds to an 11.52% reduction compared to 2018 and 4.42% of the total area. Conversely, built-up areas increased by 30.43 km², representing a 142.53% increase compared to 2018 and 6.62% of the total area.
Some areas classified as deforested under the influence of mining were identified as artisanal gold mining activities through Google Earth validation, supporting the accuracy of the classification results. In these regions, it was observed that farmers converted areas originally designated for agriculture and grazing into clear-cut zones with extensive excavations in search of gold. These actions are directly linked to the presence of the Serrita-Cedro Project in the region, which has drawn significant attention from informal gold miners. The conversion of agricultural and pastureland into mining sites has caused significant impacts on land use and land cover, along with environmental and social consequences, highlighting the need for continuous monitoring and proper regulation.
3.2. Separability Analysis
The separability between the land use and cover classes mapped by kNN was evaluated using the JMD. Table 2 and Table 3 show the JMD values for the years 2018 and 2023.
In 2018, the separability between water and vegetation, with NDVI (1.827), NDWI (1.953), and the NIR band (1.980), was high, indicating a good distinction between these classes. However, the separability between urban area and vegetation showed values below 1 for NDWI, suggesting less consistency in distinguishing these classes. Following the same evaluation pattern used for 2018, the separability between the classes for 2023 was assessed (Table 3).
In 2023, the JMD values also indicated good separability for the water vs. vegetation classes, with an emphasis on NDWI (1.996). For urban area vs. vegetation, the separability was effective, with high values in the red band (1.900), NIR (1.815), and NDVI (1.816), and a lower value for NDWI (1.178). The separability between bare soil and vegetation remained high for all variables, with values above 1.823.
3.3. Accuracy of kNN Classification
The classification was evaluated using the accuracy parameters OA, Kappa index, OE, CE, and cross-validation, as shown in
Figure 6.
In 2018, the accuracy decreased continuously as k increased, from approximately 0.9899 with k = 5 to 0.9894 with k = 9. Increasing the number of neighbors therefore had a slightly negative effect on model performance, and k = 5 performed best. In 2023, the trend was different: the accuracy peaked at k = 7, reaching approximately 0.9900, and then decreased for higher values of k, which makes k = 7 the optimal choice for that year. Based on the classification defined by Landis and Koch [
72], these results suggest that the classification was not only accurate, but also consistent and reliable.
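The comparison of k values by cross-validation can be illustrated with a minimal, standard-library sketch. It is not the implementation used in this study: the function names, the five-fold split, and the toy (NDVI, NDWI) samples are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training samples closest to `query`."""
    nearest = sorted(
        range(len(train_x)), key=lambda i: math.dist(train_x[i], query)
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_val_accuracy(x, y, k, n_folds=5, seed=0):
    """Mean kNN accuracy over `n_folds` folds of a shuffled sample."""
    order = list(range(len(x)))
    random.Random(seed).shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        hits = sum(knn_predict(train_x, train_y, x[i], k) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / n_folds


def make_toy_samples(n_per_class=40, seed=1):
    """Hypothetical (NDVI, NDWI) samples for two classes; not the study's data."""
    rng = random.Random(seed)
    centers = {"vegetation": (0.7, -0.6), "water": (-0.3, 0.5)}
    x, y = [], []
    for label, (c_ndvi, c_ndwi) in centers.items():
        for _ in range(n_per_class):
            x.append((rng.gauss(c_ndvi, 0.05), rng.gauss(c_ndwi, 0.05)))
            y.append(label)
    return x, y


x, y = make_toy_samples()
for k in (5, 7, 9):
    print(k, round(cross_val_accuracy(x, y, k), 4))
```

With real training samples, the k that maximizes the cross-validated accuracy would be retained for each year.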
These results indicate good precision and agreement between the classification and the actual landscape. In this context, it is demonstrated that the model has precise classification potential, producing results that are very close to the actual or expected values. The narrow confidence interval suggests a high probability that the model’s actual accuracy is within this range, which is indicative of consistent results [
73].
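As an illustration of how such an interval can be obtained, the sketch below computes a Wilson score interval for an accuracy estimate. The choice of the Wilson method and the number of validation samples are assumptions; the procedure and sample size used in this study are not restated here.

```python
import math


def wilson_interval(accuracy, n_samples, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    z2 = z * z
    denom = 1.0 + z2 / n_samples
    center = (accuracy + z2 / (2.0 * n_samples)) / denom
    half_width = z * math.sqrt(
        accuracy * (1.0 - accuracy) / n_samples + z2 / (4.0 * n_samples**2)
    ) / denom
    return center - half_width, center + half_width


# Hypothetical example: 99% accuracy estimated from 1000 validation samples
low, high = wilson_interval(0.99, 1000)
print(f"95% interval: [{low:.4f}, {high:.4f}]")
```

The interval narrows as the number of validation samples grows, so a narrow interval reflects both a high accuracy and an adequately sized validation set.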
Figure 7 presents a comparison between the classification using the kNN algorithm and an RGB composite, with the PlanetScope bands.
Through
Figure 7, it is possible to spatially identify the results obtained with the OA and Kappa index parameters; however, an analysis of OE and CE is necessary, as conducted in this study, to identify the presence of false positives and false negatives.
Table 4 presents the verification results of these parameters.
It can be observed in
Table 4 that, in the “Vegetation” class, there was a slight improvement over time, with low commission and omission errors reflecting a high accuracy in detecting this class. In contrast, the omission and commission errors for “Bare soil” increased from 2018 to 2023, indicating a slight decline in detection accuracy and a moderate tendency toward overclassification. However, as with “Vegetation”, the errors remained low. These results are consistent with previous studies [
33], which suggest that soil and vegetation classes in the Caatinga tend to vary little compared to reference products. This behavior is associated with the consolidated use of these areas and the low anthropogenic interference in the Caatinga biome landscape [
11].
The “Urban area” class showed the highest error values, though with an improvement in omission error, suggesting greater effectiveness in detecting urban area over time (
Table 4). On the other hand, the commission error increased considerably, indicating a greater tendency to misclassify other classes as urban area in 2023. The spectral similarity between the urban area and bare soil classes may have contributed to the significant omission and commission errors in this category, as illustrated in
Table 4 and discussed by [
10].
For the “Water” class, the omission error remained at 0% from 2018 to 2023, indicating high accuracy in identifying this class in both years. However, the commission error increased slightly from 0% to 0.07%, suggesting a slight tendency to overestimate the area of water bodies in 2023. These results also corroborate previous studies [
22], which highlight the high detection quality of water bodies, attributed to their high spectral absorption characteristics compared to general soil and vegetation classes.
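The relationship between the per-class errors discussed above and the global parameters can be made explicit with a short sketch that derives OA, Kappa, OE, and CE from a confusion matrix. The function name and the two-class matrix are hypothetical; rows are assumed to hold the reference classes and columns the predicted classes.

```python
def accuracy_metrics(matrix, labels):
    """Compute OA, Kappa, and per-class OE/CE.

    matrix[i][j] = number of samples of reference class i predicted as class j.
    """
    size = len(labels)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(matrix[i]) for i in range(size)]
    col_totals = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    diagonal = [matrix[i][i] for i in range(size)]

    overall_accuracy = sum(diagonal) / n
    # Agreement expected by chance, from the marginal totals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    kappa = (overall_accuracy - expected) / (1.0 - expected)

    # OE: reference samples of the class that were missed (false negatives)
    omission = {lab: 1.0 - diagonal[i] / row_totals[i] for i, lab in enumerate(labels)}
    # CE: samples assigned to the class that belong elsewhere (false positives)
    commission = {lab: 1.0 - diagonal[i] / col_totals[i] for i, lab in enumerate(labels)}
    return overall_accuracy, kappa, omission, commission


# Hypothetical two-class confusion matrix (illustrative only)
labels = ["Urban area", "Bare soil"]
matrix = [[45, 5],
          [10, 40]]
oa, kappa, oe, ce = accuracy_metrics(matrix, labels)
print(round(oa, 3), round(kappa, 3), oe, ce)
```

A class can therefore show high OE and CE even when OA and Kappa remain high, provided it accounts for a small share of the samples.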
The water bodies identified in the study area, primarily small lakes and intermittent ponds, reflect the typical seasonal dynamics of the semi-arid region of Pernambuco, influenced by variations in precipitation and evapotranspiration patterns. This characteristic directly impacts local gold mining practices, which are not exclusively reliant on perennial water bodies such as streams. Instead, mining operations frequently occur in dry areas or near intermittent water bodies, often utilizing artificial systems for ore washing. This reality was incorporated into the revised maps, which now more accurately highlight the spatial distribution of these water resources in the context of mining activities, providing a more robust foundation for environmental impact analysis.
The high OE in the Constructions class reveals that the model missed many samples that actually belong to this class (false negatives), while the high CE indicates that it assigned samples from other classes to it (false positives). These results highlight the need for adjustments to the model that increase both sensitivity and specificity (
Figure 8).
Figure 8 shows areas that, although devoid of constructions, were erroneously classified as such. These false positives generated in the classification can be attributed to the characteristics of the materials used in the construction roofs, which are predominantly clay. In
Figure 9, a visualization of the coverage of a constructed area in the study region is presented, comparing PlanetScope images and the Google Earth platform.
Due to the satellite’s spatial resolution, the lot boundaries are not well defined, and non-ceramic roofs merge with areas of bare soil. Additionally, trees are frequently present between lots, which can also cause spectral mixing for the construction target.
4. Discussion
The experimental results revealed significant changes in land use and land cover between 2018 and 2023, with a 3.28% decrease in vegetation cover and a 6.62% increase in urbanized areas. These figures highlight the accelerated impact of mining activities, especially in the direct influence area along water bodies, where an increase in riparian vegetation loss was observed. The effectiveness of the kNN algorithm, demonstrated by an Overall Accuracy above 99% and a Kappa index of 0.98, reinforces its applicability in mining-impacted scenarios. However, the identified challenges, such as spectral overlap between urban area and exposed soil, reflect the need for complementary methods, such as textural variables or images with higher spatial resolution.
The reduction in the “water” class may be related to the decrease in precipitation recorded in October 2023, a month with little or no rain in the region. However, data variations may also be attributed to possible classification errors between bare soil and constructions, caused by the spectral similarity of the objects due to the sensor’s spatial resolution. According to Novo et al. [
74], different types of land cover have distinct spectral signatures, but the similarity of these signatures under certain conditions can result in classification errors. This phenomenon is particularly challenging in RS, demanding refined techniques to ensure greater accuracy.
The separability analysis between land use and land cover classes, using JMD associated with the NDVI and NDWI spectral indices, demonstrated high effectiveness, especially in distinguishing between Water and Vegetation classes. The results corroborate studies such as those by Shikhov et al. [
7], who also observed a strong correlation between mining activities and their influence on the environmental degradation process. However, while other studies often report difficulties in detecting water in mined areas, the use of the NDWI in this work ensured clear separability for the water class, as evidenced by the high Jeffries–Matusita distance values (>1.95). This result is consistent with the findings of Foody [
75], who emphasized the importance of vegetation indices like NDVI in improving the separation of classes with distinct spectral characteristics, such as dense vegetation and water bodies.
Another relevant point is the high separability observed between the Bare soil and Vegetation classes, particularly with the Red and NIR bands. Xie et al. [
76] point out that the use of these bands, associated with vegetation indices, significantly improves the discrimination of bare soil due to the high reflectance in the red and near-infrared bands. These authors also suggest that the combination of spectral indices and specific bands can improve the accuracy of classification in mined and deforested areas, as observed in their study.
However, the low separability between Constructions and Vegetation, especially for NDWI, reflects a frequent challenge in using spectral indices in urban areas. Yang et al. [
77] identify similar limitations when using spectral indices to separate urban areas from vegetation, pointing to the need for post-processing techniques, such as the integration of textural variables, to overcome spectral mixing problems. Additionally, Pal and Foody [
78] discuss how the spectral similarity between construction materials and bare soil can complicate classification, requiring more refined adjustments to the classification algorithm.
The use of multiple spectral bands in combination with NDVI and NDWI indices, as evaluated through JMD, proved to be an effective strategy for improving class separability. Camps-Valls et al. [
79] highlight that the use of machine learning techniques, such as kNN, in combination with spectral bands and derived indices, can maximize classification accuracy, especially in areas where the distinction between classes is difficult due to complex spectral signatures.
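A minimal sketch of how the bands and indices can be combined into a per-pixel feature vector is given below. It assumes surface reflectance inputs, the standard NDVI definition, and the McFeeters formulation of NDWI (green and NIR); the function names are hypothetical.

```python
def normalized_difference(band_a, band_b):
    """(a - b) / (a + b), returning 0.0 where the denominator is zero."""
    total = band_a + band_b
    return 0.0 if total == 0 else (band_a - band_b) / total


def pixel_features(green, red, nir):
    """Feature vector (red, NIR, NDVI, NDWI) for one pixel."""
    ndvi = normalized_difference(nir, red)    # vegetation index
    ndwi = normalized_difference(green, nir)  # water index (McFeeters form, assumed)
    return (red, nir, ndvi, ndwi)


# Hypothetical reflectance values for a vegetated pixel
print(pixel_features(green=0.10, red=0.05, nir=0.45))
```

Each such vector can then be passed to the classifier, so that the separability offered by each variable contributes to the final class assignment.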
The accuracy of kNN in the years 2018 and 2023, with values above 99% and a Kappa index over 0.98, indicates excellent performance of the machine learning model in land use and cover classification. These results are consistent with the study by Zaki et al. [
39], who also achieved high precision using machine learning algorithms to predict mineralization in mined areas. The robustness of the Kappa index in both studies demonstrates that, even in complex scenarios such as mining environments, kNN can provide consistent and reliable classifications.
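For reference, the agreement categories proposed by Landis and Koch can be encoded as a simple lookup, sketched here with a hypothetical function name.

```python
def landis_koch_label(kappa):
    """Map a Kappa value to the Landis and Koch agreement category."""
    if kappa < 0.0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


print(landis_koch_label(0.98))
```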
The precision of kNN in this study highlights the efficiency of combining NDVI and NDWI spectral indices with the machine learning algorithm, something also supported by Fonseca et al. [
10]. They pointed out that the integration of temporal spectral indices, such as NDVI, substantially improves the detection of changes in artisanal and small-scale mining areas. The high global accuracy values observed for 2018 and 2023 suggest that kNN can be a reliable alternative for monitoring land use changes, especially in regions with mining activities.
However, the OE and CE identified, particularly in the Constructions class, indicate room for improvement, especially in differentiating between urban areas and bare soil, as also reported by Shikhov et al. [
7]. The errors observed in this study may be related to the spatial resolution of the PlanetScope sensor and the spectral similarity between construction materials and bare soil, as discussed by Isidro et al. [
40]. Improvements in spectral segmentation techniques or the use of sensors with higher spatial resolution could potentially reduce these errors.
These results reinforce the relevance of using robust machine learning methodologies, such as kNN, for monitoring land use and cover in mining areas, but also point to the need for model adjustments to improve its sensitivity in certain classes, such as built-up areas and bare soil. The difficulties encountered in distinguishing between urban soil and bare soil classes are largely due to their spectral similarity and the sensor’s spatial resolution. To overcome these limitations, future studies could explore integrating textural variables derived from high-resolution imagery or employing hybrid classifiers that combine machine learning with texture analysis. This approach could improve the accuracy of class separation in complex urban environments.
The selection of land use and land cover classes was a critical step in the methodology. The delineation of categories was based on predominant features that were most relevant to the study’s objectives, considering the spectral and spatial limitations of PlanetScope images. The integration of categories such as croplands was considered but proved unfeasible in the study area due to the low expressiveness of this class and the difficulty in distinguishing cultivated lands from native vegetation or bare soil. Future research could explore the use of sensors with higher spectral resolution or complementary techniques to enhance the detail of the classes.
Evaluating surface mining areas through high spatial resolution satellite images is an efficient tool for monitoring and assessing land cover and use changes in mining complexes. It is important to highlight that high spatial resolution satellite images from the PlanetScope constellation have been freely available since 2017, making them an important data source for land use and cover monitoring in general.
The analysis based on PlanetScope images, combined with visual validation through Google Earth, has proven to be a powerful tool for monitoring land use and land cover changes in areas affected by mining. PlanetScope’s ability to provide daily high-resolution images, coupled with the use of machine learning algorithms such as kNN, allows for the rapid identification of impacted areas and the prioritization of mitigation actions. This approach can also be integrated into public management systems, enabling regulatory bodies such as the National Mining Agency (ANM) to use updated data to monitor mining activities in near real time. This integration provides a solid foundation for monitoring illegal activities, planning environmental recovery strategies, and promoting more sustainable use of mineral resources.
This technical association allows for spatial and temporal validation of the obtained data, especially in areas where the 3 m spatial resolution of PlanetScope may generate uncertainties due to spectral similarities between classes, such as urban area and exposed soil. Additionally, the use of historical images from Google Earth enables a retrospective analysis of environmental transformations, enriching the understanding of spatial dynamics and providing a visual history to support strategic decisions in environmental and mining management. This integrated approach demonstrates considerable potential for future applications in continuous monitoring and environmental oversight, contributing to greater accuracy in identifying environmental impacts in mining-affected areas.
Other methodological advances should focus on recognizing and distinguishing different stages of rehabilitation in mining areas (e.g., herbaceous, shrub, and forest cover) from high-resolution satellite systems and unmanned aerial vehicles to remotely track the environmental progress of revegetation areas [
33].
According to Tang et al. [
80], environmental changes caused by human factors, such as industrialization, urbanization, economy, and technology, surpass even those caused by natural factors in intensity and have a decisive impact on short-term land cover changes in mining areas, where social and economic factors are more important. As a result, the traditional farming mode may be affected, leading to the destruction of the ecological environment.
The expansion of mining activities and the associated vegetation degradation in the Serrita-Cedro Project region have caused significant environmental and socioeconomic impacts. From an environmental perspective, extensive vegetation removal has led to a loss of biodiversity and a decline in ecosystem services. This degradation has disrupted the local hydrological cycle, increasing erosion risks and reducing soil infiltration. Furthermore, waste generated by mining activities has polluted soil and water resources, compromising the quality of essential water supplies for local communities. These environmental challenges have heightened the region’s ecological vulnerabilities, threatening long-term sustainability [
81].
From a socioeconomic perspective, mining activities in the Serrita-Cedro Project and other areas in Brazil have provided immediate economic opportunities, such as job creation and increased local income. However, economic dependence on mining has left communities vulnerable, particularly as mines near resource depletion. Additionally, the displacement of local populations and territorial conflicts have impacted traditional communities, exacerbating social inequalities. Atmospheric pollution from mining activities has further contributed to a rise in respiratory diseases, highlighting the need for more robust strategies to mitigate these damages [
82].
RS monitoring and evaluation of the effects of mining on long-term changes can provide a solid understanding to guide mine ecological restoration and local ecosystem sustainability [
83], despite some limitations of this technology. However, in the future, with the development of new sensors and satellites with better resolutions, integrated with new methodological processes, RS monitoring of mining areas will become more efficient.
5. Conclusions
The spatiotemporal analysis of mining areas in the semi-arid region of Pernambuco, utilizing high-resolution images from 2018 and 2023 and machine learning techniques, highlighted the magnitude of the environmental transformations occurring in the region. The data revealed a reduction in vegetation cover and a significant increase in built-up areas, which are direct reflections of the expansion of mining activities. These results underscore the continuous pressure that these activities exert on local ecosystems, especially in sensitive regions like the Caatinga, where biodiversity is already naturally adapted to extreme climate and soil conditions.
The applied methodology, which combined the kNN algorithm and the NDVI and NDWI spectral indices, demonstrated accuracy in image classification and landscape change identification. With an accuracy exceeding 99% and a Kappa index above 0.98, the methodology was effective in detecting impacted areas, confirming the potential of these tools in environmental monitoring in mining areas. However, some challenges were observed, such as the limited separability between the urban area and bare soil classes, suggesting that future adjustments in modeling may further increase the precision of the results.
The findings indicate that between 2018 and 2023, there was a marked degradation of vegetation and a significant increase in built areas, especially near water bodies. This trend reflects the intense human intervention in the region and reinforces the need for public policies aimed at mitigating these impacts, as well as promoting environmental recovery in affected areas.
The uncontrolled expansion of mining poses a threat to environmental sustainability, endangering local communities that rely on natural resources for their livelihoods. The results obtained in this study demonstrated that mining activities significantly influenced changes in land use and cover in the analyzed region. In this context, the study reaffirms the importance of using RS and machine learning technologies in environmental monitoring, especially in vulnerable areas like the Brazilian semi-arid region. Furthermore, it highlights the need for regulation and responsible management of mining activities, so that more sustainable practices balancing economic development and environmental preservation are adopted.
This work provided an in-depth understanding of spatiotemporal changes in land cover, emphasizing the importance of RS and spatial data analysis in environmental monitoring. The classification system adopted in this study was suitable for representing the main land use and land cover transformations in the mining area under investigation. However, we acknowledge that the inclusion of additional categories, such as croplands, could enrich the analysis in regions where such features are more prominent, especially using higher-resolution images and refined methodologies.
For future research, it is suggested to expand the training dataset and explore other machine learning techniques to enhance classification. Additionally, it is recommended to conduct further studies to investigate the impact of land use policies and climate change on vegetation dynamics in mining areas, aiming to contribute to conservation strategies and sustainable development.
It is also noted that the methodology tested in this study could be implemented to assess the spatiotemporal behavior of land cover in other mining regions with arid and/or semi-arid climatic characteristics.