1. Introduction
One of the peculiar features of climate science is the accumulation of an enormous amount of data [1, 2]. The estimated size of climate data exceeds ten petabytes and continues to grow exponentially [3]. Furthermore, the number of diverse data sources is also increasing. Information is collected by thousands of ground-based instruments all over the world, such as weather stations, as well as by a large number of satellites that perform measurements from kilometers above the ground. These data need to be processed and transformed into formats that are comparable with each other. Climate science multimedia systems exist, but are still underinvestigated compared to other areas such as social media or medicine [4, 5].
Motivated by the big data challenges and the need for multimedia systems within climate science, in this paper we address the important, yet less studied, problem of analyzing the climate effects of land cover (LC) changes, such as deforestation or urbanization. We go through all the steps of the multimedia pipeline needed to handle, and gain knowledge from, the large amounts of climate data relevant to this problem.
Nowadays, climate change and global warming are indisputable facts [6, 7, 8, 9, 10]. Global surface temperature has been methodically recorded since 1850. According to these records, the last three decades were warmer than any preceding decade on record. Furthermore, in some regions, the temperature has been reconstructed over the last 1400 years, and the period between 1983 and 2012 was the warmest 30-year period during this time [6, 7, 10, 11].
Climate change and rising temperatures have a huge impact on natural and anthropogenic systems on all continents and oceans: melting of snow and ice, sea level rise, decrease in fresh water volume and quality, changes in precipitation patterns, behavioral changes in marine organisms and animals, negative effects on agriculture, and many other effects [9].
Anthropogenic factors, such as CO₂ emissions, are considered the main cause of global warming in the second half of the 20th century [9]. LC changes are also known to influence the regional climate because they alter biophysical mechanisms, such as evapotranspiration, albedo, and surface roughness [12, 13, 14]. LC type transformation has various causes. On the one hand, it can be caused by natural factors such as floods, sea level rise, or wildfires. Climate change itself also causes LC changes, for example by favoring tree expansion in mountain areas, greening, or forest degradation [15, 16, 17]. On the other hand, anthropogenic factors, such as deforestation or expansion of agricultural land, also have a significant and often dominant impact on LC transitions. LC plays a significant role in the exchange of energy and water between the atmosphere and the Earth’s surface. Terrestrial areas not only produce greenhouse gases (such as CO₂), but also absorb them [7]. Therefore, sustainable land management is an important tool for climate change mitigation. The Intergovernmental Panel on Climate Change (IPCC) [7] states in a recent report that the development of appropriate policies can considerably contribute to climate change adaptation and affect the rate of temperature rise. Some of the mechanisms that have already been implemented confirm the efficiency of this approach [7]. Good examples of these measures are sustainable food production and forest management, food waste reduction, and avoidance and prevention of deforestation and land degradation. Even more political actions can be adopted. Nevertheless, in order to develop efficient policies, it is important to understand how different changes in LC affect local and global climate [18]. Researchers pay special attention to the importance of long-term monitoring of various types of LC transformations and their relation to climate change [7, 9, 14, 19].
Nowadays, simulations based on numerical climate models are the largest source of climate information [3]. They allow researchers to model the climate response to specific changes in the climate system. To perform an experiment, one runs a climate model several times with different input variables and then compares the results to understand the impact of these input parameters. For instance, a climate model can show what kind of changes occur in the climate system if the input data differ only in LC. Nevertheless, the results obtained from a climate model can contain complex non-linear patterns that are difficult to identify but that, once identified, can provide new insight and spare costly simulations. Machine learning (ML) is of special interest to researchers as a powerful tool for this kind of task, as well as for other problems within climate science. However, while ML is widely used in different scientific fields, it still has limited application within climate science [20].
There are many ways to study the impact of LC transformations on temperature. Some works on LC changes and their impact on climate are based on directly observed data [21, 22], while others mainly use simulations and climate models [12, 23, 24]. In general, scientists agree that the climate system is very complex and depends on many factors. The impact of LC change can vary between the global and the regional scale. Moreover, the same transformations can lead to different consequences depending on the region where they occur. Different LC changes also have distinct effects on climate [25]. Most publications focus on individual LC changes, for example, deforestation [24, 26] or urbanization effects [27, 28]. However, this question is rarely studied from a broad perspective that takes into account all types of LC transitions simultaneously. Recently, Huang et al. analyzed the regional impact of cumulative LC changes on European climate [14]. The key point was to take into account all types of LC simultaneously and then to distinguish the individual impact of different LC changes on regional climate.
In this paper, we follow the same approach as Huang et al., but given the aforementioned complexities in how LC changes affect temperature, we explore the potential of using well-established ML methods, such as support vector regression (SVR), random forest (RF), multiple linear regression (MLR), and least absolute shrinkage and selection operator regression (LASSO), to learn these complex relations [29]. The method that learns the relations best is then used to study the effects of LC changes on temperature, using a newly suggested framework based on explainable artificial intelligence (XAI) methods [30].
3. Machine Learning and Explainable Artificial Intelligence
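Let $X = (x_{ij}) \in \mathbb{R}^{N \times p}$ represent N observations of p features, and let $y = (y_1, \dots, y_N)$ represent some associated response. The aim of ML is to learn a function $f$ that is able to predict the response from these features. In this paper we consider four well-known models, namely MLR, LASSO, SVM, and RF.
MLR assumes a linear association between the features and the response
$$y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \quad i = 1, \dots, N,$$
where $\varepsilon_i$ represent zero mean Gaussian distributed error terms. The parameter estimates are usually found by minimizing the least squares error
$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2.$$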
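As an illustration of the least squares fit described above, the following is a minimal sketch using NumPy. The function name `fit_mlr` and the synthetic data are hypothetical and are not taken from the paper; the sketch only shows how the intercept and slopes can be estimated by minimizing the sum of squared residuals.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares fit; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept is estimated with the slopes.
    design = np.column_stack([np.ones(len(X)), X])
    # lstsq minimizes the sum of squared residuals ||y - design @ beta||^2.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]

# Synthetic, noise-free data (hypothetical): y = 1 + 2*x1 - 3*x2.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1]
intercept, coefs = fit_mlr(X_demo, y_demo)
```

Because the synthetic response contains no noise, the recovered intercept and coefficients should match the generating values up to numerical precision.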
Given many features, a potential challenge with linear regression is that the model fits not only the signal in the data but also the noise, usually resulting in poor prediction performance on held-out data. This problem is referred to as over-fitting. Regularization is a popular technique to address this issue. For example, the LASSO model adds the sum of the absolute values of the parameter estimates as a penalty term to the optimization [37]
$$\hat{\beta}^{\text{LASSO}} = \arg\min_{\beta} \Big\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \Big\},$$
where $\lambda \geq 0$ is the regularization parameter. A positive property of the LASSO is that the resulting model will often be sparse in the sense that most of the parameter estimates are set to zero, making model interpretation easier. A higher value of the regularization parameter $\lambda$ results in a sparser solution and less chance of over-fitting. In this paper, we adjusted the value of $\lambda$ to optimize prediction performance on held-out data.
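The effect of the regularization parameter on sparsity can be sketched with a small coordinate descent solver for the objective "sum of squared residuals plus λ times the sum of absolute coefficient values". This is an illustrative implementation under stated assumptions, not the solver used in the paper: the names `soft_threshold` and `fit_lasso`, the fixed number of sweeps, and the synthetic data are all hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def fit_lasso(X, y, lam, n_sweeps=200):
    """Coordinate descent for sum((y - b0 - X @ b)**2) + lam * sum(|b|)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Centering removes the (unpenalized) intercept from the inner problem.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.zeros(X.shape[1])
    col_sq = (Xc ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            if col_sq[j] == 0.0:
                continue  # constant feature carries no information
            # Partial residual: leave feature j out of the current fit.
            r_j = yc - Xc @ beta + Xc[:, j] * beta[j]
            # Threshold is lam / 2 because the squared loss has no 1/2 factor.
            beta[j] = soft_threshold(Xc[:, j] @ r_j, lam / 2.0) / col_sq[j]
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Synthetic data (hypothetical): only the first of five features matters.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
y_demo = 3.0 * X_demo[:, 0] + 0.1 * rng.normal(size=100)

_, beta_ols = fit_lasso(X_demo, y_demo, lam=0.0)    # no penalty
_, beta_mid = fit_lasso(X_demo, y_demo, lam=20.0)   # moderate penalty
_, beta_large = fit_lasso(X_demo, y_demo, lam=1e6)  # penalty dominates
```

With λ = 0 the solver reduces to ordinary least squares, while a very large λ sets every coefficient to zero. In practice λ would be chosen by comparing prediction error on held-out data over a grid of candidate values.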
We also consider two other very popular ML methods, namely the SVM and the RF. When the response is continuous (as it is in this work), SVM is often referred to as support vector regression (SVR). The idea behind SVR is to find the regression plane such that as many of the observations are within a (support) region around the regression plane as possible. The width of the support region is also part of the optimizing procedure.
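The notion of a support region can be made concrete with the ε-insensitive loss that is commonly used in SVR formulations. The source does not state which SVR variant or loss was used, so the following is only a sketch under that assumption; the function names and the toy numbers are hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Sum of residual magnitudes that fall outside a tube of half-width epsilon."""
    return sum(max(abs(t - p) - epsilon, 0.0) for t, p in zip(y_true, y_pred))

def fraction_inside_tube(y_true, y_pred, epsilon):
    """Share of observations whose residual lies within the support region."""
    inside = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= epsilon)
    return inside / len(y_true)

# Toy values (hypothetical): residuals are roughly 0.1, 0.0, 0.5 and 1.0.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 2.0, 2.5, 5.0]
loss = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.2)
share_inside = fraction_inside_tube(y_true, y_pred, epsilon=0.2)
```

Observations inside the tube contribute nothing to the loss, so only the two larger residuals are penalized here. Widening the tube lowers the loss but makes the fit less informative, which is why the width is treated as part of the optimization.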
The RF model consists of an ensemble of decision trees and, thus, is called random forest. A decision tree is a flowchart-like structure in which each internal node represents a decision based on a single feature or linear combination of a subset of features. The classification or prediction decision is based on a series of such individual decisions. RF is based on using different bootstrapping techniques to train multiple decision trees. RF makes decisions based on all the trees, for example through the average output from the trees or the majority output. In this paper, we based the decisions on the average outputs.
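The two ingredients described above, bootstrap resampling and averaging of tree outputs, can be sketched in a few lines. This is a deliberately reduced stand-in for a random forest and not the model used in the paper: it uses depth-one trees (stumps) on a single feature and omits the random feature selection of a full RF. All names and the toy data are hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Best single-split regression tree on one feature (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: predict the sample mean
        mean = sum(ys) / len(ys)
        return (xs[0], mean, mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def fit_bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest

def predict_forest(forest, x):
    """Regression output: the average of the individual tree outputs."""
    return sum(predict_stump(stump, x) for stump in forest) / len(forest)

# Toy step function (hypothetical): response jumps from 0 to 10 at x = 5.
xs_demo = [float(i) for i in range(10)]
ys_demo = [0.0] * 5 + [10.0] * 5
forest = fit_bagged_stumps(xs_demo, ys_demo)
low_prediction = predict_forest(forest, 0.0)
high_prediction = predict_forest(forest, 9.0)
```

Because each stump sees a different bootstrap sample, the split points vary between trees, and the averaged prediction is smoother than that of any single tree.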
High-dimensional data or a complex model can make model interpretation difficult. Regression models can to some extent be interpreted by studying the size of the regression parameters $\beta_j$, which represent the core of statistical inference. However, other models, such as the RF, are far more difficult to interpret. Recently, the field of XAI has received a lot of interest in trying to provide explanations for such opaque models. The core idea of XAI techniques is quite simple and is based on analyzing how changes in the input features affect the model output, although more sophisticated methods have also been developed [30]. In this paper, we resort to a quite simple XAI approach based on analyzing how changes in a single feature affect the output. The analysis is explained in further detail in Section 4.2.
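One generic way to analyze how a change in a single feature affects the output of an opaque model is sketched below. It is not necessarily the procedure of Section 4.2: the function `single_feature_effect`, the stand-in model `toy_model`, and the input rows are hypothetical and serve only to illustrate the idea of perturbing one feature while holding the others fixed.

```python
def single_feature_effect(model, rows, feature_index, delta):
    """Mean change in model output when one feature is shifted by delta."""
    changes = []
    for row in rows:
        perturbed = list(row)            # copy so the input is not modified
        perturbed[feature_index] += delta
        changes.append(model(perturbed) - model(row))
    return sum(changes) / len(changes)

def toy_model(row):
    """Hypothetical opaque model: linear in feature 0, quadratic in feature 1."""
    return 2.0 * row[0] + row[1] ** 2

rows = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
effect_0 = single_feature_effect(toy_model, rows, feature_index=0, delta=1.0)
effect_1 = single_feature_effect(toy_model, rows, feature_index=1, delta=1.0)
```

For the linear feature the effect is the same for every row, whereas for the quadratic feature it depends on the starting value, so the average summarizes a non-linear response that a single regression coefficient would not capture.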
6. Discussion
Despite the substantial amount of uncertainty in the predictions, the results reveal several statistically significant temperature changes. They also show that the most frequent LC changes result mainly in warming in northern and central Europe and primarily in cooling in southern Europe. The most frequent LC changes differ considerably between the parts of Europe, which makes sense since these parts are dominated by different types of vegetation. However, for the LC changes that are frequent in more than one part of Europe, we observe a consistency in temperature change. For example, the change from cropland to urban and built-up land results in significant warming in all three parts of Europe and for the whole of Europe. There is also a consistency between seasons of the year in the sense that an LC change results in either warming or cooling for every season (there are no rows with both red and blue cells). Interestingly, this consistency was not detected by Huang et al. [14] with the regression-based approach. For example, for the whole of Europe, the change from deciduous broadleaf forest to cropland results in statistically significant cooling for both summer and autumn and no statistically significant warming (or cooling) for the other seasons.
To further verify the validity of our suggested approach, we now analyze how consistent our results are with other studies based on statistical approaches and climate model simulations. Many studies have revealed a strong correlation between temperature increase and growth in shrub species [6, 40, 41, 42, 43]. Some of these researchers discussed the positive feedback loop in which LC transitions affect climate, while temperature changes also influence LC transformation [40, 43, 44]. First, warming increases the spread of shrublands. Then, the LC transition to shrublands influences the energy exchange, increasing the absorption of solar radiation due to lower surface albedo. This, in turn, results in a temperature rise. However, it can be complicated to distinguish the main driver in this feedback loop. In this paper, we analyze only the impact of LC on temperature change, ignoring the effect of warming on LC. We observed that the transition to open shrublands alone leads to a temperature increase in northern and southern Europe.
Some works demonstrate that shrubland increase in the Arctic can lead to an annual temperature increase [41, 42, 45], which is consistent with our own findings. However, most articles only consider the growth of shrubs and do not pay attention to the initial cover. Therefore, our approach can help in understanding how prominent the effect of LC transformation to shrubs is, depending on the initial LC. For instance, the replacement of barren or sparsely vegetated cover by shrublands causes a more pronounced warming than the transition from permanent wetland to open shrublands.
Urbanization and its impact on temperature is another subject that draws the interest of climate scientists. In general, researchers conclude that the transition to urban and built-up covers causes warming [7, 14, 46, 47]. Indeed, we also observed that most of the LC changes to urban and built-up covers result in a temperature increase over the whole year, as well as seasonally.
Deforestation and its contribution to a temperature increase is an important research subject that has been explored by many authors [14, 48, 49]. In this paper, we observed a similar trend. Most LC changes associated with deforestation observed in our work lead to a significant temperature increase.
Afforestation is considered a possible solution to the warming effect of deforestation because of its contribution to cooling [7, 48, 49]. In this paper, we detected such a trend in southern Europe, where the shift from cropland or natural vegetation mosaic to evergreen needleleaf or deciduous broadleaf forest results in significant cooling. However, in central Europe, we could not identify a clear pattern in temperature change associated with afforestation. Moreover, the transition from permanent wetland to any kind of forest contributes to warming in northern Europe. This is consistent with the results of Li et al., where a transition of any LC to forest leads to cooling in tropical regions but to warming in high latitudes [49].
In summary, we conclude that our predictions of the impact of LC changes on temperature are consistent with the main trends described by the IPCC [6, 7] and other studies. Our analyses also revealed new insights, which support the assumption that ML techniques can be a useful tool in climate science and that it is possible to develop a model that makes meaningful predictions. In addition, our approach allows us to extract more complex patterns and gain a clearer understanding of the effect of different LC transitions. This demonstrates that ML techniques can help to determine the effect of LC changes on surface temperature, which opens up a wide range of future work to explore and exploit this further.