1. Introduction
Knowledge of temporal and spatial variability of canopy biophysical characteristics is vital for understanding the interaction between atmosphere, solar radiation, and vegetation [1]. Among the many vegetation characteristics, leaf area index (LAI), defined as half the total green leaf area per unit ground surface area [2], is of prime importance. It is a critical canopy structural parameter that controls the energy, water, and gaseous exchanges in plant ecosystems [3] and an important indicator of vegetation productivity, health, and degree of stress [4,5].
Direct and indirect methods are often used for determining LAI [6,7,8]. These methods are subjective and time-consuming, and may be destructive. It is therefore difficult to obtain adequate information on the spatiotemporal distribution of LAI using ground measurement techniques. Remote sensing techniques provide an alternative for monitoring the spatial and temporal variations of LAI using various types of sensors at different scales. The most commonly used method is to establish an empirical relationship between the biophysical variables of interest and the spectral characteristics or their transformed values [9,10,11]. Statistical models established on the basis of these empirical relationships may display good predictive power at specific sites and under specific sampling conditions but are not usually applicable to different experimental conditions [12]. In contrast, physically based approaches can yield more robust results. Radiative transfer models (RTMs) generally provide a simplified description of the photon-vegetation interaction based on parameters such as canopy structure, soil background, and the view and illumination geometry of the sensing device.
Different strategies have been proposed for the inversion of crop biophysical parameters using RTMs, including iterative optimization [13,14], the look-up table approach [10,15,16], and artificial neural networks [17,18,19]. Each of these methods has strengths and weaknesses under specific circumstances. Iterative optimization retrieves biophysical parameters directly from observed reflectance but is computationally intensive. It is also sensitive to the initial values of model parameters and may become trapped in local minima before reaching the global minimum. Artificial neural networks are often criticized as being black boxes [20]. Moreover, a neural network requires a fixed number of inputs: a network designed to retrieve biophysical variables from four-band imagery is not applicable to imagery with only three bands [21]. In contrast to numerical optimization and artificial neural networks, the look-up table approach performs a global search and thus avoids the danger of being trapped in local minima [22]. However, the definition of the cost function to be minimized remains an open question when the uncertainties and their structure are not well known, which is generally the case.
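The look-up table idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_forward_model` is a hypothetical saturating curve standing in for an RTM (it is not PROSAIL), and the RMSE cost function and the `n_best` averaging are common choices assumed here rather than taken from the cited studies.

```python
import math

def toy_forward_model(lai, soil=0.15):
    """Hypothetical stand-in for an RTM: (red, nir) reflectance for a given LAI.
    NOT PROSAIL; a simple exponential saturation used only for illustration."""
    gap = math.exp(-0.5 * lai)                  # fraction of soil seen through the canopy
    red = 0.03 + (soil - 0.03) * gap            # red reflectance falls as LAI grows
    nir = 0.45 - (0.45 - 1.3 * soil) * gap      # NIR reflectance rises as LAI grows
    return (red, nir)

def build_lut(lai_values):
    """Pre-compute simulated spectra for every candidate LAI."""
    return [(lai, toy_forward_model(lai)) for lai in lai_values]

def rmse_cost(simulated, observed):
    """Assumed cost function: RMSE between simulated and observed band values."""
    squared = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    return math.sqrt(squared / len(observed))

def invert_lut(lut, observed, n_best=1):
    """Global search: rank all entries by cost and average the n_best LAI values."""
    ranked = sorted(lut, key=lambda entry: rmse_cost(entry[1], observed))
    best = ranked[:n_best]
    return sum(lai for lai, _ in best) / len(best)

lai_grid = [i * 0.1 for i in range(71)]         # candidate LAI values from 0.0 to 7.0
lut = build_lut(lai_grid)
observed = toy_forward_model(3.0)               # pretend this came from a sensor
estimate = invert_lut(lut, observed)
print(round(estimate, 2))
```

Because every table entry is evaluated, the search cannot stall in a local minimum, but the result depends directly on how `rmse_cost` is defined, which is the open question raised above.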
To overcome the limitations of empirical and radiative transfer models, a hybrid method has been proposed in this study. This hybrid method utilizes RTMs to generate a simulation dataset which includes spectral data and the corresponding biophysical and biochemical parameters. A regression model is then constructed based on the simulation dataset using parametric (curve fitting) or non-parametric (machine learning) methods. The choice of appropriate RTMs is extremely important for the successful retrieval of vegetation state variables [23,24]. In this study, we employed the PROSAIL model: a coupling of the canopy bidirectional reflectance model SAIL (Scattering by Arbitrary Inclined Leaves) [25] and the PROSPECT leaf optical properties model [26] to simulate canopy spectra. As oilseed rape is a rather homogeneous target, it can be adequately approximated by a 1D model formulation. Some studies have applied this model in the retrieval of oilseed rape biophysical parameters [16,17].
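The two steps of the hybrid method (simulate, then regress) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `toy_canopy_reflectance` is a hypothetical stand-in for PROSAIL, the sampling ranges are arbitrary, and a hand-written one-predictor kNN regressor replaces whatever implementation the study used.

```python
import math
import random

def toy_canopy_reflectance(lai, soil):
    """Hypothetical stand-in for PROSAIL (NOT the real model): returns (red, nir)."""
    gap = math.exp(-0.5 * lai)
    red = 0.03 + (soil - 0.03) * gap
    nir = 0.45 - (0.45 - 1.3 * soil) * gap
    return red, nir

def ndvi(nir, red):
    return (nir - red) / (nir + red)

def simulate_dataset(n, rng):
    """Step 1: sample input parameters, run the forward model, store (VI, LAI) pairs."""
    data = []
    for _ in range(n):
        lai = rng.uniform(0.0, 7.0)             # assumed LAI range
        soil = rng.uniform(0.10, 0.20)          # assumed soil brightness range
        red, nir = toy_canopy_reflectance(lai, soil)
        data.append((ndvi(nir, red), lai))
    return data

def knn_predict(train, x, k=5):
    """Step 2: non-parametric regression; mean LAI of the k nearest simulated VIs."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(lai for _, lai in nearest) / k

rng = random.Random(42)                         # fixed seed for repeatability
train = simulate_dataset(2000, rng)

# Apply the trained relationship to a "pixel" whose true LAI is 2.0.
red, nir = toy_canopy_reflectance(2.0, 0.15)
predicted_lai = knn_predict(train, ndvi(nir, red))
print(round(predicted_lai, 2))
```

In this sketch no ground measurements enter the training step; they would only be needed afterwards to validate the predictions. The spread caused by the varying soil term shows why a single VI does not determine LAI uniquely.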
During the last two decades, many studies have demonstrated the potential of optical sensors for estimating LAI spatially and temporally using different resolution satellites. However, few studies have been published on the estimation of oilseed rape LAI using high spatial resolution remote sensing data such as Pleiades, WorldView-2/3 (WV-2/3), and SPOT-6 data. In view of their high spatial resolution (pixel sizes < 10 m), these satellite images could discern even intra-field crop variability, information which would guide the application of fertilizers and other agrochemical inputs [27], irrigation scheduling, and other management practices that ensure precision agriculture. Therefore, it is necessary to evaluate the potential of these high resolution optical images for estimating the LAI of important agricultural crops such as winter oilseed rape.
In the current study, nine vegetation indices (VIs) were used as predictors to retrieve oilseed rape LAI. Curve fitting, k-nearest neighbor (kNN), and random forest regression (RFR) algorithms were used to establish the relationships between the optimal independent variables and LAI values based on the PROSAIL simulation datasets. Specifically, the objectives of this study were to: (1) identify suitable independent variables for oilseed rape LAI estimation; (2) determine the optimal algorithm by comparing different regression models established using curve fitting and machine learning methods; (3) map the spatio-temporal variability of winter oilseed rape LAI at different growth stages.
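As an illustration of how such predictors are derived from band reflectances, the sketch below computes three widely used indices: NDVI, the simple ratio (SR), and ARVI. The formulas are the standard published forms; whether they match the exact formulations and band definitions used for the nine VIs in this study is an assumption, and the reflectance values are made up.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)

def simple_ratio(nir, red):
    """Simple ratio (SR) of NIR to red reflectance."""
    return nir / red

def arvi(nir, red, blue, gamma=1.0):
    """Atmospherically resistant vegetation index.
    The red band is corrected with the blue band; gamma = 1.0 is the usual default."""
    red_blue = red - gamma * (blue - red)
    return (nir - red_blue) / (nir + red_blue)

# Made-up surface reflectances for a single vegetated pixel.
blue, red, nir = 0.04, 0.06, 0.40

indices = {
    "NDVI": ndvi(nir, red),
    "SR": simple_ratio(nir, red),
    "ARVI": arvi(nir, red, blue),
}
for name, value in indices.items():
    print(name, round(value, 3))
```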
4. Discussion
In this study, LAI values of oilseed rape were estimated using a hybrid inversion method based on a radiative transfer model. This method differs from empirical approaches, which require several ground-measured datasets for modeling. The hybrid inversion method employed in this study uses only a few ground-measured biophysical data points, and only to validate the accuracy of the model established with the simulation dataset. In contrast to other inversion strategies based on radiative transfer models, such as the look-up table approach, the hybrid inversion method has greater potential for estimating crop biophysical parameters because model construction is fast and simple and computation time is short, with no need to consider the selection of optimal solutions.
The ranking of the three VIs (NDVI, MSR, and ARVI) selected by the curve fitting method was not consistent across growth stages. One possible reason for this inconsistency is that the performance of VIs can be affected by sun and sensor geometry [52]. Many studies have assessed the effects of viewing and illumination geometry on VIs [53,54,55]. These effects further hamper the interpretation of temporal variations in land-surface vegetation. In addition, differences among the spectral response functions of the four sensors used in this paper were another reason for this inconsistency. Quadratic equations were the optimal models for the best three VIs. Similar results were reported by Chen et al. [56], who found a non-linear relationship between LAI and SR to be appropriate for agricultural crops in Landsat-5 Thematic Mapper (TM) scenes.
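A quadratic VI-LAI model of the kind described above can be fitted by ordinary least squares. The sketch below is a minimal, dependency-free illustration: the function name `fit_quadratic` is hypothetical, and the (VI, LAI) pairs are synthetic values generated from a known curve, not data or coefficients from this study.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the 3x3 normal equations."""
    rows = [[x * x, x, 1.0] for x in xs]
    # Normal equations (A^T A) coeffs = A^T y, stored as an augmented matrix.
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(3)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda idx: abs(m[idx][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for c in range(col, 4):
                m[row][c] -= factor * m[col][c]
    # Back substitution.
    coeffs = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(m[i][j] * coeffs[j] for j in range(i + 1, 3))
        coeffs[i] = (m[i][3] - tail) / m[i][i]
    return tuple(coeffs)

# Synthetic example: VI values and LAI generated from LAI = 6*VI^2 - VI + 0.2.
vi_values = [0.2 + 0.1 * i for i in range(8)]
lai_values = [6.0 * v * v - 1.0 * v + 0.2 for v in vi_values]

a, b, c = fit_quadratic(vi_values, lai_values)
print(round(a, 3), round(b, 3), round(c, 3))
```

With real simulated data the residuals would not be zero, and the fitted curve would be compared against linear, exponential, or other candidate forms before being called optimal.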
While VIs derived from satellite remote sensing have been used to map LAI, a drawback is that the general applicability of these regression approaches is reduced because the VIs are affected by many factors including atmospheric effects, canopy geometry, vegetation developmental stage, geometry of observation, understory vegetation and soil conditions, and type of sensors [9,49,57]. An ideal VI should be sensitive to the biophysical parameters of interest but insensitive to confounding factors. Thus, the sensitivity analysis of VIs is very important to obtain an accurate LAI estimate. In this paper, the optimal VIs were selected using curve fitting and random forest methods, and these VIs showed good accuracy for the estimation of LAI. Although the sensitivity analysis of VIs for oilseed rape LAI estimation was not performed, the methodology used in this paper reasonably estimated oilseed rape LAI and is therefore recommended for application at larger mapping scales.
To increase the inversion accuracy of biophysical parameters, several non-parametric regression methods, such as the kNN algorithm, have been used to build the relationship between the simulated spectral variables (VIs) and the crop biophysical parameters of interest [17,21]. These algorithms differ from ordinary least squares regression, which requires assumptions about the distribution of the response variable. In this paper, RFR is regarded as the optimal method for establishing the relationship between simulated VIs and LAI values, with a higher R² and lower RMSE and RRMSE values on the simulated independent validation dataset (Table 7). This finding is consistent with recent studies which indicated the RFR algorithm to be a better alternative to other regression algorithms [52,53]. Random forest is not only used to predict biophysical variables of crops but also provides an effective methodology for variable selection. The variable importance measure provided by random forest can be used to identify relevant variables and to explain the relationship between predictor and response variables. However, after the initial selection of optimal predictor variables, it would still be unclear which variable combination is best for predicting parameters such as oilseed rape LAI. To find the best subset of variables, backward elimination was integrated with the variable importance measure. This method has the advantage of shorter computational time while being robust against over-fitting. Despite the overwhelming advantages of RFR models detailed in [37], they also have limitations, especially with respect to the manner in which regression trees are constructed. In this study, random forest was prone to underestimation at high LAI values. This phenomenon has also been observed in [41], where the random forest algorithm, applied to wetland vegetation, exhibited gross underestimation at higher levels of plant biomass.
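The combination of backward elimination with an importance ranking can be sketched generically. This is an assumed reading of the procedure, not the authors' implementation: `importance_fn` and `error_fn` are hypothetical callables that, in practice, would refit a random forest on the current subset and return its variable importances and validation error. The toy functions and variable names (`vi_a` to `vi_d`) are invented so the loop can run on its own.

```python
def backward_elimination(variables, importance_fn, error_fn):
    """Repeatedly drop the least important variable; keep the subset with lowest error."""
    current = list(variables)
    best_subset, best_error = list(current), error_fn(current)
    while len(current) > 1:
        importances = importance_fn(current)            # refit and rank (assumed step)
        least = min(current, key=lambda v: importances[v])
        current.remove(least)
        error = error_fn(current)
        if error <= best_error:                         # prefer smaller subsets on ties
            best_subset, best_error = list(current), error
    return best_subset, best_error

# Toy stand-ins (hypothetical): fixed importances and a synthetic error surface.
TOY_IMPORTANCE = {"vi_a": 0.50, "vi_b": 0.30, "vi_c": 0.15, "vi_d": 0.05}
INFORMATIVE = {"vi_a", "vi_b"}

def toy_importance(subset):
    return {v: TOY_IMPORTANCE[v] for v in subset}

def toy_error(subset):
    missing = len(INFORMATIVE - set(subset))            # dropping a useful predictor hurts
    noise = len(set(subset) - INFORMATIVE)              # redundant predictors add a little error
    return 0.40 + 0.05 * noise + 1.00 * missing

selected, error = backward_elimination(
    ["vi_a", "vi_b", "vi_c", "vi_d"], toy_importance, toy_error
)
print(selected, error)
```

Only one model is refitted per removed variable, so the number of fits grows linearly with the number of candidate predictors rather than with the number of possible subsets.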
Four images with high spatial resolution were used to map the temporal and spatial distribution of oilseed rape LAI values in this study. The images with a spatial resolution of 2 m provided more detailed information on LAI distribution than images with a spatial resolution of 6 m (Figure 5). The validation result obtained using the optimal modeling algorithms and the optimal VIs displayed a relatively poor prediction accuracy. The image on 4 December 2014 showed an overestimation of LAI values, and the validation results from the ground-measured dataset also indicated that LAI values from observations on 8 December 2014 and 31 December 2014 were overestimated. This overestimation is attributable to the growth of grass in the field, which contributes to the pixel values of the images. Although herbicides had been applied at the start of the seedling stage, there were still several grasses in the field, as observed during in situ measurements. As a result, the retrieved LAI included both oilseed rape and grass. Although the overall validation accuracy seems relatively poor in the current study, observations obtained on 5 February 2015 and 12 March 2015 were closer to the one-to-one line, which indicates the potential of high spatial resolution satellite data for monitoring intra-field variations of crop biophysical parameters, especially from the middle to the later growth stages. This result is in line with previous studies which have demonstrated that high spatial resolution optical satellite imagery is an efficient data source for the estimation of crop biophysical and biochemical parameters [58,59,60], and their accurate quantification would help to improve the management of crops and agricultural lands.
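For readers who wish to reproduce this kind of validation against ground measurements, the statistics can be computed as sketched below. The definitions are assumptions, since they are not restated in this section: R² is taken as the coefficient of determination, RRMSE as RMSE divided by the mean observed value and expressed in percent, and the mean bias is added only to make over- or underestimation explicit. The LAI values are made up.

```python
import math

def rmse(observed, predicted):
    """Root mean square error."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

def rrmse(observed, predicted):
    """Relative RMSE in percent (assumed definition: RMSE / mean observed * 100)."""
    mean_obs = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_obs

def r_squared(observed, predicted):
    """Coefficient of determination (assumed definition: 1 - SS_res / SS_tot)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def mean_bias(observed, predicted):
    """Mean of (predicted - observed); positive values indicate overestimation."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)

# Made-up ground-measured and retrieved LAI values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.0, 4.0]

stats = {
    "R2": r_squared(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "RRMSE": rrmse(observed, predicted),
    "bias": mean_bias(observed, predicted),
}
for name, value in stats.items():
    print(name, round(value, 3))
```

Points lying on the one-to-one line would give zero bias and zero RMSE; a positive bias concentrated at low LAI would be consistent with the early-season overestimation discussed above.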