Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Ensemble Machine Learning on Fusion of Sentinel Time Series Imagery with High-Resolution Orthoimagery for Improved Land Use Land Cover Mapping

Version 1 : Received: 9 June 2024 / Approved: 10 June 2024 / Online: 11 June 2024 (08:40:14 CEST)

A peer-reviewed article of this Preprint also exists.

Subedi, M.R.; Portillo-Quintero, C.; McIntyre, N.E.; Kahl, S.S.; Cox, R.D.; Perry, G.; Song, X. Ensemble Machine Learning on the Fusion of Sentinel Time Series Imagery with High-Resolution Orthoimagery for Improved Land Use/Land Cover Mapping. Remote Sens. 2024, 16, 2778.

Abstract

In the United States, several land use and land cover (LULC) data sets based on Landsat imagery are available. Despite their several advantages over other data, these data sets often fail to represent ground features accurately. High spatial resolution orthoimagery from the National Agricultural Imagery Program (NAIP) makes detailed mapping of complex, heterogeneous landscapes possible for informed decision-making. However, large-area mapping at this resolution remains challenging because of radiometric differences between scenes, the low spectral depth of the imagery, landscape heterogeneity, and computational limitations. Machine learning (ML) techniques have shown promise in improving LULC maps. The primary purpose of this study is to examine and evaluate bagging (Random Forest [RF]), boosting (Gradient Boosting Machines [GBM] and Extreme Gradient Boosting [XGB]), and stacking ensemble models on Sentinel-2A data fused with NAIP imagery. We also compared accuracy estimates from random cross-validation (80%/20% split), which does not account for spatial autocorrelation, with those from target-oriented validation (10 clusters), which accounts for the spatial structure of the training data set. To map LULC in a portion of Tom Green and Irion counties in central Texas, we used a time series of Sentinel data and NAIP orthoimagery. We created several spectral indices, structural variables, and geometry-based variables, and we reduced the dimensionality of the features generated from the Sentinel and NAIP data. Comparing the random and target-oriented validation results shows that autocorrelation in the training data produces over-optimistic accuracy estimates; in our case, the overestimation ranged from 2% to 3.5%. The XGB-boosted stacking ensemble built on the base learners (RF, XGB, and GBM) improved model performance over the individual base learners. The main contribution of this research is showing that meta-learners are just as sensitive to overfitting as base models, because these algorithms are not designed to account for spatial information.
Finally, we show that the fusion of Sentinel-2A data with NAIP data improves LULC classification using Geographic Object-Based Image Analysis (GEOBIA).
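The difference between random and target-oriented validation can be illustrated with a short, standard-library Python sketch. It is not the authors' workflow. The clumped synthetic samples, the 1-nearest-neighbour stand-in classifier (used here in place of RF, GBM, XGB, or the stacked ensemble), and all function names are hypothetical. Forming the 10 spatial clusters with k-means on sample coordinates and then holding out one cluster at a time is an assumed, simple way to build target-oriented folds.

```python
import math
import random


def random_split(n, test_fraction=0.2, seed=0):
    """Random hold-out (e.g. 80/20) that ignores where samples are located."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]  # (train, test)


def kmeans_labels(coords, k=10, iters=100, seed=0):
    """Lloyd's k-means on (x, y) sample coordinates; returns one cluster id per sample."""
    centers = random.Random(seed).sample(coords, k)
    labels = [0] * len(coords)
    for _ in range(iters):
        # Assign every sample to its nearest cluster centre.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in coords]
        # Move each centre to the mean of its members (keep it if the cluster is empty).
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(coords, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def leave_cluster_out_folds(labels):
    """Target-oriented folds: each spatial cluster is held out once as the test set."""
    for c in sorted(set(labels)):
        test = [i for i, lab in enumerate(labels) if lab == c]
        train = [i for i, lab in enumerate(labels) if lab != c]
        yield train, test


def one_nn_accuracy(features, y, train, test):
    """Hypothetical stand-in classifier: 1-nearest neighbour in feature space."""
    hits = 0
    for i in test:
        j = min(train, key=lambda t: math.dist(features[i], features[t]))
        hits += y[i] == y[j]
    return hits / len(test)


def make_clumped_samples(n_polygons=40, pixels_per_polygon=15, seed=1):
    """Hypothetical training pixels digitised from polygons: pixels of one polygon
    share a class and a polygon-specific spectral offset (spatial autocorrelation)."""
    rng = random.Random(seed)
    coords, features, y = [], [], []
    for _ in range(n_polygons):
        cx, cy = rng.random(), rng.random()
        cls = rng.randrange(3)
        offset = [rng.gauss(0, 1.0) for _ in range(2)]
        for _ in range(pixels_per_polygon):
            coords.append((cx + rng.gauss(0, 0.01), cy + rng.gauss(0, 0.01)))
            features.append(tuple(cls + o + rng.gauss(0, 0.1) for o in offset))
            y.append(cls)
    return coords, features, y


if __name__ == "__main__":
    coords, features, y = make_clumped_samples()
    train, test = random_split(len(y))
    print("random 80/20 accuracy:", round(one_nn_accuracy(features, y, train, test), 3))
    labels = kmeans_labels(coords, k=10)
    scores = [one_nn_accuracy(features, y, tr, te) for tr, te in leave_cluster_out_folds(labels)]
    print("leave-cluster-out accuracy:", round(sum(scores) / len(scores), 3))
```

In a random split, a test pixel usually has a sibling pixel from the same polygon in the training set, so the score mostly measures memorisation of nearby samples. Holding out whole spatial clusters removes those near-duplicates, which is why one would expect the cluster-based estimate to be lower on data like this. The same fold lists could be passed to any base learner or to a stacking meta-learner.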

Keywords

Bagging; Boosting; Stacking; GEOBIA; Autocorrelation; Target-oriented Validation; Data fusion

Subject

Environmental and Earth Sciences, Remote Sensing
