A Comparative Study of Sampling Analysis in the Scene Classification of Optical High-Spatial Resolution Remote Sensing Imagery
Abstract
1. Introduction
1.1. Problem and Motivation
- Sampling is an essential step in the scene classification of optical HSR-RS imagery. In image analysis, sampling refers to selecting a subset of pixels from an image in order to estimate characteristics of the entire image. It is essential primarily because the volume of image data is often too large to be processed in full, so a small yet representative set of data is needed. For instance, without sampling it is intractable or even infeasible to train a BoVW model or an unsupervised feature learning-based model for scene classification [4,9,10,13,14], even on a well-configured computer, because the volume of optical HSR-RS imagery generally amounts to several gigabytes.
- Sampling strategies are crucial to the performance of scene classification of optical HSR-RS imagery. The main goal of sampling is to select a representative subset of the entire data. The subsequent procedures process only the sampled data and discard the information in the unselected data; thus, sampling the image data that carry the most descriptive and discriminative information has a considerable influence on the final results. Good samples preserve classification performance while substantially reducing time and space complexity, whereas non-representative samples considerably degrade performance. In scenarios of scene classification of optical HSR-RS imagery, it is therefore crucial to pursue a sampling strategy that balances classification accuracy against space and time complexity. However, it is not yet clear how to choose a suitable sampling strategy.
- There is a lack of studies on sampling strategies for the scene classification of optical HSR-RS imagery. Different sampling strategies have been used by recent scene classification approaches, e.g., grid sampling in [1,10,14] and a saliency-based strategy in [13]. To our knowledge, however, there are few comparative studies of these strategies.
- Sampling strategies designed for natural scenes cannot be copied to the scene classification of optical HSR-RS imagery. Sampling strategies are better understood for natural images, e.g., several comparisons of sampling strategies have been made in natural image classification [18,19]. However, they cannot be transferred directly to optical HSR-RS imagery, because the two domains differ greatly, e.g., in shooting angles and occlusions. Natural images are mostly captured from a frontal viewpoint with manual or even automatic focusing, so natural scenes tend to be upright and center-biased, which is obviously not the case for remote sensing images taken from overhead; consequently, the appearances of the scenes are quite different. Moreover, the definitions of a scene differ between natural images and optical HSR-RS images. Natural scenes are often classified into indoor and outdoor types [18,20,21], whereas for optical HSR-RS images indoor types do not exist and the outdoor types are further divided into forest, desert, commercial area, residential area, and so on.
1.2. Objective and Contributions
- Our work is the first to study and compare the effects of many different sampling strategies on the scene classification of optical HSR-RS imagery, in contrast with previous works that fix a single sampling strategy and concentrate on developing discriminative feature descriptions or classifiers. Our results can be used to improve the performance of previous works and are instructive for future work on the scene classification of optical HSR-RS imagery.
- We have intensively studied the performance of various saliency-based sampling strategies that have recently been proposed and highlighted as useful for classification, on different types of scenes. Our experiments show that saliency-based sampling mainly helps for object-centered scenes, but does not work well for scenes with complex textures and geometries.
2. Related Work
3. Method for the Comparative Study
3.1. Scheme of the Comparative Study
- Sampling: selecting a compact but representative subset of each image. Given an image, a set of local image patches is selected via different sampling methods (e.g., random or saliency-based strategies) to form a representative and informative subset of the image. Here, a patch is a local rectangular image region that is used for feature description in the following step. Compared to the original image, the sampled subset of patches is more compact and requires far less storage. This step is the core of our work, to which the comparative study is devoted, and is described in more detail below.
- Feature learning: learning feature descriptions of the images using the sampled patches. Each patch in the sampled set is first characterized by a feature descriptor, such as the scale-invariant feature transform (SIFT) [28], the histogram of oriented gradients (HoG) [44] or other texture descriptors [45,46]. In our work, we use the popular SIFT descriptor as the patch descriptor because of its invariance and robustness, and the patch size is fixed accordingly. By computing histograms of gradient directions over a 4 × 4 spatial grid and quantizing the gradient directions into eight bins, the final SIFT descriptor of a patch has 128 dimensions (4 × 4 × 8). Based on such a patch description, one can then learn a feature representation of the image, e.g., by using the BoVW model, topic models [8,47,48] or the recently-proposed unsupervised feature learning methods [10,13,14]. In our case, we use the BoVW model for feature learning. More precisely, it first applies a k-means clustering algorithm to the descriptors of the sampled patches to form a dictionary of visual words (of size K), and then it uses hard assignment to quantize the descriptors of the sampled patches against the dictionary to obtain a histogram-like global feature. In the experiments, K is empirically set to 1000 for both datasets. The resulting K-dimensional global feature vector is finally normalized as the characterization of an image. In addition, we adopt another feature learning method, the Fisher kernel [22], to validate whether the conclusions drawn from the BoVW model generalize. This generic framework combines the benefits of generative and discriminative approaches: it uses a Gaussian mixture model (GMM) instead of k-means to construct the visual word dictionary, and it describes an image with the Fisher vector encoding, which concatenates mean and covariance deviation vectors. Because the GMM fits every dimension of the local descriptors, for the adopted 128-dimensional SIFT descriptor the dimension of the final image vector is 256 times the dictionary size K. The Fisher kernel therefore works well with a considerably smaller dictionary than the BoVW model (K is empirically set to 100 for both datasets in our work), and principal component analysis (PCA) is applied for dimension reduction prior to classification.
- Classification: assigning a semantic class label to each image. This step occurs after a feature representation of the images has been obtained and typically relies on a trained classifier, e.g., an SVM. Because this part is beyond the scope of our study, in our experiments we employ the commonly-used libSVM [49] as our classifier. For BoVW, we use the histogram intersection kernel [50], a non-linear kernel that is well suited to measuring the similarity between histogram-based feature vectors. For the Fisher kernel, we adopt the radial basis function (RBF) kernel for its good performance. For both testing datasets, a unique ground-truth label exists for every sample image; the labeling work was carefully performed by experts in this field. The performances of supervised classification with different sampling methods are therefore evaluated as follows: we randomly select a subset of images from the dataset to train the classification model with the SVM, and the remaining images are used for testing. This process is repeated 100 times, and the average classification accuracy and its standard deviation are computed. With the UC-Merced dataset, we randomly select a fixed proportion of the samples from each class to train the classifier, which is a general setting in the literature, and the RS19 dataset is split for training in the same per-class manner. An end-to-end sketch of these three steps, for the BoVW case, follows this list.
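To make the three steps above concrete, the following is a minimal, self-contained sketch of the BoVW branch of the pipeline. It is an illustration rather than the authors' implementation: `train_images` and `train_labels` are assumed to be provided elsewhere, the patch size and sampling ratio are placeholder values, and OpenCV's SIFT together with scikit-learn's KMeans and SVC stand in for whichever implementations were actually used (K = 1000 and the histogram intersection kernel follow the text above).

```python
import numpy as np
import cv2
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def random_patch_centers(image, ratio, patch_size, rng):
    """Random sampling strategy: draw patch centers uniformly in the spatial domain."""
    h, w = image.shape[:2]
    n_patches = max(1, int(ratio * h * w / patch_size ** 2))   # sampling ratio -> patch count (assumption)
    ys = rng.integers(patch_size // 2, h - patch_size // 2, n_patches)
    xs = rng.integers(patch_size // 2, w - patch_size // 2, n_patches)
    return list(zip(xs, ys))

def sift_descriptors(gray, centers, patch_size):
    """Describe every sampled patch with a 128-D SIFT descriptor (4 x 4 grid, 8 bins)."""
    sift = cv2.SIFT_create()
    keypoints = [cv2.KeyPoint(float(x), float(y), float(patch_size)) for x, y in centers]
    _, desc = sift.compute(gray, keypoints)
    return desc                                                 # shape: (n_patches, 128)

def bovw_histogram(desc, kmeans):
    """Hard-assign descriptors to visual words and return a normalized histogram."""
    words = kmeans.predict(desc)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def intersection_kernel(A, B):
    """Histogram intersection kernel: K(a, b) = sum_i min(a_i, b_i)."""
    return np.array([[np.minimum(a, b).sum() for b in B] for a in A])

rng = np.random.default_rng(0)
patch_size, ratio, K = 16, 0.1, 1000        # patch_size and ratio are placeholders; K follows the text

# Sampling + feature description for every training image (grayscale uint8 assumed)
train_desc = [sift_descriptors(img, random_patch_centers(img, ratio, patch_size, rng), patch_size)
              for img in train_images]

# Dictionary learning and BoVW encoding
kmeans = KMeans(n_clusters=K, n_init=4).fit(np.vstack(train_desc))
X_train = np.array([bovw_histogram(d, kmeans) for d in train_desc])

# SVM with a precomputed histogram intersection kernel; at test time use
# intersection_kernel(X_test, X_train) instead.
svm = SVC(kernel="precomputed").fit(intersection_kernel(X_train, X_train), train_labels)
```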
3.2. Involved Sampling Strategies
Sampling Strategy | Method and Description |
---|---|
Random | Random: randomly sample local patches in the spatial domain |
Saliency-based | SIFT [28]: compute the DoG response map |
 | Itti [40]: integrate color, intensity and orientation features across scales to generate the saliency map |
 | AIM [41]: quantify Shannon’s self-information as the saliency measure |
 | SEG [42]: use a statistical framework and local feature contrast to compute the saliency measure |
 | RCS [43]: compute saliency over random rectangular regions of interest |
3.2.1. Random Sampling
3.2.2. Saliency-Based Sampling
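All of the saliency-based strategies listed in the table above reduce to the same usage pattern: compute a per-pixel saliency map with one of the detectors and then place the sampled patches at the most salient locations. Exactly how the map is converted into patches (top-n selection, thresholding or probabilistic sampling) is not spelled out here, so the top-n variant below is only one plausible reading; `saliency` is assumed to be an H × W map produced by any of the models described next.

```python
import numpy as np

def saliency_patch_centers(saliency, n_patches, patch_size):
    """Pick the n_patches most salient locations (keeping a margin so that the
    whole patch fits inside the image) as patch centers."""
    h, w = saliency.shape
    m = patch_size // 2
    masked = np.full(saliency.shape, -np.inf)
    masked[m:h - m, m:w - m] = saliency[m:h - m, m:w - m]
    top = np.argsort(masked, axis=None)[::-1][:n_patches]   # indices of the largest responses
    ys, xs = np.unravel_index(top, saliency.shape)
    return list(zip(xs.tolist(), ys.tolist()))
```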
- Scale-invariant feature transform (SIFT) detector [28]: This is the most widely-used keypoint detector, owing to its robustness and effectiveness. The algorithm computes the difference-of-Gaussians (DoG) response in scale space, and the informative points are those where the DoG response attains a local maximum or minimum; see Figure 2.
- Itti’s model [40]: This model is among the earliest investigations of saliency computation. It is based on the “feature integration theory” explaining human visual search strategies [51]. The flowchart of this model is presented in Figure 3. First, different low-level features, e.g., colors, intensities and orientations, are extracted independently at several spatial scales. Then, center-surround differences, normalization and across-scale combination are applied sequentially to each feature to generate a conspicuity map, and the final saliency map is the sum of the three normalized conspicuity maps, on the hypothesis that the different features contribute independently to saliency.
- Attention by information maximization (AIM) [41]: This model is rooted in information theory, where the saliency is determined by quantifying Shannon’s self-information of each local image patch. It consists of two stages: independent feature extraction and estimation of Shannon’s measure of self-information. In the first step, to analyze features independently, the independent coefficients corresponding to the contribution of different features are computed using independent component analysis (ICA). In the second step, the distribution of each basis coefficient across the entire image is estimated, and Shannon’s measure of self-information is finally computed from the joint distribution. The flowchart of this model is shown in Figure 4.
- Salient segmentation (SEG) [42]: This model uses a statistical framework and local feature contrast to define the saliency measure. Consider a rectangular window W composed of two disjoint parts, an inner window I and a border B, and assume that the points in I are salient while B belongs to the background. Defining a random variable Z that describes the distribution of pixel values in W, the saliency measure of a point x is the conditional probability that x belongs to the inner window given its observed value, P(x ∈ I | Z = z(x)).
- Random center-surround (RCS) saliency [43]: This model computes local saliency over random rectangular regions of interest; see Figure 6. Given an image with D channels, n sub-windows are randomly generated per channel with a uniform distribution. In the d-th channel, the local saliency of a point x is defined as the sum of the absolute differences between the pixel intensity at x and the mean intensity of each random sub-window that contains x. The global saliency map is then computed by fusing the channel-specific saliency maps with a pixel-wise Euclidean norm. Finally, normalization, median filtering and histogram equalization are applied to the global saliency map to preserve edges, eliminate noise and enhance the contrast. (A simplified sketch of this procedure is given after this list.)
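Because RCS is described above in purely operational terms, a compact sketch may help. It follows the description literally (random sub-windows per channel, accumulated absolute differences to the window mean, pixel-wise Euclidean fusion) but omits the median-filtering and histogram-equalization post-processing, and the number of sub-windows is a placeholder rather than the value used in [43] or in our experiments.

```python
import numpy as np

def rcs_saliency(image, n_windows=100, seed=0):
    """Random center-surround saliency (simplified): accumulate, for every pixel,
    |intensity - mean of each random sub-window containing it|, per channel,
    then fuse the channels with a pixel-wise Euclidean norm."""
    rng = np.random.default_rng(seed)
    img = image.astype(float)
    if img.ndim == 2:                       # treat a grayscale image as a single channel
        img = img[:, :, None]
    h, w, d = img.shape
    per_channel = np.zeros((h, w, d))
    for _ in range(n_windows):
        y0, y1 = np.sort(rng.integers(0, h, 2))
        x0, x1 = np.sort(rng.integers(0, w, 2))
        if y0 == y1 or x0 == x1:            # skip degenerate windows
            continue
        window = img[y0:y1, x0:x1, :]
        mean = window.mean(axis=(0, 1))     # per-channel mean intensity of the sub-window
        per_channel[y0:y1, x0:x1, :] += np.abs(window - mean)
    saliency = np.sqrt((per_channel ** 2).sum(axis=2))   # Euclidean fusion across channels
    return saliency / max(saliency.max(), 1e-12)          # normalize to [0, 1]
```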
3.3. Evaluating the Sampling Performances
4. Testing Datasets
4.1. UC-Merced Dataset
4.2. RS19 Dataset
5. Experimental Results and Analysis
5.1. Overall Testing
- Random sampling is clearly better than the other (saliency-based) sampling methods when the sampling ratio is low, primarily because it extracts balanced land-cover information, which helps to improve scene classification performance.
- As the sampling ratio becomes larger, the differences among the sampling methods become smaller, because the extracted information becomes richer; thus, all of the sampling methods provide comparable results.
- In general, random sampling provides comparable or even better performance regardless of the sampling ratio. (The repeated random-split protocol behind these accuracies is sketched below.)
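For reference, the accuracies underlying these observations follow the repeated random-split protocol of Section 3.1: a random per-class train/test split, repeated 100 times, with the mean accuracy and standard deviation reported. A minimal sketch of that protocol, where `classify_accuracy(train_idx, test_idx)` is a hypothetical helper that runs the full sampling, feature learning and SVM pipeline on one split:

```python
import numpy as np

def evaluation_protocol(labels, train_fraction, n_repeats=100, seed=0):
    """Per class, draw a fixed fraction of images for training, test on the rest,
    and report the mean accuracy and its standard deviation over the repeats."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    accuracies = []
    for _ in range(n_repeats):
        train_idx, test_idx = [], []
        for c in np.unique(labels):
            idx = rng.permutation(np.where(labels == c)[0])
            k = int(round(train_fraction * len(idx)))
            train_idx.extend(idx[:k])
            test_idx.extend(idx[k:])
        accuracies.append(classify_accuracy(train_idx, test_idx))   # hypothetical helper
    return float(np.mean(accuracies)), float(np.std(accuracies))
```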
5.2. Testing on Single Texture Scenes
5.3. Testing on Multiple Texture Scenes
5.4. Testing on Object-Based Scenes
5.5. Testing on Structural Scenes
5.6. Validation Experiment Using the Fisher Kernel Approach
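For readers who want to reproduce this validation, the sketch below shows a standard improved Fisher vector encoding consistent with the description in Section 3.1 (a diagonal-covariance GMM in place of k-means, mean and covariance deviation vectors, output dimension 2 × 128 × K = 256 K for SIFT). The power and L2 normalizations follow the improved Fisher kernel of [22]; whether exactly these normalizations and PCA settings were used here is not detailed, so treat this purely as an illustrative baseline.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(desc, gmm):
    """Fisher vector of one image's local SIFT descriptors (T x 128) for a
    diagonal-covariance GMM: concatenated mean and covariance deviations."""
    T, _ = desc.shape
    gamma = gmm.predict_proba(desc)                         # (T, K) soft assignments
    w, mu, sigma = gmm.weights_, gmm.means_, np.sqrt(gmm.covariances_)
    parts = []
    for k in range(gmm.n_components):
        diff = (desc - mu[k]) / sigma[k]                    # (T, D) whitened differences
        g_mu = (gamma[:, k, None] * diff).sum(0) / (T * np.sqrt(w[k]))
        g_sigma = (gamma[:, k, None] * (diff ** 2 - 1)).sum(0) / (T * np.sqrt(2 * w[k]))
        parts.extend([g_mu, g_sigma])
    fv = np.concatenate(parts)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                  # power normalization
    return fv / max(np.linalg.norm(fv), 1e-12)              # L2 normalization

# Dictionary: GMM with K = 100 components, as in Section 3.1; train_desc is the
# list of per-image descriptor arrays computed from the sampled patches.
# gmm = GaussianMixture(n_components=100, covariance_type="diag").fit(np.vstack(train_desc))
# X = np.array([fisher_vector(d, gmm) for d in train_desc])   # then PCA + RBF-kernel SVM
```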
5.7. Discussion
Sampling Strategy | Random | SIFT | Itti | AIM | SEG | RCS |
---|---|---|---|---|---|---|
UC-Merced dataset (256 × 256 pixels) | 0.002 s | 0.02 s | 0.2 s | 4 s | 8.2 s | 0.5 s |
RS19 dataset (600 × 600 pixels) | 0.01 s | 0.02 s | 0.4 s | 28 s | 25 s | 2.3 s |
6. Conclusions
Acknowledgments
Author Contributions
Conflicts of Interest
References
- Yang, Y.; Newsam, S. Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010; pp. 270–279.
- Sheng, G.; Yang, W.; Xu, T.; Sun, H. High-resolution satellite scene classification using a sparse coding based multiple feature combination. Int. J. Remote Sens. 2012, 33, 2395–2412.
- Zhao, B.; Zhong, Y.; Zhang, L. Hybrid generative/discriminative scene classification strategy based on latent dirichlet allocation for high spatial resolution remote sensing imagery. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Melbourne, VIC, Australia, 21–26 July 2013; pp. 196–199.
- Shao, W.; Yang, W.; Xia, G.S. Extreme value theory-based calibration for the fusion of multiple features in high-resolution satellite scene classification. Int. J. Remote Sens. 2013, 34, 8588–8602.
- Hu, F.; Yang, W.; Chen, J.; Sun, H. Tile-level annotation of satellite images using multi-level max-margin discriminative random field. Remote Sens. 2013, 5, 2275–2291.
- Shao, W.; Yang, W.; Xia, G.S.; Liu, G. A hierarchical scheme of multiple feature fusion for high-resolution satellite scene categorization. In Computer Vision Systems; Springer: Berlin, Germany, 2013; pp. 324–333.
- Sridharan, H.; Cheriyadat, A. Bag of Lines (BoL) for Improved Aerial Scene Representation. IEEE Geosci. Remote Sens. Lett. 2014, 12, 676–680.
- Kusumaningrum, R.; Wei, H.; Manurung, R.; Murni, A. Integrated visual vocabulary in latent Dirichlet allocation-based scene classification for IKONOS image. J. Appl. Remote Sens. 2014.
- Zhao, L.; Tang, P.; Huo, L. A 2-D wavelet decomposition-based bag-of-visual-words model for land-use scene classification. Int. J. Remote Sens. 2014, 35, 2296–2310.
- Cheriyadat, A.M. Unsupervised feature learning for aerial scene classification. IEEE Trans. Geosci. Remote Sens. 2014, 52, 439–451.
- Dos Santos, J.; Penatti, O.; Gosselin, P.H.; Falcao, A.X.; Philipp-Foliguet, S.; Torres, D.S. Efficient and effective hierarchical feature propagation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 4632–4643.
- Chen, S.; Tian, Y. Pyramid of Spatial Relatons for Scene-Level Land Use Classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 1947–1957.
- Zhang, F.; Du, B.; Zhang, L. Saliency-Guided Unsupervised Feature Learning for Scene Classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 2175–2184.
- Hu, F.; Xia, G.S.; Wang, Z.; Huang, X.; Zhang, L.; Sun, H. Unsupervised Feature Learning Via Spectral Clustering of Multidimensional Patches for Remotely Sensed Scene Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2015–2030.
- Penatti, O.A.B.; Nogueira, K.; dos Santos, J.A. Do Deep Features Generalize From Everyday Objects to Remote Sensing and Aerial Scenes Domains? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–10 June 2015.
- Yang, W.; Yin, X.; Xia, G.S. Learning High-level Features for Satellite Image Classification with Limited Labeled Samples. IEEE Trans. Geosci. Remote Sens. 2015, 53, 4472–4482.
- Hu, F.; Wang, Z.; Xia, G.S.; Zhang, L. Fast binary coding for satellite image scene classification. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015.
- Fei-Fei, L.; Perona, P. A bayesian hierarchical model for learning natural scene categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–25 June 2005; Volume 2, pp. 524–531.
- Nowak, E.; Jurie, F.; Triggs, B. Sampling strategies for bag-of-features image classification. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 490–503.
- Payne, A.; Singh, S. A benchmark for indoor/outdoor scene classification. In Pattern Recognition and Image Analysis; Springer: Berlin, Germany, 2005; pp. 711–718.
- Xiao, J.; Hays, J.; Ehinger, K.; Oliva, A.; Torralba, A. Sun database: Large-scale scene recognition from abbey to zoo. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 3485–3492.
- Perronnin, F.; Sánchez, J.; Mensink, T. Improving the Fisher Kernel for Large-scale Image Classification. In Proceedings of the European Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010; pp. 143–156.
- Lazebnik, S.; Schmid, C.; Ponce, J. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New York, NY, USA, 17–22 June 2006; Volume 2, pp. 2169–2178.
- Bosch, A.; Zisserman, A.; Muñoz, X. Scene classification via pLSA. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 517–530.
- Bosch, A.; Zisserman, A.; Muñoz, X. Scene classification using a hybrid generative/discriminative approach. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 712–727.
- Yang, J.; Yu, K.; Gong, Y.; Huang, T. Linear spatial pyramid matching using sparse coding for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 1794–1801.
- Mikolajczyk, K.; Schmid, C. Scale & affine invariant interest point detectors. Int. J. Comput. Vis. 2004, 60, 63–86.
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Xia, G.S.; Delon, J.; Gousseau, Y. Accurate junction detection and characterization in natural images. Int. J. Comput. Vis. 2014, 106, 31–56.
- Sivic, J.; Zisserman, A. Video Google: A text retrieval approach to object matching in videos. In Proceedings of the IEEE International Conference on Computer Vision, Nice, France, 13–16 October 2003; pp. 1470–1477.
- Lazebnik, S.; Schmid, C.; Ponce, J. A sparse texture representation using affine-invariant regions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Madison, WI, USA, 18–20 June 2003; Volume 2, pp. II-319–II-324.
- Csurka, G.; Dance, C.; Fan, L.; Willamowski, J.; Bray, C. Visual categorization with bags of keypoints. In Proceedings of the European Conference on Computer Vision, Prague, Czech Republic, 11–14 May 2004; Volume 1, pp. 1–2.
- Yang, J.; Jiang, Y.G.; Hauptmann, A.G.; Ngo, C.W. Evaluating bag-of-visual-words representations in scene classification. In Proceedings of the 9th ACM SIGMM International Workshop on Multimedia Information Retrieval, Augsburg, Germany, 24–29 September 2007; pp. 197–206.
- Gokalp, D.; Aksoy, S. Scene classification using bag-of-regions representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; pp. 1–8.
- Jurie, F.; Triggs, B. Creating efficient codebooks for visual recognition. In Proceedings of the IEEE International Conference on Computer Vision, Beijing, China, 17–21 October 2005; Volume 1, pp. 604–610.
- Winn, J.; Criminisi, A.; Minka, T. Object categorization by learned universal visual dictionary. In Proceedings of the IEEE International Conference on Computer Vision, Beijing, China, 17–21 October 2005; Volume 2, pp. 1800–1807.
- Siagian, C.; Itti, L. Rapid biologically-inspired scene classification using features shared with visual attention. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 300–312.
- Borji, A.; Itti, L. Scene classification with a sparse set of salient regions. In Proceedings of the IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 1902–1908.
- Sharma, G.; Jurie, F.; Schmid, C. Discriminative spatial saliency for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3506–3513.
- Itti, L.; Koch, C.; Niebur, E. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 1254–1259.
- Bruce, N.; Tsotsos, J. Saliency based on information maximization. In Proceedings of the 18th Advances in Neural Information Processing Systems, Vancouver, Canada, 5–8 December 2005; pp. 155–162.
- Rahtu, E.; Kannala, J.; Salo, M.; Heikkilä, J. Segmenting salient objects from images and videos. In Proceedings of the European Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010; pp. 366–379.
- Vikram, T.N.; Tscherepanow, M.; Wrede, B. A saliency map based on sampling an image into random rectangular regions of interest. Pattern Recognit. 2012, 45, 3114–3124.
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 25 June 2005; Volume 1, pp. 886–893.
- Xia, G.S.; Delon, J.; Gousseau, Y. Shape-based invariant texture indexing. Int. J. Comput. Vis. 2010, 88, 382–403.
- Liu, G.; Xia, G.S.; Yang, W.; Zhang, L. Texture Analysis with Shape Co-occurrence Patterns. In Proceedings of the International Conference on Pattern Recognition, Stockholm, Sweden, 24–28 August 2014; pp. 1627–1632.
- Hofmann, T. Unsupervised learning by probabilistic latent semantic analysis. Mach. Learn. 2001, 42, 177–196.
- Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
- Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011.
- Maji, S.; Berg, A.; Malik, J. Classification using intersection kernel support vector machines is efficient. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
- Treisman, A.M.; Gelade, G. A feature-integration theory of attention. Cognit. Psychol. 1980, 12, 97–136.
- Hu, Q.; Wu, W.; Xia, T.; Yu, Q.; Yang, P.; Li, Z.; Song, Q. Exploring the use of Google Earth imagery and object-based methods in land use/cover mapping. Remote Sens. 2013, 5, 6026–6042.
- Xia, G.S.; Yang, W.; Delon, J.; Gousseau, Y.; Sun, H.; Maître, H. Structural high-resolution satellite image indexing. In Proceedings of the ISPRS TC VII Symposium-100 Years ISPRS, Vienna, Austria, 5–7 July 2010; Volume 38, pp. 298–303.
© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).