Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Using a Similarity Matrix Approach to Evaluate the Accuracy of Rescaled Maps

Remote Sensing, 2018
...Read more
remote sensing Article Using a Similarity Matrix Approach to Evaluate the Accuracy of Rescaled Maps Peijun Sun 1,2, * ID and Russell G. Congalton 3 ID 1 State Key Laboratory of Remote Sensing Science, Jointly Sponsored by Beijing Normal University and Institute of Remote Sensing and Digital Earth of Chinese Academy of Sciences, Beijing 100875, China 2 Institute of Remote Sensing Science and Engineering, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China 3 Department of Natural Resources & the Environment, University of New Hampshire, Durham, NH 03824, USA; russ.congalton@unh.edu * Correspondence: sunpeijun@mail.bnu.edu.cn; Tel.: +86-18612945710 Received: 1 February 2018; Accepted: 16 March 2018; Published: 20 March 2018 Abstract: Rescaled maps have been extensively utilized to provide data at the appropriate spatial resolution for use in various Earth science models. However, a simple and easy way to evaluate these rescaled maps has not been developed. We propose a similarity matrix approach using a contingency table to compute three measures: overall similarity (OS), omission error (OE), and commission error (CE) to evaluate the rescaled maps. The Majority Rule Based aggregation (MRB) method was employed to produce the upscaled maps to demonstrate this approach. In addition, previously created, coarser resolution land cover maps from other research projects were also available for comparison. The question of which is better, a map initially produced at coarse resolution or a fine resolution map rescaled to a coarse resolution, has not been quantitatively investigated. To address these issues, we selected study sites at three different extent levels. First, we selected twelve regions covering the continental USA, then we selected nine states (from the whole continental USA), and finally we selected nine Agriculture Statistical Districts (ASDs) (from within the nine selected states) as study sites. Crop/non-crop maps derived from the USDA Crop Data Layer (CDL) at 30 m as base maps were used for the upscaling and existing maps at 250 m and 1 km were utilized for the comparison. The results showed that a similarity matrix can effectively provide the map user with the information needed to assess the rescaling. Additionally, the upscaled maps can provide higher accuracy and better represent landscape pattern compared to the existing coarser maps. Therefore, we strongly recommend that an evaluation of the upscaled map and the existing coarser resolution map using a similarity matrix should be conducted before deciding which dataset to use for the modelling. Overall, extending our understanding on how to perform an evaluation of the rescaled map and investigation of the applicability of the rescaled map compared to the existing land cover map is necessary for users to most effectively use these data in Earth science models. Keywords: similarity matrix; accuracy assessment; rescaling technique; land cover map; upscaled map; heterogeneity 1. Introduction Human activity increasingly impacts the Earth’s environment and ecosystems [1,2]. To characterize the Earth’s fundamental characteristics and environmental processes, maps have been created for providing information about the distribution and dynamic changes of the land cover. This map information has served as input for a great variety of models to better understand the earth including climate change [3], biogeochemical cycling [4], and carbon cycling [5], to name but Remote Sens. 2018, 10, 487; doi:10.3390/rs10030487 www.mdpi.com/journal/remotesensing
Remote Sens. 2018, 10, 487 2 of 21 a few. Therefore, characterizing the distribution of the land cover is crucial for Earth science and decision making. To efficiently and effectively characterize the spatial distribution of land cover, remotely sensed imagery and associated technologies are employed for producing land cover maps at different spatial resolutions (i.e., raster/pixel sizes). As Grekousis et al. [6] reported, 23 global and 41 regional land cover products based on remote sensing technologies have been produced and released for use by the Earth science community. For example, the United States Department of Agriculture (USDA), National Agriculture Statistic Service (NASS) annually produces the Cropland Data Layer (CDL) at 30 m resolution for U.S.A. [7]. Other existing maps provide essential information at different spatial resolutions including 1 km, 500 m, 300 m, 250 m, and 30 m [8], and have continued to improve in detail (spatial resolution) over time. However, in addition to these existing maps at specific spatial resolutions, other spatial resolutions may also be required to obtain optimal results for various Earth science models. Previous research has revealed that model outputs have varied considerably when using land cover maps of different spatial resolutions. For example, Buyantuyev and Wu [9] demonstrated that using land cover maps of different spatial resolutions resulted in variation of estimated land surface temperature. Grafius et al. [10] stated that the outputs of their ecosystem service models varied due to the use of land cover maps with differing spatial resolutions. Many researchers have emphasized that often the existing spatial resolutions cannot meet the requirements needed to obtain the optimal results for various Earth science models. In other words, the modeler would rather rescale a finer resolution map to the desired spatial resolution, if possible, then to use an existing coarse scale map. Recognizing the need for various spatial resolutions by the Earth science community, rescaling techniques have been developed to fill the resolution gaps for existing land cover maps [11]. These rescaling techniques consist of two possible outcomes: upscaling and downscaling. The upscaling method decreases the spatial resolution while the downscaling method increases the spatial resolution [12]. These two methods have been documented as vitally necessary and efficient ways to obtain maps at different spatial resolutions to support a variety of Earth science research [11,13,14]. Upscaling produces coarser resolution maps from finer resolution maps using two different approaches: numerical aggregation and categorical aggregation. Numerical aggregation first computes the pixel value (e.g., digital number, DN) for the coarser resolution imagery based on some function between the finer resolution pixels and the coarser resolution pixels [15]. Then, the coarser resolution imagery is classified to produce the upscaled map. To conduct numerical aggregation, mean aggregation and central pixel resampling (CPR) are commonly employed. The mean aggregation algorithm calculates the average of the pixel values from the finer resolution imagery as the pixel value for the coarser resolution map [15]. The CPR algorithm selects the finer resolution image’s central pixel value corresponding to the coarser resolution pixel as the output pixel value [16]. The other common way to obtain upscaled maps, categorical aggregation, assigns a class label to an upscaled map based on class labels in the associated finer resolution map by using different aggregation logic. The choice of aggregation logic includes: Majority Rule Based (MRB) (e.g., [17]), Random Rule Based (RRB) (e.g., [18]) and Point-centered Distance-Weighted moving window (PDW) [14]. MRB determines the class label for an upscaled map by selecting the most frequently occurring class of the fine resolution map. RRB randomly selects a class from the specific pixels of the fine resolution map. PDW first employs a weighted sampling grid of variable resolution to sample the existing map. Then, the class label for the corresponding location in the upscaled map is randomly selected from the frequency of the class labels derived from the samples. Downscaling generates finer resolution maps from a coarser resolution map using either a numerical or categorical approach. Similar to the upscaling methods, the numerical approach first predicts finer resolution images that are then classified to obtain the downscaled map. The categorical approach directly transforms the categorical variable, usually representing the land cover class, from the coarser resolution map to the corresponding finer resolution map [13]. One common
remote sensing Article Using a Similarity Matrix Approach to Evaluate the Accuracy of Rescaled Maps Peijun Sun 1,2, * 1 2 3 * ID and Russell G. Congalton 3 ID State Key Laboratory of Remote Sensing Science, Jointly Sponsored by Beijing Normal University and Institute of Remote Sensing and Digital Earth of Chinese Academy of Sciences, Beijing 100875, China Institute of Remote Sensing Science and Engineering, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China Department of Natural Resources & the Environment, University of New Hampshire, Durham, NH 03824, USA; russ.congalton@unh.edu Correspondence: sunpeijun@mail.bnu.edu.cn; Tel.: +86-18612945710 Received: 1 February 2018; Accepted: 16 March 2018; Published: 20 March 2018 Abstract: Rescaled maps have been extensively utilized to provide data at the appropriate spatial resolution for use in various Earth science models. However, a simple and easy way to evaluate these rescaled maps has not been developed. We propose a similarity matrix approach using a contingency table to compute three measures: overall similarity (OS), omission error (OE), and commission error (CE) to evaluate the rescaled maps. The Majority Rule Based aggregation (MRB) method was employed to produce the upscaled maps to demonstrate this approach. In addition, previously created, coarser resolution land cover maps from other research projects were also available for comparison. The question of which is better, a map initially produced at coarse resolution or a fine resolution map rescaled to a coarse resolution, has not been quantitatively investigated. To address these issues, we selected study sites at three different extent levels. First, we selected twelve regions covering the continental USA, then we selected nine states (from the whole continental USA), and finally we selected nine Agriculture Statistical Districts (ASDs) (from within the nine selected states) as study sites. Crop/non-crop maps derived from the USDA Crop Data Layer (CDL) at 30 m as base maps were used for the upscaling and existing maps at 250 m and 1 km were utilized for the comparison. The results showed that a similarity matrix can effectively provide the map user with the information needed to assess the rescaling. Additionally, the upscaled maps can provide higher accuracy and better represent landscape pattern compared to the existing coarser maps. Therefore, we strongly recommend that an evaluation of the upscaled map and the existing coarser resolution map using a similarity matrix should be conducted before deciding which dataset to use for the modelling. Overall, extending our understanding on how to perform an evaluation of the rescaled map and investigation of the applicability of the rescaled map compared to the existing land cover map is necessary for users to most effectively use these data in Earth science models. Keywords: similarity matrix; accuracy assessment; rescaling technique; land cover map; upscaled map; heterogeneity 1. Introduction Human activity increasingly impacts the Earth’s environment and ecosystems [1,2]. To characterize the Earth’s fundamental characteristics and environmental processes, maps have been created for providing information about the distribution and dynamic changes of the land cover. This map information has served as input for a great variety of models to better understand the earth including climate change [3], biogeochemical cycling [4], and carbon cycling [5], to name but Remote Sens. 2018, 10, 487; doi:10.3390/rs10030487 www.mdpi.com/journal/remotesensing Remote Sens. 2018, 10, 487 2 of 21 a few. Therefore, characterizing the distribution of the land cover is crucial for Earth science and decision making. To efficiently and effectively characterize the spatial distribution of land cover, remotely sensed imagery and associated technologies are employed for producing land cover maps at different spatial resolutions (i.e., raster/pixel sizes). As Grekousis et al. [6] reported, 23 global and 41 regional land cover products based on remote sensing technologies have been produced and released for use by the Earth science community. For example, the United States Department of Agriculture (USDA), National Agriculture Statistic Service (NASS) annually produces the Cropland Data Layer (CDL) at 30 m resolution for U.S.A. [7]. Other existing maps provide essential information at different spatial resolutions including 1 km, 500 m, 300 m, 250 m, and 30 m [8], and have continued to improve in detail (spatial resolution) over time. However, in addition to these existing maps at specific spatial resolutions, other spatial resolutions may also be required to obtain optimal results for various Earth science models. Previous research has revealed that model outputs have varied considerably when using land cover maps of different spatial resolutions. For example, Buyantuyev and Wu [9] demonstrated that using land cover maps of different spatial resolutions resulted in variation of estimated land surface temperature. Grafius et al. [10] stated that the outputs of their ecosystem service models varied due to the use of land cover maps with differing spatial resolutions. Many researchers have emphasized that often the existing spatial resolutions cannot meet the requirements needed to obtain the optimal results for various Earth science models. In other words, the modeler would rather rescale a finer resolution map to the desired spatial resolution, if possible, then to use an existing coarse scale map. Recognizing the need for various spatial resolutions by the Earth science community, rescaling techniques have been developed to fill the resolution gaps for existing land cover maps [11]. These rescaling techniques consist of two possible outcomes: upscaling and downscaling. The upscaling method decreases the spatial resolution while the downscaling method increases the spatial resolution [12]. These two methods have been documented as vitally necessary and efficient ways to obtain maps at different spatial resolutions to support a variety of Earth science research [11,13,14]. Upscaling produces coarser resolution maps from finer resolution maps using two different approaches: numerical aggregation and categorical aggregation. Numerical aggregation first computes the pixel value (e.g., digital number, DN) for the coarser resolution imagery based on some function between the finer resolution pixels and the coarser resolution pixels [15]. Then, the coarser resolution imagery is classified to produce the upscaled map. To conduct numerical aggregation, mean aggregation and central pixel resampling (CPR) are commonly employed. The mean aggregation algorithm calculates the average of the pixel values from the finer resolution imagery as the pixel value for the coarser resolution map [15]. The CPR algorithm selects the finer resolution image’s central pixel value corresponding to the coarser resolution pixel as the output pixel value [16]. The other common way to obtain upscaled maps, categorical aggregation, assigns a class label to an upscaled map based on class labels in the associated finer resolution map by using different aggregation logic. The choice of aggregation logic includes: Majority Rule Based (MRB) (e.g., [17]), Random Rule Based (RRB) (e.g., [18]) and Point-centered Distance-Weighted moving window (PDW) [14]. MRB determines the class label for an upscaled map by selecting the most frequently occurring class of the fine resolution map. RRB randomly selects a class from the specific pixels of the fine resolution map. PDW first employs a weighted sampling grid of variable resolution to sample the existing map. Then, the class label for the corresponding location in the upscaled map is randomly selected from the frequency of the class labels derived from the samples. Downscaling generates finer resolution maps from a coarser resolution map using either a numerical or categorical approach. Similar to the upscaling methods, the numerical approach first predicts finer resolution images that are then classified to obtain the downscaled map. The categorical approach directly transforms the categorical variable, usually representing the land cover class, from the coarser resolution map to the corresponding finer resolution map [13]. One common Remote Sens. 2018, 10, 487 3 of 21 method of downscaling is area-to-point prediction (ATPP). The ATPP approach interpolates the coarser resolution imagery to generate the finer resolution imagery by integrating support information from multiple sources of data. A variety of complex algorithms have been developed to conduct the ATPP such as area-point-kriging (ATPK) [19], area-to-point cokriging (ATPCK) [12], etc. For example, ATPCK incorporates the support information from imagery of different resolutions into a random field model to predict the finer resolution. The correlation within and cross-correlation between the imagery of different resolutions are used to minimize the variance of the error [12]. More details about ATPP can be found in Tang et al. [12] and Atkinson [13]. The categorical method for downscaling is frequently conducted by employing super-resolution mapping (SRM) algorithms [13]. The SRM divides a coarser resolution pixel into several finer resolution pixels and determines the class label for the finer resolution pixels by translating the coarse resolution fractions into a finer resolution land cover map [20,21]. Existing approaches produce a hard-classified land cover map at finer resolution typically choosing one class label based on the maximum a posteriori (MAP) principle. This principle is similar to that of many hard classification algorithms that assign the class label with the highest posteriori probability as the class type for the output classified map [21]. For example, Ling et al. [21] fused the class membership (CMP) of the finer resolution pixel derived by using the local smoothness priority (LSP) and the downscaled coarse fraction (DCF), respectively. The CMP is on the assumption that one pixel belongs to one of the predefined land cover classes, which represents the spatial distribution information of classes. The LSP represents the relationship between neighboring fine resolution pixels, while the DCF considers the fractional values between neighboring coarse resolution pixels. These SRM algorithms have been successfully conducted to generate finer resolution maps. Since rescaling techniques have been well documented as necessary and effective approaches to produce land cover maps at a wide range of spatial resolutions (e.g., [14,22]), how to evaluate these upscaled or downscaled maps is becoming a critical issue. Although existing mapping projects typically have reference data that were used to assess the accuracy of that map, utilization of these reference data for validating any rescaled maps created by others is problematic. Often, the reference data are not readily available to the map users. In addition, the reference data do not match the spatial resolution of either an upscaled or downscaled map. Therefore, a simple and easy way to evaluate the accuracy of rescaled maps needs to be developed. Additionally, producing rescaled maps draws into question whether the rescaled maps are better than the existing land cover maps for use in Earth science modeling. If so, then methods for evaluating the rescaled maps will be beneficial for the community to make better decisions for selecting the appropriate datasets. Therefore, the primary objective of this paper is to present a new method using a similarity matrix based on a contingency table to evaluate the accuracy of the rescaled maps. This method will also allow testing to see if existing, coarse resolution maps have a higher or lower accuracy than rescaled maps derived from existing fine resolution maps. Finally, testing if the extent (i.e., size) of the study sites has an influence on the accuracy of rescaled maps has not been quantitatively performed. Thus, the experiments done in this research vary the study area sizes over three different extent levels. All the analysis in this research was conducted using upscaled crop/non-crop maps derived by the MRB. 2. Materials and Methods In this section, we describe the selection of data and study areas, the principles behind and the calculation of the similarity matrix, the design of our experiments using study areas of three different spatial extents, and the techniques for producing the upscaled maps. The methods consist of (1) selection of two datasets: the base map and the existing coarser resolution map; (2) selection of the study areas with different landscape patterns and different extents (i.e., size of area); (3) the similarity matrix approach proposed to evaluate either the rescaled maps or the existing, coarser resolution maps compared to the finer resolution maps; and (4) the MRB aggregation method that was used to produce Remote Sens. 2018, 10, 487 4 of 21 the upscaled maps to test the performance of similarity matrix and explore if the upscaled maps are better than the existing coarser resolution maps. 2.1. Data Descriptions Based on the objectives of this paper, two datasets were required. The first dataset, NASS’s CDL data were utilized to generate the crop/non-crop base maps that were used to produce the upscaled maps. The crop/non-crop base maps were also used as reference maps for assessing the existing coarser resolution maps (i.e., Global Food Security-Support Analysis Data, GFSAD). The CDL data can be obtained from the NASS’s online geospatial application-CropScape (https://nassgeodata.gmu. edu/CropScape/). These data are annually updated, raster-formatted, and geo-referenced land cover maps [23], whose accuracies for major crops range from 85% to 95% [24]. Due to its high accuracy for predicting the information of crops and identifying the field crops, the CDL data are extensively applied in Earth science research [7,25]. Therefore, this paper utilized the CDL data at 30 m spatial resolution for the year 2010 to generate the crop/non-crop maps for each of the different study sites. The crop/non-crop maps were extracted by simplifying the crop type of CDL data based on the metadata provided by NASS (available online: https://www.nass.usda.gov/Research_and_Science/ Cropland/metadata/meta.php). The other dataset needed in this study was the existing crop/non-crop maps at specific spatial resolutions that could then be compared with the upscaled maps. The GFSAD Crop Dominance Global 1 km and 250 m for nominal year 2010 were obtained from the Land Processes Distributed Active Archive Center (LP DAAC) provided by the National Aeronautics and Space Administration (NASA) (available from: https://lpdaac.usgs.gov/dataset_discovery/measures/measures_products_ table/gfsad1kcd_v001). The GFSAD crop/non-crop (cropland extent) maps were produced as part of a NASA project evaluating global food security in the twenty-first century. More details regarding GFSAD can be found in Teluguntla et al. [26] and Teluguntla et al. [27]. Note that GFSAD data at 30 m resolution were not used to produce the base maps in this study because maps for the year 2010 are not available. To facilitate comparison of the upscaled maps created in this study and the existing GFSAD maps, the GFSAD maps at 1 km and 250 m were resampled to 960 m and 240 m, respectively. ArcGIS version 10.4 was used to resample the GFSAD maps using the majority resampling technique. This technique determines the corresponding 4 by 4 cells in the input space that are closest to the center of the output cell and uses the majority of the 4 by 4 neighbors (http://pro.arcgis.com/en/pro-app/tool-reference/data-management/resample.htm). 2.2. Study Areas To analyze the influence of the size of the area on the upscaling accuracy, this research employed study sites for three different extents: regional level maps (a portion of the USA), US state level maps, and Agricultural Statistical District (ASD) level maps. The boundaries of the regions and the ASDs are defined by the USDA’s NASS. The U.S.A. has been divided into twelve regions (Figure 1). Each ASD is a group of counties based on the geography, climate, and cropping practice within each state [24], and has been widely-used in agricultural research (e.g., [11]). To evaluate the landscape pattern of each of the study sites, the Patch-Per-Unit area (PPU) (Equation (1)), a measure of the heterogeneity [28], was employed to compute the heterogeneity for each study site. As the landscape becomes heterogeneous, PPU increases [28]. The crop/non-crop base maps derived from NASS’s CDL were utilized to compute the PPU for each state study site. The highest PPU for all states was 134.97, while the lowest was 4.72. The PPU (heterogeneity of each state) was then divided into three levels: lower level (0 < PPU ≤ 40), medium level (40 < PPU ≤ 100), and higher level (PPU >100). Three states were then selected for analysis from each of the three heterogeneity levels. PPU = m/(n × λ), (1) Remote Sens. 2018, 10, 487 5 of 21 where m is the total number of patches, n is the total number of pixels, λ is a scaling constant equal to the area of a pixel. The PPU of each of the ASDs in the nine selected states were then calculated and divided into three different levels: lower level (0 < PPU ≤ 50), medium level (50 < PPU ≤ 120), and higher level (PPU >120). For each level of heterogeneity, three ASDs were randomly selected for analysis in this study. Therefore, we selected a total of twelve regions, nine states, and nine ASDs to be analyzed in this study. The landscape characteristics of these regions, states, and ASDs are shown in Table 1. Figure 1. Study sites. Note that the region-level study sites consist of all twelve regions. The state-level study sites consist of nine states. The Agriculture Statistic District (ASD)-level study sites consist of nine ASDs. The number in ASD level study area represents the identify number of the ASD for marking different ASDs. For example, 2720 in the Figure 1 represents the study site of ASD2720. Table 1. Landscape pattern for study sites at different levels. TA, NP and PLAND, respectively, represents the total area, the number of fields, and the proportion of crop in the landscape scenarios. Patch-Per-Unit area (PPU) measures the fragmentation of the study sites. Note that PPU were performed in kilometer square (km2 ) in this paper. TA (km2 ) NP PLAND Mean Size of Fields (km2 ) PPU Regions Heartland Mountain Northern Plains Northwest Pacific Upper Midwest Delta Eastern Mountain Great Lakes Northeastern Southern Southern Plains 326,626.3 1,734,205.4 796,517.3 642,992.8 696,249.9 509,763.7 382,528.1 508,588.4 351,437.4 464,858.1 513,624.9 866,793.2 178,538 238,082 479,110 82,452 64,637 368,148 109,111 334,785 249,684 278,917 202,950 219,485 36.44 6.93 38.72 8.46 5.50 39.35 17.81 13.68 33.18 14.50 9.35 13.35 1.83 7.28 1.66 7.80 10.77 1.38 3.50 1.51 1.40 1.67 2.53 3.95 54.66 13.73 60.15 12.82 9.28 72.22 28.52 65.83 71.05 60.00 39.51 25.32 States Arizona California Kentucky Maine Maryland Michigan Minnesota Nevada North Carolina 295,386.1 409,884.8 104,780.0 84,430.3 26,193.8 150,745.8 218,696.7 286,485.4 128,187.2 36,500 89,759 118,632 34,942 34,777 139,808 163,526 13,522 140,766 2.89 8.59 16.09 3.93 29.46 23.31 33.15 1.09 15.78 8.09 4.57 0.88 2.41 0.75 1.07 1.34 21.18 0.91 12.36 21.90 113.22 41.39 132.77 92.74 74.77 4.72 109.81 Remote Sens. 2018, 10, 487 6 of 21 Table 1. Cont. 0640 0651 0680 2420 2490 2670 2680 2720 3710 ASDs 41,552.6 71,221.8 118,264.7 9077.0 4582.1 11,901.4 16,872.1 37,161.4 8652.0 10,371 41,474 10,448 14,788 3724 17,895 28,429 4261 10,074 3.82 27.60 2.59 33.54 28.70 36.60 52.40 1.52 8.72 4.01 1.72 11.32 0.61 1.23 0.67 0.59 8.72 0.85 24.96 58.23 8.83 162.92 81.27 150.36 168.50 11.47 116.44 2.3. Description of the Similarity Matrix The similarity matrix proposed here to evaluate the rescaling is based on a contingency table approach as shown in Table 2. Each row represents the area of each class in the upscaled map, while each column represents the area of each class in the base map. Aik denotes the area that is classified as class i in the base map but as class k in the upscaled map. Note that class k can be the same or different with class i. Entries on the diagonal represent the consistent area (i.e., similarity) for each class between the base map and the upscaled map. Thus, Aii shows how many areas for class i are correctly identified in the upscaled map. A+k sums the area of the class k in the upscaled map (the column total), and A+i sums the area of the class i in the base map (the row total). The omission error (OE) for class i of the upscaled map can be evaluated by Equation (2), which represents the percentage of area for class i that is omitted from the upscaled map. The OE of class i also represents the underestimation of class i in the upscaled map. The commission error (CE) for class i can be evaluated by Equation (3), which represents the percentage of area for class i that is committed from the correct base map. The CE of class i represents the overestimation of the class i in the upscaled map. The overall similarity (OS) can be evaluated by Equation (4), which represents the percentage of the area for all classes that are correctly represented in the upscaled maps. ( Ai+ − Aii )/Ai+ (2) ( A+i − Aii )/A+i (3) n n n n i =0 i =0 i =0 i =0 ∑ Aii / ∑ Ai+ = ∑ Aii / ∑ A+i (4) Table 2. Similarity matrix to assess the accuracy of the upscaled maps. Upscaled Map Base map Class 1 Class 2 ... Class i ... Class k Total Class 1 Class 2 Class i Class k Total A11 A21 ... ... ... Ak1 A+1 A12 A22 ... ... ... Ak2 A+2 A1i A2i ... Aii ... Aki A+i A1k A2k ... ... ... Akk A+k A1+ A2+ ... Ai+ ... Ak+ 2.4. Calculation of Similarity Matrix To generate the similarity matrix, five steps should be taken: 1. Compute the total number of the square windows (NSW ) to cover the base map. First, determine the total number of rows, M, and the total number of columns, N, for the upscaled map based on the desired spatial resolution (pixel size). Then, NSW is computed by NSW = M × N. Remote Sens. 2018, 10, 487 2. 3. 4. 5. 7 of 21 Place NSW square windows (corresponding to the upscaled pixel size) over the base map. The square window, W mn , corresponds to the pixel at row m and column n in the upscaled map. Identify the class type of the upscaled pixel corresponding to W mn , which is denoted as class j (the entire upscaled pixel is class j). Then, identify the class types of the base map pixels within W mn , which is denoted as class i. Calculate the area of class i of the base map pixels within W mn . This area is denoted as Aijmn . Note that class i can be either the same or different than class j from the upscaled map. The area for class i in the base map that is classified as class j in the upscaled map, Aij , can be computed Aijmn , which is used to fill in the values in the contingency table to obtain by Aij = ∑ m = 1, M n = 1, N the similarity matrix. The OS, OE, and CE can be obtained to evaluate the accuracy of the upscaled map according to the principles in the Section 2.3. Figure 2 presents an example of generating the similarity matrix and will make this method much clearer. The base map covers an area of 36 m2 and has a spatial resolution (pixel size) of 1 m (resulting in 36 pixels in this study area). There are two classes in this map; Class 1 (C1) is the gray color and Class 2 (C2) is the white color in both the base and upscaled maps. The desired spatial resolution of the upscaled map is 3 m. Step 1: The number of square windows (NSW ) in the upscaled map is 4 (the number of rows is M = 2 and the number of columns is N = 2; 2 × 2 = 4). Step 2: The computed windows are placed over the base map. The size of each square window is same as the spatial resolution (pixel size = 3 m) of the upscaled map. Step 3: The class type of W 11 is C1 for the upscaled map as determined using the MRB upscaling method. The pixels within W 11 in the base map are marked as P1, P2, P3, P4, P5, P6, P7, P8, and P9 in Figure 2. The corresponding map classes are C1, C1, C2, C2, C1, C1, C1, C2, and C2, respectively. 11 11 11 Step 4: Compute Aijmn for W 11 . Then, A11 11 = 5, A21 = 4, A12 = 0, and A22 = 0. mn A11 = 12. Thus, A11 = ∑ m = 1, 2 n = 1, 2 12 12 Repeat step 3 and step 4 to obtain Aij for W 12 , W 21 , W 22 . For W 12 , A12 11 = 0, A21 = 0, A12 = 1, 21 12 12 21 21 22 12 12 22 = 8. For W , A11 = 0, A21 = 0, A12 = 2, A22 = 7. For W , A11 = 7, A21 = 2, A12 = 0, A12 22 = 0. mn = 15. mn = 3, and A = mn = 6, A = A A A Then A21 = ∑ ∑ ∑ 22 12 22 12 21 m = 1, 2 m = 1, 2 m = 1, 2 n = 1, 2 n = 1, 2 n = 1, 2 Step 5: Table 3 shows the results of similarity matrix for this example. Based on the similarity matrix, OE, CE and OS can be computed. The OS for this example is 75.0%. The CE and OE for this example can be found in Table 3. A12 22 Table 3. Similarity matrix for the example in Figure 2. OS, OE, and CE represent overall similarity, omission error, and commission error, respectively. Upscaled Map Base map C1 C2 Total OE (%) C1 C2 12 6 3 15 15 21 20.0 28.6 Total 18 18 CE (%) 33.3 16.7 OS = 75.0%         Remote Sens. 2018, 10, 487                    8 of 21 Figure 2. A general scheme for implementation of similarity matrix. Class 1 is gray and Class 2 is white. 2.5. Design of the Experiment To investigate the performance of the similarity matrix on rescaling techniques, the upscaling method was selected to be used in this research. No analysis was performed for downscaling, but the approach should work the same. The crop/non-crop maps were upscaled based on a widely-used aggregation method (i.e., MRB). The details of the MRB are described in Section 2.6. The similarity matrix was then obtained to illustrate the accuracy of the upscaled maps. To explore if existing, coarser resolution land cover maps are better than newly created upscaled maps derived from accurate and finer resolution land cover maps, existing land cover maps at two different resolutions (240 m and 960 m) were chosen. Note that we are aware that producing land cover maps at coarse resolutions are vitally important for various research projects. Thus, our work here only aims to develop a tool (i.e., the similarity matrix) for evaluating the upscaled maps. It is not our intent to minimize the contributions of the existing, coarser resolution land cover maps for the Earth science community. 2.6. Upscaling Method The MRB method for upscaling categorical maps has been recommended by [15] for agricultural projects. Therefore, our work utilized this method as an appropriate way to obtain the upscaled maps. To implement the MRB, a predefined square window corresponding to the coarser resolution pixel is constructed. Then, the MRB assigns the class type for the coarser resolution pixel based on selection of the most frequently occurring class in the base map within the predefined square window [17,18]. If there is more than one major class in the square window, the class for the coarser resolution map will be randomly determined from these major classes [15]. In this work, the base map was upscaled to 240 m and 960 m, respectively. 3. Results The CDL data were used to generate the crop/non-crop base maps which were then used to create the upscaled maps at 240 m and 960 m spatial resolution for each study site. The similarity matrix, proposed here was then employed to assess the accuracy of these upscaled maps. This assessment method can not only quantify how well the upscaling techniques represent the area of all the land cover types, but also the omitted and committed area information for each map class. The accuracy of the GFSAD maps was estimated using a similarity matrix for comparison with the upscaled maps. After assessing the upscaled maps and the GFSAD maps, there were four issues of concern left to be analyzed: (1) a comparison of the accuracy of the upscaled maps and the GFSAD maps; Remote Sens. 2018, 10, 487 9 of 21 (2) a comparison of the study site heterogeneity between the upscaled maps and the GFSAD maps; (3) the influence of heterogeneity on the accuracy of upscaled maps and GFSAD maps, and (4) the influence of extent on the accuracy of the upscaled maps. 3.1. Accuracy Assessment and Comparisons of Upscaled and GFSAD Maps There are a total of 30 base maps generated from CDL data for all study sites (12 regions + 9 states + 9 ASDs). For each study site, two upscaled maps (at 240 m and 960 m) were generated using these base maps. Figure 3 shows an example of these base map and the corresponding upscaled maps for the Heartland region study area. The results show that the upscaled maps have different spatial distributions of the land cover from the base map. In addition, the upscaled maps visually show that the difference in the spatial distribution of the land cover became larger as the upscaled map became coarser. Figure 3. The crop/non-crop base map generated from Cropland Data Layer (CDL) data and the related upscaled maps for the Heartland region study area. The similarity matrix for each upscaled map and each GFSAD map was obtained using the procedure developed in this project. An example of the similarity matrix for the upscaled map at 240 m for the Heartland region study site is shown in Table 4. The results show that the crop was omitted 9.79% and committed 12.30% of the time in the 240 m upscaled map. The OS for this example is 91.82%. Table 5 presents a summary of the OS for all of the upscaled maps (12 regions, 9 states, and 9 ASDs). Table 6 presents a summary of the OE and CE for the crop map class in all of the upscaled maps. At all spatial extents and study sites, the OS decreased and OE and CE increased when the upscaled maps became coarser. For example, the OS for the Heartland study site decreased from 91.82% to 86.88% and OE and CE increased from 9.79% to 17.37%, and from 12.3% to 18.40% when the basemap was upscaled from 240 m to 960 m. Table 7 presents the OS for all GFSAD maps for all of the study sites and Table 8 shows the OE and CE for the crop map class in all of the GFSAD maps. The OS of GFSAD maps at 960 m were, for many of the sites, significantly lower than the OS of the GFSAD maps at 240 m. For example, the OS for the Mountain region at 960 m decreased by 38.63 percentage points compared to the OS at 240 m. Additionally, the OE of the GFSAD maps at 960 m were lower than at 240 m, while CE at 960 m were higher when compared to the 240 m map. – Remote Sens. 2018, 10, 487 10 of 21 Table 4. An example of similarity matrix for the upscaled map at 240 m of the Heartland region. The unit of area is km2 . Upscaled Map Base map Non-Crop Crop Total OE (%) Non-crop Crop 192,370.09 11,638.43 15,044.26 107,291.75 207,414.346 118,930.176 7.82 9.79 Total 204,008.52 122,336.01 CE (%) 5.70 12.30 OS = 91.82% Table 5. The Overall Similarity (OS) of the upscaled maps for all study sites. Regional Level 240 m Study Site States Level 960 m Study Site OS (%) Heartland Mountain Northern Plains Northwest Pacific Upper Midwest Delta Eastern Mountain Great Lakes Northeastern Southern Southern Plains 91.82 97.78 89.53 97.89 98.67 90.00 95.60 91.99 90.21 92.16 94.62 96.09 ASD Level 240 m 960 m Study Site 240 m OS (%) 86.88 96.44 82.53 96.49 98.12 84.94 93.35 89.15 84.87 88.70 92.55 93.32 Arizona California Kentucky Maine Maryland Michigan Minnesota Nevada North Carolina 98.98 98.05 91.29 97.34 87.03 91.47 92.57 99.57 91.01 960 m OS (%) 98.63 97.28 88.42 96.40 79.97 87.06 88.47 99.32 86.91 0640 0651 0680 2420 2490 2670 2680 2720 3710 97.87 94.67 99.26 83.87 89.55 85.49 83.32 99.09 92.70 97.12 92.55 98.96 76.39 80.41 77.67 73.85 98.73 91.47 Table 6. The Omission Error (OE) and Commission Error (CE) for the crop map class in the upscaled maps for all study sites. Resolution Region OE (%) CE (%) State OE (%) CE (%) ASD OE (%) CE (%) 240 m 960 m Heartland 9.79 17.37 12.30 18.40 Arizona 21.99 30.26 14.58 19.77 0640 36.54 54.25 23.11 31.50 240 m 960 m Mountain 18.92 32.36 13.86 21.97 California 10.65 14.64 11.92 16.56 0651 7.17 9.17 11.56 16.36 240 m 960 m Northern Plains 12.21 21.21 14.45 23.28 Kentucky 37.95 53.78 20.68 28.15 0680 16.67 24.19 12.45 17.47 240 m 960 m Northwest 12.40 21.30 12.56 20.34 Maine 50.40 82.37 25.84 34.79 2420 26.68 39.41 22.51 33.54 240 m 960 m Pacific 11.94 17.16 12.16 16.95 Maryland 25.48 41.06 19.78 31.05 2490 20.05 40.60 16.76 30.90 240 m 960 m Upper Midwest 11.42 17.82 13.63 19.92 Michigan 18.81 29.98 17.96 26.62 2670 20.48 33.31 19.41 29.31 240 m 960 m Delta 13.19 19.43 11.73 18.13 Minnesota 9.81 15.38 12.25 18.67 2680 14.20 20.27 17.04 27.09 240 m 960 m Eastern Mountain 41.36 63.10 22.69 30.44 Nevada 26.56 45.31 15.18 23.68 2720 44.19 69.87 22.03 32.41 240 m 960 m Great Lakes 13.94 21.83 15.31 23.26 North Carolina 38.71 65.80 22.89 33.12 3710 68.42 92.42 32.68 42.61 240 m 960 m Northeastern 33.63 56.62 23.55 32.85 240 m 960 m Southern 40.21 62.97 22.41 30.96 240 m 960 m Southern Plains 16.92 30.00 12.99 22.28 Remote Sens. 2018, 10, 487 11 of 21 Table 7. The Overall Similarity (OS) of the Global Food Security-Support Analysis Data (GFSAD) maps for each study site. Regional Level 240 m Study Site State Level 960 m Study Site OS (%) Heartland Mountain Northern Plains Northwest Pacific Upper Midwest Delta Eastern Mountain Great Lakes Northeastern Southern Southern Plains 85.43 95.83 80.60 96.03 96.32 83.89 91.77 88.67 81.63 87.39 91.66 92.50 41.93 57.20 39.92 57.20 77.06 53.08 48.92 31.81 46.83 40.15 43.27 41.25 ASD Level 240 m 960 m Study Site 240 m OS (%) Arizona California Kentucky Maine Maryland Michigan Minnesota Nevada North Carolina 98.38 94.35 87.51 96.22 77.69 84.86 87.88 99.13 86.37 84.80 72.20 35.57 59.34 38.80 52.85 54.75 83.96 27.93 960 m OS (%) 0640 0651 0680 2420 2490 2670 2680 2720 3710 96.49 81.04 98.22 71.08 80.96 75.30 69.74 98.63 91.37 44.51 50.16 82.89 42.53 45.48 38.59 52.42 47.08 29.12 Table 8. The Omission Error (OE) and Commission Error (CE) for the crop map class in the GFSAD maps. Resolution Study Site OE (%) CE (%) Study Site OE (%) CE (%) Study Site OE (%) CE (%) 240 m 960 m Heartland 28.09 0.06 14.19 61.44 Arizona 43.34 15.09 18.45 85.74 0640 84.26 14.42 31.94 94.38 240 m 960 m Mountain 48.70 5.61 18.27 86.64 California 57.12 2.41 16.71 76.66 0651 60.25 0.49 17.44 64.41 240 m 960 m Northern Plains 32.53 0.03 20.65 60.81 Kentucky 70.60 0.58 19.12 80.08 0680 52.59 9.92 25.34 87.86 240 m 960 m Northwest 31.28 1.75 18.51 83.66 Maine 94.76 25.14 21.87 93.09 2420 78.06 1.80 27.08 63.32 240 m 960 m Pacific 58.68 5.19 16.60 81.26 Maryland 65.75 4.76 20.99 67.71 2490 55.29 8.09 19.60 66.39 240 m 960 m Upper Midwest 27.39 0.31 15.73 54.36 Michigan 51.23 2.96 21.89 67.13 2670 50.74 0.79 25.36 62.73 240 m 960 m Delta 37.56 1.71 12.14 74.17 Minnesota 22.38 0.15 15.45 57.73 2680 38.78 0.56 23.65 47.58 240 m 960 m Eastern Mountain 75.05 2.40 23.58 83.54 Nevada 76.41 36.62 14.44 95.78 2720 80.56 8.86 32.63 97.44 240 m 960 m Great Lakes 41.43 0.99 19.19 61.59 North Carolina 74.41 2.41 31.63 82.24 3710 97.74 3.29 36.34 89.35 240 m 960 m Northeastern 81.07 4.85 23.40 81.02 240 m 960 m Southern 82.96 14.38 26.56 87.30 240 m 960 m Southern Plains 43.18 3.11 18.66 81.84 To achieve one of the primary objectives of this paper, the accuracies of the upscaled maps and the GFSAD maps were compared. Tables 5–8 show that the upscaled maps obtained higher accuracy (higher OS, lower OE and CE) when compared to the GFSAD maps at each of the three different extent levels. For example, the OS of the upscaled map at 240 m for the Heartland region study site was higher by 6.39 percentage points compared to the coarse resolution GFSAD map at 240 m (Tables 5 and 7). Also, the OE and CE for the crop class of the upscaled map at 240 m for the Heartland region study site were lower by 18.3 percentage points and 1.89 percentage points, respectively, compared to the GFSAD map at 240 m. Remote Sens. 2018, 10, 487 12 of 21 3.2. Comparisons of Heterogeneity between Upscaled Maps and GFSAD Maps Figure 4 presents the PPU of the crop/non-crop map at 30 m (the reference data), the upscaled maps and the GFSAD maps for comparison. The results show that the PPU of the upscaled maps are closer to the PPU of the reference data compared to the PPU of the GFSAD maps. For example, for the Northern Plains region study site, the PPU of the reference data, the upscaled map at 240 m, and the GFSAD map at 240 m, were 60.15, 21.05, and 6.85, respectively. Additionally, the PPU of the maps at 240 m (either upscaled maps or GFSAD maps) are closer to the PPU of the reference data compared to the PPU of the coarser resolution maps at 960 m. For example, for the Heartland region study site, the PPU of reference map, the upscaled maps at 240 m and 960 m were 54.66, 17.53, and 1.31, respectively. Figure 4. Comparisons of Patch-Per-Unit area (PPU) of the upscaled maps and the Global Food – Security-Support Analysis Data (GFSAD) maps. (a–c) are the results of region level study site, state level study site, and ASD level study site, respectively. Note that CDL-30 m represents the crop/non-crop map derived from CDL data at 2010. 3.3. Influence of Heterogeneity on Upscaled Maps and GFSAD Maps Figure 5 shows the relationship between the OS of either the upscaled maps or the GFSAD maps and the heterogeneity of each grouping of study sites by extent (i.e., regional, state, and ASDs). The study sites at each spatial extent were grouped into one of several heterogeneity levels (L1 to Lh) corresponding to L1, L2, …, based on their PPU. For example, there are a total of 12 region level study sites, thus, the heterogeneity was defined as 12 levels (Figure 5a,b), which increased corresponding to L1, L2, . . . , L12. The results Remote Sens. 2018, 10, 487 13 of 21 show that higher heterogeneity results in lower accuracy of the upscaled maps and the GFSAD maps at each resolution. For example, when the PPU increased from 9.28 to 72.22 (heterogeneity level L1 to L12 in Figure 5), the OSs for the region-level study sites were reduced from 98.67% to 90.00%, and from 96.32% to 83.89%, respectively, for the upscaled map at 240 m and the GFSAD map at 240 m (Figure 5a). Additionally, either OE, CE, or both increased for the upscaled and the GFSAD maps when the study site became more heterogeneous (Tables 6 and 8). For example, Maryland with PPU of 132.77 was more heterogeneous than the Nevada with PPU of 4.72. For the Maryland, the CE for crop of the GFSAD map at 240 m was higher by 6.55 percentage points, comparing to the Nevada (Table 8). Figure 5. The relationship between heterogeneity and the Overall Similarity (OS). The graphs at (a,c,e) show the OS of the upscaled maps at 240 m and the GFSAD maps at 240 m, respectively, for region level, state level and ASD level. The graphs at (b,d,f) show the OS of the upscaled maps at 960 m and the Global Food Security-Support Analysis Data (GFSAD) maps at 960 m, respectively, for region level, state level and ASD level. Note Li in the X-axis represents the heterogeneity level of study sites corresponding to each extent level. Remote Sens. 2018, 10, 487 14 of 21 3.4. Influence of Extent on the Accuracy of Upscaled Maps The results show that the extent of the study sites did not impact the accuracy of upscaled maps. First, for some study sites, there is no influence on the accuracy of the upscaled maps when the extent of the study site changes. For example, the state of Arizona and the ASD2720 had similar landscape pattern (landscape heterogeneity, and mean size of the fields (Table 1)). Note that the Arizona covered about 295,386.1 km2 that is about eight times larger than ASD2720 which is located in Minnesota (Table 1). The OS of upscaled maps at 960 m for these two study sites were 98.63%, and 98.73%, respectively (Table 5). Second, for some other study sites, the accuracy of the upscaled maps for the regions (i.e., the very large areas) varied such that some were lower and others higher than the small areas. For example, the Northwest region covered about 642,992.8 km2 which is about 17 times larger than the ASD2720 (Table 1). The OS of upscaled map at 240 m for the Northwest region was lower by about 1.2 percentage points compared to the ASD2720 (Table 5). 4. Discussion 4.1. Significance of the Similarity Matrix Rescaling techniques inevitably lead to errors in land cover area (e.g., [11,14]). These errors in area are not only derived from the omitted areas from the base map but also by the committed areas created in the upscaled map. If the OS is 100% (i.e., the base map and the upscaled map are exactly the same (Figure 6a,d), it is not necessary to compute the OE and CE. However, an OS of 100% rarely occurs (e.g., [11]). Even if the OS of 100% occurs at a particular spatial resolution, it is difficult to know if the OS will also be 100% at another spatial resolution. Figure 6 shows an example where the base map (Figure 6a) is upscaled by the MRB to two different spatial resolutions (Figure 6d,e). Figure 6b,c are predefined windows for producing Figure 6d,e, respectively. The gray color represents class 1, while the white color represents class 2. In some situations, the area information of one class in the upscaled map will be same as its area information in the base map. For example, the base map pixels, P1, P2, P3, and P4, were aggregated to upscaled pixel Pc (Figure 6d). Since class 2 occurred more frequently within the predefined window, W1, the class type of Pc is class 2 according to the principle of MRB. However, the OS cannot be always 100% due to the CE and OE. For example, In Figure 6d, the OS is 100%, while in Figure 6e the OS is 81.25%. Up to now, there has been no standard and simple way to evaluate the accuracy of the rescaled map and to report the uncertainty information of the rescaled map to the users of the land cover map. The similarity matrix proposed here provides such a way. Figure 6. An example of upscaled maps derived by Majority Rule Based aggregation (MRB). Two classes are in it. Class 1 is with the gray color. Class 2 is with the white color. (a) is the base map; (b,c) are predefined windows for implementing upscaling; (d,e) are two upscaled maps at two different resolutions. Remote Sens. 2018, 10, 487 15 of 21 4.2. Issues for Generating Upscaled Maps An important upscaling issue that should be noted here is the problem that can occur around the edges of the study area. If the dimensions of the base map and the rescaled map are not equally divisible, then some partial upscaled pixels will fall outside of the edge of the base map. As Figure 7 demonstrates, two non-overlapping windows (W3 and W6) fall outside the edge of the base map. The upscaled pixels, P3 and P6, are determined based on only the pixels in the base map that fall within the W3 and W6. Thus, for W3 and W6, there are only eight pixels used to determine the class type in the upscaled map. Figure 7. Boundary issues derived from the implementation of the Majority Rule Based (MRB). Interactive Data Language (IDL 8.3) was used to conduct MRB rescaling in this project. For this language, the implementation of the MRB rescaling algorithm begins from the upper-left pixel as the first pixel and ends with the lower-right pixel as the last pixel. Hence, the potential error of the determination of the boundary pixel in an upscaled map only occurs at the right edge and/or the bottom edge of the upscaled map. However, the similarity matrix is not impacted because only existing class types are included in the matrix calculation. In addition, the similarity matrix presents the uncertainty information by considering the error derived from the boundary pixels in the upscaled – maps. These conditions further strengthen our confidence to recommend the similarity matrix to be extensively used to evaluate the rescaled maps. 4.3. Accuracy Analysis between Upscaled Maps and GFSAD Maps Since one of our objectives was to compare existing coarser resolution maps with the upscaled maps, the GFSAD maps also were assessed using the similarity matrix. One issue that should be noted is that the crop/non-crop maps derived from the CDL data were assumed as the reference data for assessing the GFSAD maps [29]. We are aware that the CDL data are not 100% accurate. However, this dataset is highly accurate and was readily available at finer resolution [7] and therefore, was very appropriate for use in this project. The comparisons of accuracy between the upscaled maps and GFSAD maps show that the upscaled maps can obtain higher accuracies (Tables 5–8). These results demonstrate that employing upscaled maps will be a better way to further conduct the related earth modeling research compared to using the GFSAD maps. Note that we are not saying here that all the existing coarser resolution maps should be replaced by the upscaled maps, but rather that the similarity matrix should be used to test which is better. The lower accuracy of the GFSAD map in the USA mainly results from fact that coarser spatial resolution imagery were used for producing the GFSAD maps, while the finer spatial resolution imagery were used for producing the CDL data. The finer resolution imagery used to generate the CDL Remote Sens. 2018, 10, 487 16 of 21 data include Lansat-5 Thematic Mapper (TM), Landsat-7 Enhanced TM plus (ETM+), Landsat-8 Optical Land Imager (OLI), Resourcesat-1 Advanced Wide Field Sensor (AWiFS), Disaster Monitoring Constellation (DMC) DEIMOS-1, and UK2 sensors. This finer-resolution imagery was used to obtain base maps that can accurately represent the distribution of land cover. However, the GFSAD maps are mainly based on the MODerate resolution Imaging Spectroradiometer (MODIS) imagery at 250 m [29]. As is known, greater amounts of mixed pixels exist in the coarser resolution imagery, which can introduce more errors into the classified maps (i.e., GFSAD map) [30]. Therefore, compared to the upscaled maps derived from accurate, finer resolution CDL data, GFSAD likely will not obtain higher accuracy of the distribution of land cover. To visually show the difference between the upscaled map and the GFSAD map, two sub-areas in the region of Upper-Midwest were selected (Figure 8). The upscaled map either at 240 m or at 960 m accurately presents the crop distribution. The GFSAD map at 960 m of the region was mostly classified as crop, while the upscaled map present the original crop distribution as accurate as possible. These results demonstrate that the upscaled map can more accurately represent the distribution of land cover, which further strengthens our confidence in recommending the use of upscaled maps as inputs for earth science models. However, assessing and comparing the performance of the upscaled maps and existing coarser resolution maps are required before making these decisions. Figure 8. An example of difference between upscaled maps and the Global Food Security-Support Analysis Data (GFSAD) maps for 240 m and 960 m, respectively, for the region of Upper Midwest. The left panel presents the GFSAD maps at 240 m and 960 m, respectively. The right panel presents the upscaled maps at 240 m and 960 m, respectively. Remote Sens. 2018, 10, 487 17 of 21 4.4. Differences in Heterogeneity between the Upscaled Maps and the GFSAD Maps A vitally important issue underlying different land cover maps is the difference in representations of landscape pattern. As acknowledged, several studies have reported that the representation of landscape pattern is biased in the upscaled map (e.g., [11,15]). However, comparisons between the representation of landscape pattern based on the upscaled map and the existing coarser resolution map has not been explored. Therefore, the heterogeneity, evaluated by PPU, was selected to be analyzed in this research. Note that the study sites with higher PPU are more heterogeneous than study sites with lower PPU. The comparisons of heterogeneity represented by the PPU values (Figure 4) and by visual observation (Figure 8) between the upscaled maps and the GFSAD maps demonstrated that upscaled maps can obtain a better representation of the landscape patterns when compared to the GFSAD maps. This is because the upscaled maps can more accurately represent the distribution of land cover compared to the GFSAD maps. Furthermore, for users, a good quality land cover map should not only have a better map accuracy, but also a better representation of landscape pattern. Therefore, considering the comparisons of the PPU between two datasets and the users’ requirements, the upscaled map is more suitable and should be used in Earth science models. 4.5. Considerations with Respect to Heterogeneity The heterogeneity of the study site has been emphasized as a major factor impacting the accuracy of the upscaled maps (e.g., [11,31]) and the accuracy of classification maps (e.g., [32]), in general. Thus, the influence of heterogeneity on upscaled maps and GFSAD maps should be explored. As expected, the accuracies of both the upscaled maps and the GFSAD maps were considerably impacted by the heterogeneity of the study sites. Lower accuracies for the upscaled and the GFSAD maps, at each resolution, were obtained in the more heterogeneous areas compared to the homogeneous areas (Figure 5, Tables 5–8). This is because the spectral heterogeneity and spectral complexity of the imagery are higher in heterogeneous areas, which results in lower classification map accuracy (e.g., CDL data, GFSAD map) [32]. If these heterogeneous maps (e.g., CDL data) were used as base maps for upscaling, mapping error would increase in the upscaled map and the accuracy of the upscaled map would be reduced [11]. Therefore, given the lower upscaling accuracy in the heterogeneous areas, users should be cautious when utilizing upscaled maps in heterogeneous landscapes. Recognizing the negative effect of heterogeneity on the upscaled maps requires not only that caution be taken when using the upscaling techniques in the heterogenous areas, but also demonstrates the need for developing improved upscaling techniques. The improved upscaling techniques should consider the heterogeneity as a factor to improve the upscaling accuracy. Additionally, the analyst that is producing the land cover map should be aware of lower accuracy occurring in the heterogeneous areas. More accurate and improved classification techniques are also recommended to be developed and employed for improving the accuracy of the land cover products, especially in the heterogeneous areas. These accurate land cover products can be further used for producing upscaled maps at different resolution meeting the requirements for various Earth observation models or their related research (e.g., [33]). 4.6. Considerations with Respect to Extent of Maps Landscape pattern has strong influence on the accuracy of the upscaled maps, as demonstrated by previous research (e.g., [31,34]). However, previous research rarely explored if the extent (i.e., size of the area) of the study sites impacted the upscaled maps. Therefore, as part of our research we employed different size study sites from large regions of the US to individual states, to smaller ASDs within a state. Although we did not test different upscaling methods to see if extent has an influence on Remote Sens. 2018, 10, 487 18 of 21 the upscaling accuracy, it is worthy of discussion based on the results derived from using just the MRB approach. The results of this study demonstrated that there was no difference between upscaled maps at the same spatial resolution for different extent levels (i.e., region level, state level, and ASD level). As long as the heterogeneity is similar, the upscaling accuracy did not show much difference between study area sizes. Instead, it is evident that heterogeneity and not extent is the main factor impacting the upscaling accuracy. 4.7. Implications, Limitations, and Future Work The results of this work have important implications for decision making. The similarity matrix has been demonstrated as an efficient and effective method to assess the rescaled map. This method can provide detailed information on the accuracy of the rescaled map, which is important for users who wish to use these maps for further applications. In addition, comparisons between the upscaled map and any existing coarser resolution map draw demonstrate that it may be better to use the upscaled map than an existing coarser resolution map. The higher performance of the upscaled maps, in terms of accuracy and landscape pattern, confirms that they can be reliably used for Earth science modelling. Note that, here, we did do not emphasize the upscaled maps can always be better than the existing maps. The results illustrate that the upscaled maps can be one option for map users. Therefore, we strongly recommend that a comparison using the similarity matrix should be conducted to assess any existing coarse resolution maps with potential upscaled maps for use in any modelling. Further, the role of heterogeneity in producing upscaled maps confirms that landscape pattern is an important consideration for those researchers developing upscaling techniques to obtain more accurate upscaled maps. Moreover, this study provides evidence that it is not necessary to consider the extent when developing a new upscaling technique. Overall, the achievements of this paper are essential for guiding land cover map users to select suitable datasets for their models and encourage them to consider heterogeneity when developing any new upscaling techniques. Finally, we are aware that our research may have three limitations. The first major limitation is that there was no GFSAD map at 30 m resolution for 2010. Therefore, we used the CDL data at 30 m to generate the crop/non-crop map as the reference data to compare the results between GFSAD maps and the upscaled maps. These comparisons were based on the assumption that the CDL data are 100% correct, which we acknowledge is not the case. This limitation resulted in errors in the final results, which would be extremely difficult to quantitatively assess without an extremely accurate reference data. In order to do this assessment, reference data covering the entire USA would be needed. The cost for producing such a dataset in prohibitive and justifies our use of a very accurate (not 100%) data set (e.g., CDL). The second limitation is that the results of this paper are based on the simple case of a crop/non-crop map and not a more complex land cover map with many land cover classes. Thus, our analysis should be repeated in future work using upscaled maps and the existing land cover maps that are more complex. Additionally, only the upscaling of maps has been explored in this paper. While these techniques are directly applicable to downscaled maps, the paper does not demonstrate this use and should be the subject of future work. The third limitation is that only a single upscaling technique (i.e., MRB) was used here. As acknowledged, different upscaling techniques perform differently (e.g., [15]). The goal of this paper was to demonstrate the power of the similarity matrix and not to evaluate upscaling algorithms. Therefore, it would be useful to use the similarity matrix approach to compare different upscaling techniques in some future work. 5. Conclusions This paper first proposed a new method, the similarity matrix, for assessing the accuracy of rescaled maps using three measurements: overall similarity (OS), omission error (OE), and commission error Remote Sens. 2018, 10, 487 19 of 21 (CE). A conventional upscaling technique, Majority Rule Based aggregation (MRB), was used to produce the upscaled maps to demonstrate the performance of the similarity matrix. Additionally, this paper explored if these upscaled maps perform better than existing coarse resolution crop/non-crop maps. To conduct these experiments, two datasets: the Cropland Data Layer (CDL) at 30 m for 2010 and the GFSAD nominal data for 2010 at 250 m and 1 km, were utilized. The CDL data were first used to extract the crop/non-crop maps for (1) providing the base maps for upscaling, and (2) providing reference data to assess the accuracy of the upscaled maps and the GFSAD maps. Further, to determine if the extent of the study site has influence on the performance of the upscaling, three different extent levels were employed. There were twelve regions covering the continental USA, nine states, and nine Agricultural Statistic Districts (ASDs) that were selected to perform the experiments. Several conclusions resulted from this analysis: (1) the similarity matrix can successfully assess the accuracy of the rescaled map and report the uncertainty information for the rescaled map. This information is beneficial for land cover map users for further assessing the uncertainty or accuracy of their models. Also, the similarity matrix can provide information for making decisions about selecting the datasets for further analysis such as use in Earth science modeling; (2) the upscaled maps outperformed the existing coarser resolution GFSAD maps in both accuracy and representation of the landscape pattern. This result demonstrates that evaluating the performance of the upscaled map and the existing coarser resolution map is required before making a decision for selecting the datasets as the input to the users’ model; (3) the influence of heterogeneity on the upscaling accuracy does not only reveal that lower accuracy was produced in the heterogeneous area, but also recommends that a new upscaling technique should consider heterogeneity as an important factor to improve the accuracy of the upscaling technique, and (4) the results between different study sites with different extents quantitatively confirms that the performance of upscaling has not been impacted by the extent of study site. In conclusion, extending our understanding on how to assess the rescaled map, determination for selecting the upscaled map or the existing land cover map, and suggestions for developing new upscaling techniques, is vitally helpful to the Earth science community as users of land cover maps. Future work should focus on four aspects: (1) different downscaling/upscaling techniques should be employed to investigate if the downscaled/upscaled maps can really perform better than the existing land cover maps; (2) comparisons between upscaled maps and downscaled maps should be demonstrated; (3) landscape pattern should be considered for developing new downscaling/upscaling techniques, and (4) the land cover map with more map classes should be employed to explore if the upscaled map and downscaled map still can be a prospective data source for Earth science modelling research. Acknowledgments: The authors would like to thank the anonymous reviewers for their helpful suggestions and insightful comments that greatly improved our manuscript. Partial funding was provided by the New Hampshire Agricultural Experiment Station. This is Scientific Contribution Number 1299. This work was supported by the USDA National Institute of Food and Agriculture McIntire Stennis Project #NH00077-M (Accession #1002519). We thank the National Agricultural Statistics Service (NASS) Cropland Data for providing CDL data. The authors are also thankful to the U.S. Department of Agriculture’s National Agricultural Statistics Service; Food and the Agriculture Organization of the United Nations for providing the Global Food Security Support Analysis Data (GFSAD) data. Additionally, we also thank the China Scholarship Council (CSC) for providing foundation for the first author as a visiting researcher in the Basic and Applied Spatial Analysis Lab (BASAL), Department of Natural Resources and the Environment, University of New Hampshire, under the direction of Russell G. Congalton. We also thank Heather Grybas who gave additional suggestions for improving the quality of this paper. Author Contributions: The concept of this research was initially developed by Peijun Sun in discussion with Russell G. Congalton. Peijun Sun, and Russell G. Congalton then designed the experiment. Peijun Sun performed the experiment. Peijun Sun wrote the initial draft of the paper, which was edited significantly by Russell G. Congalton until the final paper was produced. Then, Peijun Sun converted the paper to the final format for this journal. Conflicts of Interest: The authors declare no conflict of interest. Remote Sens. 2018, 10, 487 20 of 21 References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. Congalton, R.G.; Gu, J.; Yadav, K.; Thenkabail, P.; Ozdogan, M. Global land cover mapping: A review and uncertainty analysis. Remote Sens. 2014, 6, 12070–12093. [CrossRef] Wang, Z.; Hoffmann, T.; Six, J.; Kaplan, J.O.; Govers, G.; Doetterl, S.; Oost, K.V. Human-induced erosion has offset one-third of carbon emissions from land cover change. Nat. Clim. Change 2017, 7, 345. [CrossRef] Pachauri, R.K.; Allen, M.R.; Barros, V.R.; Broome, J.; Cramer, W.; Christ, R.; Church, J.A.; Clarke, L.; Dahe, Q.; Dasgupta, P.; et al. Climate Change 2014: Synthesis Report. Contribution of Working Groups I, II and III to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change; Pachauri, R.K., Meyer, L., Eds.; IPCC: Geneva, Switzerland, 2014. Sellers, P.J.; Dickinson, R.E.; Randall, D.A.; Betts, A.K.; Hall, F.G.; Berry, J.A.; Collatz, G.J.; Denning, A.S.; Mooney, H.A.; Nobre, C.A.; et al. Modeling the exchanges of energy, water, and carbon between continents and the atmosphere. Science 1997, 275, 502–509. [CrossRef] [PubMed] Chang, J.; Ciais, P.; Viovy, N.; Vuichard, N.; Herrero, M.; Havlík, P.; Wang, X.; Sultan, B.; Soussana, J.-F. Effect of climate change, CO2 trends, nitrogen addition, and land-cover and management intensity changes on the carbon balance of European grasslands. Glob. Change Biol. 2016, 22, 338–350. [CrossRef] [PubMed] Grekousis, G.; Mountrakis, G.; Kavouras, M. An overview of 21 global and 43 regional land-cover mapping products. Int. J. Remote Sens. 2015, 36, 5309–5335. [CrossRef] Boryan, C.G.; Yang, Z.; Willis, P.; Di, L. Developing crop specific area frame stratifications based on geospatial crop frequency and cultivation data layers. J. Integr. Agric. 2017, 16, 312–323. [CrossRef] Pielke, R.A. Land use and climate change. Science 2005, 310, 1625–1626. [CrossRef] [PubMed] Buyantuyev, A.; Wu, J. Effects of thematic resolution on landscape pattern analysis. Landsc. Ecol. 2007, 22, 7–13. [CrossRef] Grafius, D.R.; Corstanje, R.; Warren, P.H.; Evans, K.L.; Hancock, S.; Harris, J.A. The impact of land use/land cover scale on modelling urban ecosystem services. Landsc. Ecol. 2016, 31, 1509–1522. [CrossRef] Sun, P.; Congalton, R.G.; Grybas, H.; Pan, Y. The impact of mapping error on the performance of upscaling agricultural maps. Remote Sens. 2017, 9, 901. [CrossRef] Tang, Y.; Atkinson, P.M.; Zhang, J. Downscaling remotely sensed imagery using area-to-point cokriging and multiple-point geostatistical simulation. ISPRS J. Photogramm. Remote Sens. 2015, 101, 174–185. [CrossRef] Atkinson, P.M. Downscaling in remote sensing. Int. J. Appl. Earth Obs. Geoinformation 2013, 22, 106–114. [CrossRef] Gardner, R.H.; Lookingbill, T.R.; Townsend, P.A.; Ferrari, J. A new approach for rescaling land cover data. Landsc. Ecol. 2008, 23, 513–526. [CrossRef] Raj, R.; Hamm, N.A.S.; Kant, Y. Analysing the effect of different aggregation approaches on remotely sensed data. Int. J. Remote Sens. 2013, 34, 4900–4916. [CrossRef] Bian, L.; Butler, R. Comparing effects of aggregation methods on statistical and spatial properties of simulated spatial data. Photogramm. Eng. Remote Sens. 1999, 65, 73–84. Saura, S. Effects of remote sensor spatial resolution and data aggregation on selected fragmentation indices. Landsc. Ecol. 2004, 19, 197–209. [CrossRef] He, H.S.; Ventura, S.J.; Mladenoff, D.J. Effects of spatial aggregation approaches on classified satellite imagery. Int. J. Geogr. Inf. Sci. 2002, 16, 93–109. [CrossRef] Kyriakidis, P.C. A geostatistical framework for area-to-point spatial interpolation. Geogr. Anal. 2004, 36, 259–289. [CrossRef] Boucher, A.; Kyriakidis, P.C.; Cronkite-Ratcliff, C. Geostatistical solutions for super-resolution land cover mapping. IEEE Trans. Geosci. Remote Sens. 2008, 46, 272–283. [CrossRef] Ling, F.; Du, Y.; Li, X.; Zhang, Y.; Xiao, F.; Fang, S.; Li, W. Superresolution land cover mapping with multiscale information by fusing local smoothness prior and downscaled coarse fractions. IEEE Trans. Geosci. Remote Sens. 2014, 52, 5677–5692. [CrossRef] Frazier, A.E. A new data aggregation technique to improve landscape metric downscaling. Landsc. Ecol. 2014, 29, 1261–1276. [CrossRef] Boryan, C.; Yang, Z.; Mueller, R.; Craig, M. Monitoring US agriculture: the US Department of Agriculture, National Agricultural Statistics Service, Cropland Data Layer Program. Geocarto Int. 2011, 26, 341–358. [CrossRef] Remote Sens. 2018, 10, 487 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 21 of 21 USDA. National Agricultural Statistics Service Frequently Asked Questions Related to Quick Stats County Data. Available online: https://www.nass.usda.gov/Data_and_Statistics/County_Data_Files/Frequently_ Asked_Questions.htm (accessed on 17 May 2017). Wright, C.K.; Wimberly, M.C. Recent land use change in the Western Corn Belt threatens grasslands and wetlands. Proc. Natl. Acad. Sci. 2013, 110, 4134–4139. [CrossRef] [PubMed] Teluguntla, P.; Thenkabail, P.S.; Xiong, J.; Gumma, M.K.; Congalton, R.G.; Oliphant, A.; Poehnelt, J.; Yadav, K.; Rao, M.; Massey, R. Spectral matching techniques (SMTs) and automated cropland classification algorithms (ACCAs) for mapping croplands of Australia using MODIS 250-m time-series (2000–2015) data. Int. J. Digit. Earth 2017, 10, 944–977. [CrossRef] Teluguntla, P.; Thenkabail, P.S.; Xiong, J.; Gumma, M.K.; Giri, C.; Milesi, C.; Ozdogan, M.; Congalton, R.G.; Tilton, J. Global Food Security Support Analysis Data (GFSAD) at Nominal 1 km (GCAD) derived from remote sensing in support of food security in the twenty-first century: Current achievements and future possibilities. In Land Resources Monitoring, Modeling, and Mapping with Remote Sensing; Thenkabail, P.S., Ed.; CRC Press: Boca Raton, FL, USA, 2015; pp. 131–160. Frohn, R.C. Remote Sensing for Landscape Ecology: New Metric Indicators for Monitoring, Modeling, and Assessment of Ecosystems; CRC Press: Boca Raton, FL, USA, 1997. Massey, R.; Sankey, T.T.; Congalton, R.G.; Yadav, K.; Thenkabail, P.S.; Ozdogan, M.; Sánchez Meador, A.J. MODIS phenology-derived, multi-year distribution of conterminous U.S. crop types. Remote Sens. Environ. 2017, 198, 490–503. [CrossRef] Costa, H.; Foody, G.M.; Boyd, D.S. Using mixed objects in the training of object-based image classifications. Remote Sens. Environ. 2017, 190, 188–197. [CrossRef] Sun, P.; Congalton, R.G.; Pan, Y. Improving the upscaling of land cover maps by fusing uncertainty and spatial structure information. Photogramm. Eng. Remote Sens. 2018, 84, 87–100. [CrossRef] Oguro, Y.; Suga, Y.; Takeuchi, S.; Ogawa, M.; Konishi, T.; Tsuchiya, K. Comparison of SAR and optical sensor data for monitoring of rice plant around Hiroshima. Adv. Space Res. 2001, 28, 195–200. [CrossRef] Lechner, A.M.; Rhodes, J.R. Recent progress on spatial and thematic resolution in landscape ecology. Curr. Landsc. Ecol. Rep. 2016, 1, 98–105. [CrossRef] Moody, A.; Woodcock, C.E. The influence of scale and the spatial characteristics of landscapes on land-cover mapping using remote sensing. Landsc. Ecol. 1995, 10, 363–379. [CrossRef] © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Keep reading this paper — and 50 million others — with a free Academia account
Used by leading Academics
Praveen Saptarshi
Savitribai Phule Pune University
Prof. Raimonds Ernsteins
University of Latvia
Sibel Pamukcu
Lehigh University
Roger Saint-Fort
Mount Royal University