Abstract
Knowing the historical yield patterns of major commodity crops, including the trends and interannual variability, is crucial for understanding the current status, potential and risks in food production in the face of the growing demand for food and climate change. We updated the global dataset of historical yields for major crops (GDHY), which is a hybrid of agricultural census statistics and satellite remote sensing, to cover the 36-year period from 1981 to 2016, with a spatial resolution of 0.5°. Four major crops were considered: maize, rice, wheat and soybean. The updated version 1.3 was developed and then aligned with the earlier version 1.2 to ensure the continuity of the yield time series. Comparisons with different global yield datasets and published results demonstrate that the GDHY-aligned version v1.2â+âv1.3 dataset is a valuable source of information on global yields. The aligned version dataset enables users to employ an increased number of yield samples for their analyses, which ultimately increases the confidence in their findings.
Measurement(s) | yield trait |
Technology Type(s) | satellite imaging ⢠digital curation ⢠computational modeling technique |
Factor Type(s) | year of crop yield data collection |
Sample Characteristic - Environment | area of cropland |
Sample Characteristic - Location | Earth (planet) |
Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.11903277
Similar content being viewed by others
Background & Summary
Crop yield (production per unit harvested area) is an essential variable in many disciplines. Global yield datasets for the historical past have increasingly been used to analyze climate-crop relationships, food production potential, food supply and demand, carbon and nitrogen cycling, greenhouse gas emissions from agriculture and land-use change. Recently, food production losses caused by weather and climate extremes under changing climate and improved stakeholder preparedness are concerns for many societies as the world experiences population growth and subsequent increases in the demand for agricultural products.
An analysis of climate-crop relationships, in particular, the consequences of weather and climate extremes on food production, requires a spatially explicit yield dataset spanning several decades. At the global scale, such a dataset has only recently been developed. The global dataset of historical yield for major crops (GDHY)1 is an example of such a dataset. The GDHY is a hybrid of agricultural census statistics and satellite remote sensing. Crop harvested area maps, crop calendar and share of production amount in different growing seasons for a crop are also used as inputs for the GDHY dataset. Therefore, the grid-cell yield values recorded in the GDHY dataset are model estimates rather than observations. Since its development and initial release in November 2013, efforts have focused on improving the data quality, assessing uncertainties and extending the time coverage to include more recent years.
The previous version 1.2 of the GDHY2,3 covers the period of 1981â2011. Here, we updated the GDHY dataset to include 2016 (that is, version 1.3) to meet the increasing demand for yield data from the scientific community, food agencies and agrobusinesses. However, the satellite products and reanalysis data used as the inputs for the development of version 1.3 are different from those used in earlier versions, as elaborated later in this article. This difference requires an alignment of version 1.3 with version 1.2 to ensure the continuity of the annual yield time series in the GDHY. Such alignment is essential for many applications in which time series analysis is often utilized, e.g., to depict historical yield patterns and linkages to climate variability and change.
The aligned version v1.2â+âv1.3 of the GDHY described in this article4 offers yield data for maize, rice, wheat and soybean for the period of 1981â2016, with a spatial resolution of 0.5° and an explicit separation of cropping seasons for some crops (major and second cropping seasons for maize and rice and winter and spring seasons for wheat). The GDHY offers spatially explicit global analyses on crop yields and is especially useful for addressing recent patterns in crop yields and the impacts of recent climate variability and change on global food production; additionally, the GDHY can be used to evaluate global gridded crop model simulations and provide a basis for global and seasonal crop forecasting systems.
Methods
Updating the GDHY
The method and procedure used to provide grid-cell yield estimates for version 1.3 of the GDHY are fully described in our related work1. In short, the procedure consists of four key steps: (1) the countryâs annual yield statistics were obtained from the Food and Agriculture Organization of the United Nations statistical database (FAOSTAT5); (2) the grid-cell net primary production (NPP) was calculated using the remotely sensed leaf area index (LAI), the fraction of photosynthetically active radiation (FPAR), reanalysis solar radiation and reported crop-specific radiation-use efficiency to consider the spatial variations in yields within a country; (3) the harvested area map (M3-Crops6) and crop calendars circa 2000 (SAGE7) were used to address where and when a crop of interest was grown; and (4) when the crop calendars indicated that a crop of interest was harvested twice in a year, the share of production amount by different cropping season of a crop available in the US Department of Agriculture (USDA) report8 was used to differentiate the yield estimates for different cropping seasons. Production-weighted mean, instead of arithmetic mean, is utilized when average yield from two cropping seasons with different production share is computed.
Some inputs used in the development of the version 1.3 dataset were different from those used in the version 1.2 dataset (Table 1). The major differences were found in the satellite products and reanalysis data. The LAI and FPAR inputs were changed from the GIMMS3g [Global Inventory Modeling and Mapping Studies third generation products from the AVHRR (Advanced Very High Resolution Radiometer)] products9 for the version 1.2, to the more advanced MOD15A2 products10 derived from the MODIS (Moderate Resolution Imaging Spectroradiometer) for the version 1.3. The spatial and temporal resolutions of the MOD12A2 products (1-km and 8-day, respectively) were finer than those of the GIMMS3g products (0.083° or 10-km and bi-monthly or 15-day), although the crop harvested area map with a spatial resolution of 10-km was commonly used for both versions 1.2 and 1.3. The daily solar radiation data were also changed from the 1.125°-resolution JRA-25 reanalysis11 for the version 1.2 to the 0.563°-resolution JRA-55 reanalysis12,13 for the version 1.3.
The GIMMS3g NDVI (normalized difference vegetation index) used in estimating the GIMMS3g LAI and FPAR were calibrated against the MODIS LAI and FPAR products for the period of 2000â2009 (ref. 9). Thus, the continuity of the LAI and FPAR time series at 10-km and 15-day scales was expected. However, the quality-checking of the GDHY version 1.3 dataset revealed persistent discontinuities in annual yield time series between versions 1.2 and 1.3 for some locations, despite the use of the calibrated GIMMS3g LAI and FPAR products (Fig. 1). Yields from the version 1.3 were almost always higher than those from the version 1.2. Addressing the exact reasons for the discontinuities is beyond the scope of this article. However, the different reanalysis solar radiation products between the two versions are one possible reason. And the different spatial resolutions of the satellite products used in versions 1.2 and 1.3 is another possible reason. The version 1.2 dataset uses average NPP over the 10-km grid cell, while the version 1.3 dataset uses the maximum NPP over the 1-km cropland grid cells located within a 10-km grid cell. To solve this problem and supply users a version of the GDHY with continuity, the two versions were aligned, as elaborated in the subsequent section.
Alignment
The two different versions of the GDHY described above were aligned according to the following procedure. First, in the version 1.2 dataset, the annual yield time series for a given location, crop and cropping season was decomposed into the linear combination of the yield trend component and the yield departure from the trend component:
where \({Y}_{{\rm{v}}1.2,t}\) indicates the annual yield in harvesting year t (t haâ1); \({Y}_{{\rm{v}}1.2}\) indicates the yield trend component or normal yield (t haâ1); and \({\mathop{Y}\limits^{Y}}_{{\rm{v}}1.2}\) indicates the yield anomaly that represents the yield departure from normal yield (t haâ1). The normal yield was calculated by applying the 5-year (t-4 to t) moving average method to the annual time series:
The yield values in the version 1.3 dataset were also decomposed, as the version 1.2 dataset were processed (\({Y}_{{\rm{v}}1.3,t}={Y}_{{\rm{v}}1.3,t}+{\acute{Y}}_{{\rm{v}}1.3,t}\); and \({Y}_{{\rm{v}}1.3,t}=\frac{{\sum }_{t-4}^{t}{Y}_{{\rm{v}}1.3,t}}{5}\)).
Second, the two versions of the GDHY were combined into a single time series using the following rule:
For the period of 1981â1999, in which only the version 1.2 dataset is available, the yield values in the aligned version (\({Y}_{{\rm{v}}1.2+{\rm{v}}1.3}\)) are equal to those of version 1.2. For the period of 2000â2010, both versions are available. The normal yields were taken from version 1.2, and the average yield anomalies across the two versions were added to the normal yields. For the remaining period (2011â2016), only version 1.3 is available. The yield anomalies were taken from version 1.3. In contrast, the normal yields were computed by adding the changes in the normal yields between 2010 and the target years (2011â2016), as computed based on version 1.3, to the normal yield in 2010 of version 1.2. When the alignment led to a negative value, the yield value was replaced with zero. By using this procedure, the two versions were harmonized into a single aligned version referred to as the GDHY version v1.2â+âv1.3 dataset (Fig. 1).
Data Records
The GDHY aligned version v1.2â+âv1.3 dataset files include the annual crop yield time series in tonnes per hectare (t haâ1) for each grid cell. The files are in NetCDF4 format and were generated by using library version 4.6.1.0; they are available at XXXX/yield_YYYY.nc4, where XXXX indicates the crop and cropping seasons (i.e., maize_major, maize_second, rice_major, rice_second, wheat_winter, wheat_spring and soybean); and YYYY indicates the year (i.e., 1981, â¦, 2016). Only a single cropping season is considered for soybean. The dataset is freely available at PANGAEA4.
Technical Validation
Approaches for validation
We used two different methods to validate the GDHY aligned version v1.2â+âv1.3 dataset: (1) it was compared with another dataset developed by different research group with the authors; and (2) an analysis conducted in earlier peer-reviewed literature was reproduced using the aligned dataset to confirm whether the reproduced results resemble the earlier ones when datasets with different spatial resolutions were analyzed.
Comparison with another dataset
Another global, spatially explicit, historical yield dataset described in Ray et al.14 is available at the EarthStat website (http://www.earthstat.org/). We downloaded the dataset labeled âHarvested Area and Yield for 4 Crops (1995â2005)â. In this dataset, the average yield and average harvested area of the four crops at three time points [1995 (1993â1997), 2000 (1998â2002) and 2005 (2003â2007)] are available. Because the original dataset has a grid size of 5âminutes by 5âminutes in longitude and latitude, we aggregated the EarthStat yield data into a grid size of 0.5° by 0.5° in longitude and latitude for a consistent comparison. The average harvested area map at the corresponding time point from the EarthStat was used as the weight when the EarthStat average yield data at a given time point were spatially aggregated.
For the GDHY aligned version dataset, the average yields for the three time points were computed using the harvested area in 2000 as the weight throughout the study period because no time-varying harvested area map is available for any version of the GDHY. The changes in average yield between 1995 and 2005, relative to 2000, were computed using the two different datasets and are shown in Fig. 2. Annual time series data of the EarthStat dataset are not publicly accessible. Therefore, the two datasets were compared in terms of changes in average yield between the two time points. The calculated yield changes were color-coded according to the 10 categories (the 8 yield change categories from âBelow â10%â to âAbove 50%â, âNo yield data are availableâ and âNon-croplandâ in Fig. 2). Then, the inter-dataset agreement was measured by the kappa coefficient15 using the categorical yield change data. The kappa coefficient values ranged from 0.748 to 0.766, indicating good agreement between the EarthStat dataset and the GDHY aligned version dataset. The fact that the EarthStat dataset is solely based on national or subnational agricultural census statistics14 underpins the reliability of the GDHY aligned version dataset.
Reproduction of earlier analysis results
We repeated the analysis described in Iizumi et al.16 that estimated the impacts of the El Niño Southern Oscillation (ENSO) on global yields. The GDHY version 1.0 dataset1 (1.125° and the time coverage of the 25-year period from 1982 to 2006; see Table A in Supporting Information of Iizumi et al.2 for more details) is used in Iizumi et al.16. We used the GDHY aligned version dataset for the period of 1981â2016 for reproduction. The reason for the different time periods is that the validation of the aligned version dataset is the main purpose of this reproduction, and the yield data for the period of 1982â2006 in the aligned version are solely based on version 1.2 (see Alignment section in this article). For these reasons, we used the time period of 1981â2016 for the aligned version dataset, with the assumption that the average ENSO impacts on yield is less sensitive to the choice of time period studied. Because of the longer study period than that used in the original work, we replaced the Extended Reconstructed Sea Surface Temperature version 4 (ERSSTv4) dataset17 with the ERSSTv5 dataset18. Therefore, the method used to address the yield impacts of ENSO is an expanded version of description in our related work16.
The kappa coefficient values calculated against the original results (interpolated into 0.5° resolution for a consistent comparison) ranged from 0.487 to 0.553 for the impacts of El Niño, which is a warmer phase of ENSO (Fig. 3). This result indicated an intermediate level of agreement in the 6 color-coded categorical yield impact data between the original and reproduced results. The comparison for the impacts of La Niña, a cooler phase of ENSO, showed a similar level of agreement, as indicated by the kappa coefficient values of 0.486â0.550 (Fig. 4). These agreement levels are reasonable if one considers the difference in spatial resolution and the subsequent difference in spatial coverage across the two versions. The original results have a larger spatial coverage than that of the reproduced results because the larger grid cells (1.125°) used in the version 1.0 dataset often have effective yield values even when yield data for most smaller grid cells (0.5°) located within a larger grid cell are missing.
Usage Notes
Any versions of the GDHY, including the aligned version v1.2â+âv1.3, are a valuable source of information on global yields. However, caution is necessary when the goal is to make user findings derived by analyzing the GDHY robust against the inherent uncertainties in the dataset. The following is a non-exclusive summary of technical notes users should be aware of.
The yields available in the GDHY are model estimates and not free from error due to imperfect modeling, inaccurate inputs, misreporting in agricultural census statistics, and use of time-constant information. Examining the same working hypothesis using other yield datasets (preferably, observed yields) in addition to the GDHY is a good practice to increase the confidence in the findings (e.g., refs. 2,19,20).
Different conclusions could be made if different yield datasets were analyzed2,21. Practices to avoid leading to conclusions sensitive to the choice of yield dataset are important. Such practice includes utilizing statistics of yield data (e.g., multi-year average yield, relative yield change) or categorical yield data for analysis, as was presented in the Technical Validation section of this article, instead of analyzing raw yield values. Similarly, different spatial resolutions of yield datasets could lead to different conclusions19,22,23, and therefore, an examination of a userâs conclusions against uncertainty of this kind is encouraged.
Yields in some locations are lacking in the GDHY. A country or global production total aggregated from grid-cell yields were underestimated if the yield dataset, of which the spatial coverage was incomplete, was analyzed. Calculating a country average yield and then multiplying it by a countryâs harvested area are appropriate methods to obtain reasonable estimates of total production for a given spatial unit using the GDHY. Note that the beginning and ending years of the GDHY (i.e., 1981 and 2016, respectively, for the aligned version) have many missing values in the Southern Hemisphere because crop durations in the region often span two calendar years and yields cannot be estimated due to incomplete crop durations.
Code availability
The GDHY aligned version v1.2â+âv1.3 dataset is produced by combining versions 1.2 and 1.3 using a purpose-build program written in Fortran90 with the standard mathematical library. The program code was compiled on the MacOS platform but is potentially applicable to other platforms (e.g., Windows and UNIX). The code is available from the corresponding author upon request.
References
Iizumi, T. et al. Historical changes in global yields: major cereal and legume crops from 1982 to 2006. Glob. Ecol. Biogeogr. 23, 346â357 (2014).
Iizumi, T. et al. Uncertainties of potentials and recent changes in global yields of major crops resulting from census- and satellite-based yield datasets at multiple resolutions. PLoS One 13(9), e0203809 (2018).
Iizumi, T. GDHY [Data set]. Data Integration and Analysis System (DIAS), https://doi.org/10.20783/DIAS.528 (2017).
Iizumi, T. Global dataset of historical yields v1.2 and v1.3 aligned version. PANGAEA, https://doi.org/10.1594/PANGAEA.909132 (2020).
FAO, FAOSTAT (FAO, 2019).
Monfreda, C., Ramankutty, N. & Foley, J. A. Farming the planet: 2. Geographic distribution of crop areas, yields, physiological types, and net primary production in the year 2000. Glob. Biogeochem. Cycles 22, GB1022, https://doi.org/10.1029/2007GB002947 (2008).
Sacks, W. J., Deryng, D., Foley, J. A. & Ramankutty, N. Crop planting dates: an analysis of global patterns. Glob. Ecol. Biogeogr. 19, 607â620 (2010).
USDA. Major world crop areas and climatic profiles, http://www.usda.gov/oce/weather/pubs/Other/MWCACP/MajorWorldCropAreas.pdf (USDA, 1994).
Zhu, Z. et al. Global data sets of vegetation leaf area index (LAI)3g and fraction of photosynthetically active radiation (FPAR)3g derived from global inventory modeling and mapping studies (GIMMS) normalized difference vegetation index (NDVI3g) for the period 1981 to 2011. Rem. Sens. 5, 927â948 (2013).
Myneni, R., Yuri, K. & Park, T. Boston University and MODAPS SIPS - NASA. MOD15A2 MODIS/Terra Leaf Area Index/FPAR 8-Day L4 Global 1 km SIN Grid, https://doi.org/10.5067/MODIS/MOD15A2H.006 (2015).
Onogi, K. et al. The JRA-25 reanalysis. J. Meteorol. Soc. Japan 85, 369â432 (2007).
Kobayashi, S. et al. The JRA-55 Reanalysis: General specifications and basic characteristics. J. Meteor. Soc. Japan 93, 5â48 (2015).
Harada, Y. et al. The JRA-55 Reanalysis: Representation of atmospheric circulation and climate variability. J. Meteor. Soc. Japan 94, 269â302 (2016).
Ray, D. K., Ramankutty, N., Mueller, N. D., West, P. C. & Foley, J. A. Recent patterns of crop yield growth and stagnation. Nat. Commun. 3, 1293 (2012).
Cyr, L. & Francis, K. Measures of clinical agreement for nominal and categorical data: The kappa coefficient. Comput. Biol. Med. 22, 239â246 (1992).
Iizumi, T. et al. Impacts of El Niño Southern Oscillation on the global yields of major crops. Nat. Commun. 5, 3712 (2014).
Huang, B. et al. Extended Reconstructed Sea Surface Temperature version 4 (ERSST.v4): Part I. Upgrades and intercomparisons. J. Clim. 28, 911â930 (2014).
Huang, B. et al. NOAA Extended Reconstructed Sea Surface Temperature (ERSST), Version 5, https://doi.org/10.7289/V5T72FNM (NOAA National Centers for Environmental Information, 2017).
Iizumi, T. & Ramankutty, N. Changes in yield variability of major crops for 1981â2010 explained by climate change. Environ. Res. Lett. 3, 034003 (2016).
Schauberger, B., Gornott, C. & Wechsung, F. Global evaluation of a semiempirical model for yield anomalies and application to within-season yield forecasting. Glob. Change Biol. 23, 4750â4764 (2017).
Müller, C. et al. Global gridded crop model evaluation: benchmarking, skills, deficiencies and implications. Geosci. Model Dev. 10, 1403â1422 (2017).
Challinor, A. J., Parkes, B. & Ramirez-Villegas, J. Crop yield response to climate change varies with cropping intensity. Glob. Change Biol. 21, 1679â1688 (2015).
Porwollik, V. et al. Spatial and temporal uncertainty of crop yield aggregations. Euro. J. Agron. 88, 10â21 (2017).
Acknowledgements
T.I. was partly supported by Grant-in-Aid for Scientific Research (No. 16KT0036, 17K07984 and 18H02317) of the Japan Society for the Promotion of Science, The Environment Research and Technology Development Fund (S-14) of the Environmental Restoration and Conservation Agency, Japan and the Joint Research Program of Arid Land Research Center, Tottori University (30F2001).
Author information
Authors and Affiliations
Contributions
T.I. designed the study, developed the GDHY version 1.3 dataset, conducted the alignment and technical validation, and drafted the manuscript. T.S. collected, processed and quality-checked the inputs used to provide the GDHY version 1.3 dataset and helped write the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisherâs note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the articleâs Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the articleâs Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.
About this article
Cite this article
Iizumi, T., Sakai, T. The global dataset of historical yields for major crops 1981â2016. Sci Data 7, 97 (2020). https://doi.org/10.1038/s41597-020-0433-7
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41597-020-0433-7
This article is cited by
-
Intra-growing season dryâwet spell pattern is a pivotal driver of maize yield variability in sub-Saharan Africa
Nature Food (2024)
-
A global dataset on phosphorus in agricultural soils
Scientific Data (2024)
-
Risk-adjusted decision making can help protect food supply and farmer livelihoods in West Africa
Communications Earth & Environment (2024)
-
Global impacts of heat and water stress on food production and severe food insecurity
Scientific Reports (2024)
-
Assessing and addressing the global state of food production data scarcity
Nature Reviews Earth & Environment (2024)