Global Daily 1 km Land Surface Precipitation Based on Cloud Cover-Informed Downscaling

www.nature.com/scientificdata
OPEN Data Descriptor
Dirk Nikolaus Karger 1,2,3 ✉, Adam M. Wilson 4, Colin Mahony 3, Niklaus E. Zimmermann 1 & Walter Jetz 2,3 ✉
High-resolution climatic data are essential to many questions and applications in environmental
research and ecology. Here we develop and implement a new semi-mechanistic downscaling approach
for daily precipitation estimates that incorporates high-resolution (30 arcsec, ≈1 km) satellite-derived
cloud frequency. The downscaling algorithm incorporates orographic predictors such as wind fields,
valley exposition, and boundary layer height, with a subsequent bias correction. We apply the method
to the ERA5 precipitation archive and MODIS monthly cloud cover frequency to develop a daily gridded
precipitation time series at 1 km resolution for the years 2003 onward. Comparison of the predictions
with existing gridded products and station data from the Global Historical Climate Network indicates an
improvement in the spatio-temporal performance of the downscaled data in predicting precipitation.
Regional scrutiny of the cloud cover correction from the continental United States further indicates that
CHELSA-EarthEnv performs well in comparison to other precipitation products. The CHELSA-EarthEnv
daily precipitation product improves the temporal accuracy, together with a large improvement in the
spatial accuracy, especially in complex terrain.
Background & Summary

High resolution information on precipitation is essential in many scientific fields, ranging from ecology, agri-
culture, forestry, to global change impact studies1–3. Spatiotemporal precipitation data is usually derived from a
range of different sources, including satellites, reanalysis, global circulation models, or precipitation gauges4,5.
However, each of these sources on its own has limitations in coverage, accuracy, or detail, impeding many
downstream uses, especially those addressing large spatial and temporal extents6,7.
Reanalysis data products such as ERA58,9, MERRA 210,11 or MSWEP12 overcome these constraints by com-
bining data from a variety of sources. To date, however, they remain limited to rather coarse spatial resolutions
such as 0.5°–0.25°, i.e. ca. 55–27 km near the equator. This is much coarser than the scale of many environmental
and ecological processes and the associated data requirements for ecosystem management and conservation.
This resolution is furthermore too coarse to capture orographic precipitation in complex terrain13–15. Global
circulation and weather models such as WRF-ARF16 and ICON17,18 are able to run at high spatial resolutions
of 1 km, but are still heavily constrained by computational limits7. Currently, global kilometer scale models are
only able to achieve a simulation throughput of 0.043 SYPD (simulated years per day)19, which amounts to a
100x shortfall compared to computationally efficient simulations defined as 1 SYPD7,20. Even with the largest
supercomputers and with state of the art climate models, as well as large financial investments, this shortfall can
only be reduced to approximately 20x21.
Although achieving 1 km resolution in numerical climate models is important to quantify effects such as
deep convection or surface drag21, studies focusing on the impact of climate on different systems often rely on a
limited set of climatic variables. In ecological studies for example, precipitation together with minimum-, mean-,
and maximum temperatures, are often used to delineate occurrences of species22. It is common to characterize
the range of a species by its climatic envelope in e.g. species distribution models (SDMs) using a rather simple set of climatic predictors and derivations thereof23,24. This means that for applications like these, only a subset of available variables needs to be downscaled to finer spatial resolution.
1Swiss Federal Research Institute for Forest, Snow, and Landscape Research (WSL), Zürcherstrasse 111, 8903, Birmensdorf, Switzerland. 2Department of Ecology and Evolutionary Biology, Yale University, 165 Prospect Street, New Haven, CT, 06520-8106, USA. 3Center for Biodiversity and Global Change, Yale University, 165 Prospect Street, New Haven, CT, 06520-8106, USA. 4Department of Geography, University at Buffalo, 120 Wilkeson Quad, Buffalo, NY, 14261, USA. ✉e-mail: dirk.karger@wsl.ch; walter.jetz@yale.edu
Scientific Data | (2021) 8:307 | https://doi.org/10.1038/s41597-021-01084-6
Environmental scientists or climate impact modelers in need of high-resolution precipitation data therefore
often resort to data from computationally less intensive methods. One such method is the spatial interpolation
of data from climatic stations. Here, precipitation gauges form the input for interpolation25 or regression models
to achieve a high spatial resolution either with or without additional, often terrain-derived, predictors26–28. Such interpolations, however, usually suffer from a spatially uneven station density29–32 and severely underestimate snowfall6,33–35. While gauge undercatch can be corrected using statistical methods in combination with streamflow observations36, the spatially uneven distribution of gauges can lead to false parametrizations of precipitation lapse rates in regression-based interpolation methods37. One method to overcome the limitation imposed
by uneven station density is to directly downscale the output of reanalysis data calculated at coarser cell size37–39.
However, there are still interpolations and parametrizations involved which account for processes not resolved
at the original model resolution.
This uneven distribution of gauges can be overcome by the use of satellite data2,40–43, which offer spatially more complete information on precipitation patterns. Yet, satellites also detect snowfall poorly44,45, meaning that the satellite-derived amounts of precipitation have to be corrected. This is usually done by bias correction using station observations46–50. Although satellite precipitation products generally have a higher horizontal resolution than reanalysis data, they are still not globally available at the 1 km resolution needed for local impact studies. Cloud cover from satellites, however, is available at this very high resolution and can potentially improve the spatial representation of precipitation. The established relationship between cloud occurrences and precipitation51,52 scales precipitation with cloud cover frequency such that if no clouds occur there is no precipitation, and increasing cloud frequency translates to increasing precipitation53.
Here we merge data from a downscaled reanalysis (ERA5) using the CHELSA algorithm37,39 with cloud cover
information derived from MODIS from the EarthEnv layer suite (https://www.earthenv.org)54 to achieve a better
representation of the fine-scale variation of global precipitation patterns. The presented CHELSA-EarthEnv daily pre-
cipitation data at ~1 km horizontal resolution offers a more reliable characterization of precipitation in topographically
heterogeneous regions and supports a range of applications that require high resolution precipitation data.
Methods
Bias correction of ERA5 precipitation data. ERA5 shows an increased performance over its predecessor ERA-Interim in several attributes55 and especially in precipitation56. Nevertheless, for application in impact studies, there is often still a significant bias in several parameters that needs to be accounted for57. For accumulated parameters such as total daily precipitation, we used the monthly sum of the hourly precipitation from ERA5, $p_{era}$, to assess this bias. ERA5 estimates of surface precipitation, like those of its predecessor ERA-Interim, are extracted from short-range forecasts, which vary considerably with forecast length58. This bias in the short-range forecasts can be a problem for monthly and climatological means as it accumulates over time58. Several methods exist to account for such biases, but most of them require gapless gridded observational data, which come with an inherent interpolation error57. To correct the bias in the ERA5 precipitation estimates and account for the interpolation errors, we therefore performed a bias correction consisting of three steps.
1. One very common approach to account for reanalysis bias is to compare the baseline precipitation from the reanalysis with the observed precipitation from station data and apply the resulting ‘change factor’ to the reanalysis data. We apply a monthly bias correction to the accumulated ERA5 precipitation for each month, $p_{sim}$, using the monthly accumulated precipitation $p_{obs}$ of the gridded GPCC 2018 dataset59. The bias correction in earlier versions of CHELSA did not adequately interpolate across the dateline, which caused artefacts in this region. To correct for this, we reprojected both $p_{era}$ and $p_{obs}$ to a North Pole Azimuthal Equidistant projection (EPSG: 102016) to allow interpolation across the dateline. We then calculate the monthly bias $R_m$ caused by the ERA5 parametrization, and the excessive or insufficient precipitation of the forecast algorithm, for each month using:

$$R_m^{obs} = \frac{p_{obs} + c}{p_{sim} + c}$$

with $c$ being a constant of 0.0001 kg*m−2*s−1 to avoid division by zero. We only used grid cells with meteorological stations present for the calculation of the observed bias $R_m^{obs}$. The forecast algorithm used to produce the precipitation amounts for ERA5 exhibits a considerable bias (too much or too little precipitation) that has a coherent spatial structure, with a larger bias over high-elevation terrain or specific landforms such as tropical rainforests. Based on this observation, we assumed that grid cells without stations share a similar bias with their neighbouring stations.
2. To achieve a gap-free bias correction grid surface, we interpolated the gaps in the $R_m$ grid using a multilevel B-spline interpolation60 with 14 error levels optimized using B-spline refinement to a 0.25° resolution. The multilevel B-spline approximation60 applies a B-spline approximation to $R_m$ starting with the coarsest lattice $\varphi_0$ from a set of control lattices $\varphi_0, \varphi_1, \ldots, \varphi_n$ with $n = 14$ that have been generated using optimized B-spline refinement61. The resulting B-spline function $f_0$ gives the first approximation of $R_m$ and leaves a deviation $\Delta^1 R^{obs}_{m,c} = R^{obs}_{m,c} - f_0(x_c, y_c)$ at each location $(x_c, y_c, R^{obs}_{m,c})$61. The next control lattice $\varphi_1$ is then used to approximate these deviations with $f_1(\Delta^1 R^{obs}_{m,c})$61. The sum $f_0 + f_1$ leaves the deviation $\Delta^2 R^{obs}_{m,c} = R^{obs}_{m,c} - f_0(x_c, y_c) - f_1(x_c, y_c)$ at each point $(x_c, y_c, R^{obs}_{m,c})$, and the approximation is repeated in this way $n$ times, resulting in the gap-free interpolated bias surface $R_m^{int}$61.

3. The bias correction surface $R_m^{int}$ is then multiplied with the ERA5 precipitation $p_{sim}$ to get the bias corrected monthly precipitation sums $p_m^{cor}$ at 0.25° resolution:

$$p_m^{cor} = p_{sim} \ast R_m^{int}$$
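The three steps above can be sketched in a few lines of Python. This is a minimal illustration on a made-up row of grid cells, not the authors' implementation: the function names and toy values are hypothetical, and the nearest-neighbour gap filling is only a simple stand-in for the multilevel B-spline interpolation described in step 2.

```python
# Sketch of the monthly ratio-based bias correction (helper names are assumed).
C = 0.0001  # constant from the text, avoids division by zero


def bias_ratio(p_obs, p_sim, c=C):
    """Step 1: ratio of observed to simulated monthly precipitation."""
    return (p_obs + c) / (p_sim + c)


def fill_gaps_nearest(ratios):
    """Step 2 stand-in: fill None cells with the nearest known ratio.

    The paper uses a multilevel B-spline interpolation here; nearest-neighbour
    filling along a 1-D row is only a placeholder to keep the sketch short.
    """
    known = [(i, r) for i, r in enumerate(ratios) if r is not None]
    filled = []
    for i, r in enumerate(ratios):
        if r is None:
            _, r = min(known, key=lambda kr: abs(kr[0] - i))
        filled.append(r)
    return filled


def correct(p_sim_row, ratio_row):
    """Step 3: multiply simulated precipitation by the gap-free ratio."""
    return [p * r for p, r in zip(p_sim_row, ratio_row)]


# Toy row of four coarse cells; only cells 0 and 3 contain stations.
p_sim = [100.0, 80.0, 60.0, 50.0]
p_obs = [120.0, None, None, 40.0]
ratios = [bias_ratio(o, s) if o is not None else None
          for o, s in zip(p_obs, p_sim)]
p_cor = correct(p_sim, fill_gaps_nearest(ratios))
```

In this toy row the station cells are pulled to their observed values, while the two cells without stations inherit the ratio of whichever station cell is closer.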
Orographic wind effects. Orographic effects are among the most reported drivers of precipitation62–66.
Orographic effects have been taken into account using a variant of the CHELSA V1.2 algorithm which uses a
parametrization of orographic rainfall based on wind fields67–70. We used daily u-wind and v-wind components at
the 10-m level of ERA5 as underlying wind components. As the calculation of a windward leeward index H (here-
after: wind effect) requires a projected coordinate system, both wind components (u-wind, v-wind) were pro-
jected to a world Mercator projection and then interpolated to a 3 km grid resolution using a multilevel B-spline
interpolation similar to the one used for the bias correction surface. The resolution of 3 km was chosen as resolu-
tions of around 1 km would over-represent orographic terrain effects26. The wind effect H was then calculated by
multiplying the windward Hw and leeward HL components calculated using:
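$$H_W = \frac{\sum_{i=1}^{n} \frac{1}{d_{WHi}} \tan^{-1}\left(\frac{d_{WZi}}{d_{WHi}^{0.5}}\right)}{\sum_{i=1}^{n} \frac{1}{d_{LHi}}} + \frac{\sum_{i=1}^{n} \frac{1}{d_{LHi}} \tan^{-1}\left(\frac{d_{LZi}}{d_{LHi}^{0.5}}\right)}{\sum_{i=1}^{n} \frac{1}{d_{LHi}}}$$

$$H_L = \frac{\sum_{i=1}^{n} \frac{1}{\ln(d_{WHi})} \tan^{-1}\left(\frac{d_{LZi}}{d_{WHi}^{0.5}}\right)}{\sum_{i=1}^{n} \frac{1}{\ln(d_{LHi})}}$$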
where dWHi and dLHi refer to the horizontal distances between the focal 3 km grid cell in windward and leeward
direction and dWZi and dLZi are the corresponding vertical distances compared with the focal 3 km cell following
the wind trajectory. Distances are summed over a search distance of 75 kilometers as orographic airflows are
limited to horizontal extents between 50–100 km71,72. The second summand in the equation for $H_{W,L}$ where $d_{LHi} < 0$ accounts for the leeward impact of previously traversed mountain chains. The horizontal distances in the equation for $H_{W,L}$ where $d_{LHi} \geq 0$ lead to a longer-distance impact of the leeward rain shadow. The final wind-effect parameter, which is assumed to be related to the interaction of the large-scale wind field and the local-scale precipitation characteristics, is calculated as:

$$H = H_{W,L \rightarrow d_{LHi} < 0} \ast H_{W,L \rightarrow d_{LHi} \geq 0}$$

and generally takes values between 0.7 for leeward and 1.3 for windward positions. Both equations were applied to each grid cell at the 3 km resolution in a World Mercator projection.
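The building block shared by the windward and leeward components is a distance-weighted sum of arctangent terms along the wind trajectory. The sketch below illustrates only that building block on a hypothetical transect. The function name, the sign convention for the vertical distances, and the choice to normalise each sum by its own weights are assumptions, and the code does not reproduce the full wind effect H.

```python
import math


def weighted_slope_term(horizontal, vertical, log_weights=False):
    """Distance-weighted mean of atan(dz / sqrt(dh)) along one trajectory.

    horizontal:  horizontal distances d_H (m) from the focal cell, all > 1
    vertical:    matching vertical distances d_Z (m)
    log_weights: use 1/ln(d_H) weights instead of 1/d_H weights
    """
    num = 0.0
    den = 0.0
    for dh, dz in zip(horizontal, vertical):
        weight = 1.0 / (math.log(dh) if log_weights else dh)
        num += weight * math.atan(dz / math.sqrt(dh))
        den += weight
    # Assumption: each sum is normalised by the sum of its own weights.
    return num / den


# Hypothetical transect: cells every 3 km out to 75 km, terrain rising upwind.
distances = [3000.0 * k for k in range(1, 26)]
rising = [20.0 * k for k in range(1, 26)]
flat = [0.0] * len(distances)

upwind_term = weighted_slope_term(distances, rising)
upwind_term_log = weighted_slope_term(distances, rising, log_weights=True)
flat_term = weighted_slope_term(distances, flat)
```

Because the weights decay with distance, nearby terrain dominates the term, and the logarithmic weights let distant cells contribute more than the inverse-distance weights do.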
We used the boundary layer height $PBL$ from ERA5 as an indicator of the pressure level that has the highest contribution to the wind effect. $PBL$ and $H$ have been interpolated to a 30 arc second resolution using a B-spline interpolation. To create a boundary layer height corrected wind effect $H_B$, the wind effect grid $H$ was then proportionally distributed to all grid cells falling within a respective 0.25° grid cell using:

$$H_B = \frac{H}{1 - \left(\frac{z - PBL_z - z_{max}}{h}\right)}$$

with $z_{max}$ being the maximum distance between the boundary layer height $PBL_z$ at elevation $z$ and all grid cells at a 30 arc second resolution falling within a respective 0.25° grid cell, $h$ being a constant of 9000 m, and $z$ being the respective elevation from the Global Multi-resolution Terrain Elevation Data (GMTED2010)72, with:

$$PBL_z = PBL + z_{ERA} + f$$

where $PBL$ is the height of the daily mean boundary layer from ERA5, $z_{ERA}$ is the elevation of the ERA5 grid cell, and $f$ is a tuning constant. The boundary layer height provided by ECMWF is based on the Richardson number73, which is usually at the lower end of the elevational spectrum compared to other methods74. We therefore tuned our model by setting $f$ to 500 m, similar to the approach in the original CHELSA algorithm37.
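As a rough numerical illustration of the boundary layer correction, the following sketch evaluates the two equations for a handful of made-up fine-grid elevations inside one coarse cell. The function names and input values are hypothetical, and the formula is applied exactly as written above, without any further interpretation.

```python
H_CONST = 9000.0   # h in the text (m)
F_OFFSET = 500.0   # constant added to the ERA5 boundary layer height (m)


def boundary_layer_elevation(pbl, z_era, f=F_OFFSET):
    """PBL_z = PBL + z_ERA + f."""
    return pbl + z_era + f


def corrected_wind_effect(h_wind, z, pbl_z, z_max, h=H_CONST):
    """H_B = H / (1 - (z - PBL_z - z_max) / h)."""
    return h_wind / (1.0 - (z - pbl_z - z_max) / h)


# Hypothetical coarse cell: ERA5 surface at 1000 m, boundary layer 800 m deep.
pbl_z = boundary_layer_elevation(pbl=800.0, z_era=1000.0)
elevations = [500.0, 1500.0, 2300.0, 3200.0]        # fine-grid cells (m)
z_max = max(abs(z - pbl_z) for z in elevations)     # largest distance to PBL_z
h_b = [corrected_wind_effect(1.2, z, pbl_z, z_max) for z in elevations]
```

With these toy numbers the corrected wind effect increases monotonically with elevation within the coarse cell.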
Although the wind effect algorithm can distinguish between the windward and leeward sides of an orographic
barrier, it cannot distinguish extremely isolated valleys in high mountain areas75. Such dry valleys are situated
in areas where the wet air masses flow over an orographic barrier and are prevented from flowing into deep
valleys75. These effects are however mainly confined to large mountain ranges, and are not as prominent in inter-
mediate mountain ranges72. To account for these effects, we used a variant of the windward-leeward equations
with a linear search distance of 300 km in steps of 5° from 0° to 355° circular for each grid cell. The calculated
leeward index was then scaled towards higher elevations using:

$$E = \left(\frac{\sum_{i=1}^{n} \frac{1}{\ln(d_{WHi})} \tan^{-1}\left(\frac{d_{LZi}}{d_{WHi}^{0.5}}\right)}{\sum_{i=1}^{n} \frac{1}{\ln(d_{LHi})}}\right)^{\frac{z}{h}}$$
which rescales the strength of the exposition index relative to elevation z from GMTED2010, and gives valleys
at high elevations larger wind isolations E than valleys located at low elevations. The correction constant h was
set to 9000 m to include all possible elevations of the DEM and because values of z > h could otherwise lead to a
reverse relationship between z and E.
The product

$$p_{I_c} = E \ast H_B$$

gives the first approximation of the precipitation intensity $p_{I_c}$ at each grid location $(x_c, y_c)$.
Precipitation including orographic effects. To achieve the distribution of daily precipitation $p_o$ given the approximated precipitation intensity $p_{I_c}$ at each grid location $(x_c, y_c)$, we used a linear relationship between $p_m^{cor}$ and $p_{I_c}$ using:

$$p_o = \frac{p_{I_c}}{\frac{1}{n} \sum_{i=1}^{n} p_{I_{ci}}} \ast p_m^{cor}$$

where $n$ equals the number of 0.0083334° grid cells that fall within a 0.25° grid cell. This equation ensures that the precipitation at 0.25° resolution exactly matches the mean precipitation of all 0.0083334° cells that overlap with a 0.25° cell.
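The mean-preserving property of this rescaling is easy to verify numerically. The sketch below uses a hypothetical coarse cell with only four fine cells (a real 0.25° cell contains 30 x 30 = 900 cells of 30 arc seconds); the function name and values are illustrative assumptions.

```python
def redistribute(p_coarse, intensities):
    """Scale fine-grid intensities so their mean equals the coarse value."""
    mean_intensity = sum(intensities) / len(intensities)
    return [p_i / mean_intensity * p_coarse for p_i in intensities]


# Hypothetical coarse cell with 90 mm and four fine cells inside it.
p_fine = redistribute(90.0, [0.8, 1.0, 1.2, 1.0])
fine_mean = sum(p_fine) / len(p_fine)
```

The fine-grid values differ according to their intensity, but their mean stays at the coarse-cell value, so no precipitation is created or lost by the downscaling step.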
The GPCC dataset used for the bias correction does not include a correction for gauge undercatch. We
therefore additionally correct for gauge undercatch using a downscaled version of the bias correction layers
from Beck et al. 202036. We downscaled the bias correction surfaces to 0.0083334° by using a moving window
regression with a search radius of three cells and elevation from GMTED2010 as predictor. We then multiplied
this downscaled bias correction layer with $p_o$.
Monthly cloud frequencies. To derive monthly cloud frequencies, we used the internal cloud mask of the PGE11 program for the MODIS MOD09 atmospherically corrected surface reflectance product76,77, which relies on two reflective tests and one thermal test. The reflective tests include the shortwave and middle infrared data combined in the “middle infrared anomaly” index (MIRA = ρ20,21 − 0.82ρ7 + 0.32ρ6, where the subscript of ρ indicates the MODIS band number). The second test uses reflectance at 1.38 microns (1.38 mic = ρ26). The MIRA and the 1.38 mic reflectance
are designed to be complementary, with MIRA efficiently detecting low or high reflective clouds77, while 1.38 mic
effectively detects high (and potentially not very reflective) clouds. Additionally, a thermal test is used to identify
pixels with high infrared reflectance anomalies (e.g., fires, sun-glint, and high albedo surfaces) with respect to
near-surface (2 m) air temperature computed by the NCEP reanalysis model78. The MOD09 cloud algorithm was
designed to minimize confusion over snow and ice by taking the surface air temperature into account. Like many
cloud masks, the MOD09 detection algorithm has a binary response (cloudy/not cloudy) and does not retain
an estimate of confidence in cloud state (i.e., probability that the pixel was actually cloudy given the tests). We
extracted the daily cloud flags from bit 10 of the daily daytime surface reflectance product “state_1km” Scientific
Data Set (SDS) from both the Terra (MOD09GA, collected at approximately 10:30 AM local time) and Aqua
(MYD09GA, approximately 1:30 PM) satellites. The time series of monthly cloud frequencies (proportion of
days with a positive cloud flag) was calculated separately for the daily MOD09GA and MYD09GA data using the
Google Earth Engine application programming interface (http://earthengine.google.org/).
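The frequency calculation itself reduces to testing one bit per day and averaging. The following is a minimal sketch for a single pixel in plain Python rather than the Google Earth Engine workflow used in the study; the function names, the sample values, and the decision to skip missing days are assumptions.

```python
CLOUD_BIT = 10  # position of the internal cloud flag named in the text


def is_cloudy(state_value):
    """Return True if bit 10 of a state-flag integer is set."""
    return ((state_value >> CLOUD_BIT) & 1) == 1


def monthly_cloud_frequency(daily_states):
    """Proportion of valid days in a month with a positive cloud flag."""
    valid = [s for s in daily_states if s is not None]
    if not valid:
        return None  # no usable observation in this month
    return sum(is_cloudy(s) for s in valid) / len(valid)


# Hypothetical state values for one pixel; 1024 == 1 << 10, None == missing.
states = [1024, 0, 1024 + 3, 8, None, 1024]
cf_m = monthly_cloud_frequency(states)
```

Three of the five valid days carry the cloud flag here, so the sketch yields a frequency of 0.6 for this pixel.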
Cloud frequency correction of daily precipitation estimates. We include monthly cloud frequencies $cf_m$ in the daily precipitation estimates, assuming that the frequency of cloud occurrences is related to precipitation events and that their geographic distribution carries a spatial signal of precipitation51,52. Strictly, we assume that where no clouds occur, no precipitation occurs, and where clouds occur more frequently, more precipitation occurs53. To achieve the distribution of daily precipitation $p$ given the approximated orographically corrected precipitation $p_o$ at each grid location $(x_c, y_c)$, we first approximate the cloud cover corrected precipitation intensity using:

$$p_{cf_c} = p_o \ast cf_m$$

This, however, distorts the precipitation amount of each grid cell. We therefore repeat the step used to create orographic precipitation in a similar manner by estimating daily precipitation $p$ at each grid location $(x_c, y_c)$ using:

$$p = \frac{p_{cf_c}}{\frac{1}{n} \sum_{i=1}^{n} p_{cf_{ci}}} \ast p_m^{cor}$$

where $n$ equals the number of 30 arc second grid cells that fall within a 0.25° grid cell.
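A small worked example may help to show how the cloud weighting and the subsequent rescaling interact. The sketch below is illustrative only: the function name and numbers are hypothetical, and returning zeros when a coarse cell has no clouds at all is an assumed way of handling a case the text does not discuss.

```python
def cloud_corrected(p_coarse, p_oro, cloud_freq):
    """Weight orographic precipitation by cloud frequency, then rescale."""
    weighted = [p * cf for p, cf in zip(p_oro, cloud_freq)]
    mean_weighted = sum(weighted) / len(weighted)
    if mean_weighted == 0.0:
        # Assumption: no clouds anywhere in the coarse cell, nothing to spread.
        return [0.0 for _ in weighted]
    return [w / mean_weighted * p_coarse for w in weighted]


# Hypothetical coarse cell (60 mm) with four fine cells; one is cloud free.
p = cloud_corrected(60.0,
                    p_oro=[50.0, 60.0, 70.0, 60.0],
                    cloud_freq=[0.0, 0.5, 0.5, 1.0])
p_mean = sum(p) / len(p)
```

The cloud-free cell receives no precipitation, the frequently cloudy cell receives the most, and the mean over the coarse cell is unchanged.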

| Item | Value |
| --- | --- |
| Resolution | 0.0083333333 |
| West extent (minimum X-coordinate, longitude) | −180.0001388888 |
| South extent (minimum Y-coordinate, latitude) | −90.0001388888 |
| East extent (maximum X-coordinate, longitude) | 179.9998611111 |
| North extent (maximum Y-coordinate, latitude) | 83.9998611111 |
| Rows | 20,800 |
| Columns | 43,200 |

Table 1. Grid extent and resolution of the GeoTIFF files.
| Model | Version | Native spatial resolution | Temporal resolution | Precipitation source data | Method | Citation |
| --- | --- | --- | --- | --- | --- | --- |
| PRISM | AN81d | 0.041667° | daily | point rain gauge station data | constrained regression & interpolation | 26 |
| MSWEP | 2.1 | 0.1° | 3 hourly | point rain gauge station data, reanalysis data, satellite observations | multi-source weighted ensemble | 5 |
| CHIRPS | 2.0 | 0.05° | daily | satellite observations, point rain gauge station data | modified inverse distance weighting | 43 |
| CHELSA | 2.1 | 0.0083334° | daily | reanalysis data, gridded rain gauge station data | model output statistics | - |
| WorldClim | 2.1 | 0.041667° | monthly | point rain gauge station data | regression & interpolation | 84 |

Table 2. Overview of the precipitation datasets used for comparison and their respective properties and methodologies.
Data Records
The dataset79 is available at EarthEnv (https://doi.org/10079/MOL/6f52b80d-0a41-40f7-84ec-873458ca6ee6).
All files are provided as georeferenced tiff files (GeoTIFF). GeoTIFF is a public domain metadata standard which
allows georeferencing information to be embedded within a TIFF file. Additional information included in the
files comprises map projection, coordinate systems, ellipsoids, datums, and fill values.
GeoTIFF files can be viewed using standard GIS software such as:

SAGA GIS (free) - http://www.saga-gis.org/
ArcGIS - https://www.arcgis.com/
QGIS (free) - www.qgis.org
DIVA-GIS (free) - http://www.diva-gis.org/
GRASS GIS (free) - https://grass.osgeo.org/
All files contain variables that define the dimensions of longitude and latitude (Table 1). The time variable is
usually encoded in the filename.
All files are in a geographic coordinate system referenced to the WGS 84 horizontal datum, with the horizontal coordinates expressed in decimal degrees. The extent (minimum and maximum latitude and longitude) is a result of the coordinate system inherited from the 1-arc-second GMTED2010 data, which itself inherited the grid extent from the 1-arc-second SRTM data.
The filename includes the respective model used, the variable short name, the respective time variables, and
the version of the data:
[Model]_[short_name]_[day]_[month]_[year]_[Version].tif
Two models are available: CHELSA, which includes the results from the bias correction and orographic correction, and CHELSA_EarthEnv, which includes the cloud cover correction as well.
The unit of precipitation in CHELSA_EarthEnv is (kg*m−2*day−1)/100.
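The naming scheme and the scaled unit can be handled with a few lines of Python. This sketch is illustrative: the example filename, the short name "pr", and the version string "V.1.0" are made-up placeholders rather than names confirmed by the data record, and dividing stored values by 100 reflects one reading of the unit statement above.

```python
from datetime import date

MODELS = ("CHELSA_EarthEnv", "CHELSA")  # longest name first


def parse_filename(name):
    """Split [Model]_[short_name]_[day]_[month]_[year]_[Version].tif."""
    stem = name[:-len(".tif")] if name.endswith(".tif") else name
    model = next(m for m in MODELS if stem.startswith(m + "_"))
    rest = stem[len(model) + 1:]
    # The last four fields are fixed; the short name is whatever precedes them.
    short_name, day, month, year, version = rest.rsplit("_", 4)
    return {
        "model": model,
        "variable": short_name,
        "date": date(int(year), int(month), int(day)),
        "version": version,
    }


def to_kg_m2_day(stored_value):
    """Convert a stored value in (kg m-2 day-1)/100 to kg m-2 day-1."""
    return stored_value / 100.0


# "pr" and "V.1.0" are placeholders, not names taken from the dataset.
info = parse_filename("CHELSA_EarthEnv_pr_05_07_2010_V.1.0.tif")
```

Under this reading of the unit, a stored value of 250 corresponds to 2.5 kg*m−2*day−1. Checking the model name against the longer prefix first avoids mistaking CHELSA_EarthEnv files for CHELSA files.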
Technical Validation
To validate the performance of CHELSA_EarthEnv, we focus on (a) the downscaling performance, by calculating different performance metrics at coarse and high resolution against observations from meteorological stations, and (b) a comparison with similar high-resolution precipitation datasets (Table 2) within the continental United States, where meteorological stations are dense and of good quality.
Validating the downscaling performance. To validate if the downscaling to 0.0083334° resolution leads
to a better performance over the coarser 0.25° gridded data that was used as forcing, we compare both resolutions
with precipitation measured at Global Historical Climate Network – daily weather stations (GHCN-D)80. The
0.25° resolution has been chosen as benchmark as it is the resolution of the forcing ERA5 data that is used as an
input for CHELSA_EarthEnv. To put the performance changes into context, we selected four comparable products (Table 2): PRISM AN81d, MSWEP 2.1, CHIRPS 2.0, and WorldClim 2.1, and repeated the analysis with these datasets over the continental United States except Alaska.
Assessing the global downscaling performance across several metrics. To validate the performance of CHELSA_EarthEnv globally, we compare it to observations at meteorological stations from the GHCN-D80 network for the period 2003–2016. We use only stations without any quality flags and compare them to the precipitation data at the coarse 0.25° and the high 0.0083334° spatial resolution.
Downscaling can affect different aspects of model performance such as bias, variability, or correlation coeffi-
cients. To test in a first step which metric is affected by the applied downscaling we calculated for each grid cell
separately the Kling-Gupta efficiency (KGE) scores from daily time series from 2003 to 2016. KGE is a perfor-
mance metric combining correlation, bias, and variability81,82 and is defined as follows:
$$KGE = 1 - \sqrt{(r - 1)^2 + (\beta - 1)^2 + (\gamma - 1)^2}$$

where the correlation component r is represented by the Pearson’s correlation coefficient, the bias component β by the ratio of estimated and observed means, and the variability component γ by the ratio of the estimated and observed coefficients of variation:

$$\beta = \frac{\mu_s}{\mu_o} \quad \text{and} \quad \gamma = \frac{\sigma_s / \mu_s}{\sigma_o / \mu_o}$$
where μ is the mean and σ the standard deviation, and the subscripts s and o indicate simulated and observed,
respectively. KGE, r, β, and γ values all have their optimum at 1. KGE values between −0.41 and 1 indicate that
the model estimates precipitation better than just taking the mean of the recorded precipitation at the gauges83.
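For readers who want to reproduce the score, a compact sketch of the KGE calculation follows. It uses only the Python standard library and hypothetical daily values; the helper names and the use of the population standard deviation are assumptions (the choice of estimator cancels in both r and γ).

```python
import math


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Population standard deviation."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r(sim, obs):
    ms, mo = mean(sim), mean(obs)
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs)) / len(sim)
    return cov / (std(sim) * std(obs))


def kge(sim, obs):
    """Kling-Gupta efficiency with beta = mean ratio and gamma = CV ratio."""
    r = pearson_r(sim, obs)
    beta = mean(sim) / mean(obs)
    gamma = (std(sim) / mean(sim)) / (std(obs) / mean(obs))
    return 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)


obs = [0.0, 2.0, 5.0, 1.0, 12.0]            # hypothetical daily gauge values
perfect = kge(obs, obs)                     # simulation identical to gauges
doubled = kge([2 * o for o in obs], obs)    # same pattern, twice the amount
```

A simulation identical to the observations reaches the optimum of 1. Doubling every value leaves r and γ at 1 but sets β to 2, so the score drops to 0 purely through the bias component.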
We also calculated the percent bias (pbias) that reflects the average tendency of the modelled precipitation values $p_{sim}$ to be larger or smaller than their observed values $p_{obs}$ at the stations. The optimal value of pbias is 0, with values of low magnitude indicating accurate model simulation. Positive values indicate an overestimation, whereas negative values indicate an underestimation. pbias is defined as follows:

$$pbias = 100 \ast \frac{\sum_{i=1}^{n} (p_{sim_i} - p_{obs_i})}{\sum_{i=1}^{n} p_{obs_i}}$$

Additionally, we also report the mean absolute error (mae), which is defined as:

$$mae = \frac{1}{n} \sum_{i=1}^{n} \left| p_{sim_i} - p_{obs_i} \right|$$

and the root mean squared error (rmse), which is defined as:

$$rmse = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_{sim_i} - p_{obs_i})^2}$$
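The three error measures translate directly into code. The sketch below is a plain-Python illustration with made-up station and model values; the function names simply mirror the abbreviations used in the text.

```python
import math


def pbias(sim, obs):
    """Percent bias: positive means overestimation."""
    return 100.0 * sum(s - o for s, o in zip(sim, obs)) / sum(obs)


def mae(sim, obs):
    """Mean absolute error."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


def rmse(sim, obs):
    """Root mean squared error."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


obs = [0.0, 4.0, 10.0, 6.0]   # hypothetical gauge values
sim = [1.0, 3.0, 12.0, 6.0]   # hypothetical model values

scores = {"pbias": pbias(sim, obs), "mae": mae(sim, obs), "rmse": rmse(sim, obs)}
```

In this toy case errors of opposite sign partly cancel in pbias but not in mae or rmse, which is why the measures are reported side by side.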
Assessing the regional performance. To compare the results to similar precipitation datasets, we use GHCN-D and four other gridded datasets (Table 2) that provide data over the same time period: PRISM (AN81d)26, MSWEP 2.112, CHIRPS 2.043, and WorldClim 2.184. PRISM is a high-resolution precipitation
dataset for the United States that, similar to CHELSA_EarthEnv, takes orographic effects into account and addi-
tionally profits from a dense quality-controlled network of weather stations. While PRISM uses a regression
approach to predict long term precipitation climatologies, daily precipitation is derived from climatologically
aided interpolation (CAI)85. MSWEP 2.1 is a merged product from various sources (weather stations, reanalysis
data, satellite observations) and consistently has high performance scores in comparison to other precipitation
products6. CHIRPS is a high-resolution precipitation dataset that integrates remotely sensed precipitation with observations from weather stations. Additionally, we include the WorldClim 2.1 data in our comparison. Although WorldClim 2.1 does not offer daily data, it provides monthly time series that have been created using climatologically aided interpolation of the CRU-TS 4.03 data86. All these datasets have been aggregated over the period 2003–2016 to annual means to obtain a temporal extent comparable to that of CHELSA_EarthEnv. We then compare these data from the different datasets with observations from GHCN-D80 for the continental United States
except Alaska. Within this spatial extent all five products overlap and the quality of the stations can be considered
as high. All products have additionally been aggregated to a 0.25° grid resolution by taking the mean of all grid
cells overlapping with a 0.25° grid cell in WGS84 geographic projection. We then used all stations with data avail-
able between 2003 and 2016 and without any quality flag (58,071 stations) and extracted precipitation from both
the highest available spatial resolution of the different datasets (Table 2) and the coarse 0.25° resolution using a
nearest neighbour approach. We then calculated the differences in absolute bias between coarse and high resolu-
tion, and compared these among products using an ANOVA with post-hoc Tukey HSD test.

Fig. 1 Spatial comparison of different metrics among CHELSA_EarthEnv at high resolution (0.0083334°) and coarse
resolution (0.25°) based on observations from GHCN-Daily data. From top left to bottom right: Spatial distribution
of the GHCN-Daily stations without any quality issues that were used for the validation. ΔKling-Gupta-Efficiency
(KGE) values between high and coarse resolution with positive values indicating an improvement in KGE in the
downscaled data over the coarse grid data. Δrmse and Δmae values, with negative values indicating an improvement.
Differences in the correlation coefficient (r), with positive values indicating an improvement. Δβ values indicate the
bias component of the KGE value, and Δγ values the variance component of KGE. In both cases a negative value
indicates an improvement of the downscaled values (values closer to unity). The Δpbias gives the absolute changes in
percent precipitation bias values, with negative values indicating an improvement.

| Resolution | rmse | mae | pbias | KGE | Pearson’s r | β | γ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0.0083334° | 6.592 | 2.584 | −2.500 | 0.493 | 0.562 | 0.975 | 0.828 |
| 0.25° | 6.608 | 2.628 | −2.700 | 0.448 | 0.548 | 0.973 | 0.790 |
| Difference (Δ) | −0.016 | −0.044 | −0.200 | 0.045 | 0.014 | 0.002 | 0.038 |

Table 3. Global test metrics for a comparison between the downscaled CHELSA_EarthEnv data and the original ERA5 data based on 122,236,056 observations at 58,071 stations between 2003 and 2016. rmse = root mean squared error, mae = mean absolute error, pbias = percent precipitation bias88, KGE = Kling-Gupta Efficiency, r = Pearson product-moment correlation coefficient, β = the ratio between the mean of the simulated values and the mean of the observed ones, γ = ratio between the coefficient of variation (CV) of the simulated values to the coefficient of variation of the observed ones. Units for the rmse and mae are in kg m−2 day−1.
Fig. 2 Taylor plots for comparisons between CHELSA_EarthEnv (CHELSA), PRISM, CHIRPS, MSWEP,
and WorldClim with GHCN-D observations for the continental United States from 2003–2016. The left plot
is based on the comparison of daily values. Each dot represents a month. Here PRISM performs best, with
MSWEP, CHELSA and CHIRPS following in that order. The plot on the right shows the performance of
monthly climatological means (2003–2016). Here the aggregation of precipitation values leads to an increase
in performance of all models with PRISM still showing the highest performance, with CHELSA, MSWEP,
and CHIRPS performing equally well. The WorldClim monthly timeseries does not perform well with low
correlation and a high standard deviation.
Comparison with PRISM. The validation of the temporal accuracy using the GHCN-D station data indicates
how well a product reproduces precipitation directly at the locations of these stations. However, all products
we compare to CHELSA here are, at least partly, parameterized on a subset of the GHCN-D stations as well.
This often leads to a high fit with station data at the station locations for all products that use exactly
these climate stations. Predicted precipitation patterns between stations, where the data are actually
interpolated or predicted, cannot be validated in this way. The ability of a model to predict the spatial
patterns of precipitation correctly could, for example, be assessed by a cross-validation approach, but this is not
possible without the station data or the source code of the respective model being available. As the exact station
data each dataset uses are generally not available, we use the spatially explicit PRISM model as a benchmark for
comparison. PRISM has a very high accuracy and captures small-scale precipitation gradients well. It uses the
largest number of meteorological stations of all models compared here. It is, however, also a model and therefore
has its own inherent biases. To compare models, we aggregated the daily values (monthly for WorldClim) over
2003–2016 to mean annual precipitation, and calculated the bias and correlation between products.
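The aggregation and comparison step can be sketched as follows. This is a minimal illustration and not the validation code of the study: the function names and toy values are hypothetical, and it assumes that each product has already been resampled to a common grid, so that every list entry refers to the same cell in both products (values in kg m−2 day−1).

```python
import math


def long_term_mean(daily_values):
    """Mean daily precipitation rate over the whole period for one grid cell."""
    return sum(daily_values) / len(daily_values)


def mean_bias(product, benchmark):
    """Mean signed difference (product minus benchmark) over co-located cells."""
    return sum(p - b for p, b in zip(product, benchmark)) / len(product)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: three days of values for each of four grid cells.
benchmark_daily = [[1.0, 2.0, 3.0], [0.0, 4.0, 2.0], [5.0, 5.0, 2.0], [6.0, 7.0, 8.0]]
product_daily = [[1.5, 2.5, 3.5], [0.5, 4.5, 2.5], [5.5, 5.5, 2.5], [6.5, 7.5, 8.5]]

# Aggregate each cell over the period, then compare the two products.
benchmark_mean = [long_term_mean(cell) for cell in benchmark_daily]
product_mean = [long_term_mean(cell) for cell in product_daily]

bias = mean_bias(product_mean, benchmark_mean)
r = pearson_r(product_mean, benchmark_mean)
```

In this toy case the product is uniformly wetter than the benchmark, so the bias is positive while the spatial correlation stays perfect, which is why both measures are reported together.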
Comparing precipitation lapse rates. In a case study, we compare CHELSA-EarthEnv’s annual precipitation
climatology in coastal British Columbia with that of PRISM, simulation data from the Weather Research
and Forecasting (WRF) convection-permitting dynamical simulation for North America87, and WorldClim2.1.
We calculated horizontal precipitation gradients for each grid cell by multiplying the precipitation lapse rate
by the terrain slope. The precipitation lapse rate is calculated from a moving window regression of precipitation
against elevation in the 8 cells surrounding the focal cell.
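The gradient calculation can be illustrated with a short sketch. It assumes an ordinary least-squares regression over the eight neighbours (the text does not name the regression type), a precomputed terrain slope grid in metres of elevation per kilometre of horizontal distance, and a lapse rate of zero on perfectly flat windows. All function and variable names are hypothetical.

```python
def window_lapse_rate(precip, elev, row, col):
    """OLS slope of precipitation on elevation over the 8 neighbours of (row, col)."""
    xs, ys = [], []
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if d_row == 0 and d_col == 0:
                continue  # the focal cell itself is excluded
            xs.append(elev[row + d_row][col + d_col])
            ys.append(precip[row + d_row][col + d_col])
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0  # assumption: flat window, no elevational signal
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def horizontal_gradient(precip, elev, slope, row, col):
    """Lapse rate (precip. per m elevation) times terrain slope (m per km)."""
    return window_lapse_rate(precip, elev, row, col) * slope[row][col]


# Hypothetical 3 x 3 window: terrain rises eastward by 100 m per cell and
# precipitation increases by 2 mm per metre of elevation.
elev = [[0.0, 100.0, 200.0]] * 3
precip = [[1000.0 + 2.0 * z for z in row_values] for row_values in elev]
slope = [[100.0] * 3 for _ in range(3)]  # m of elevation per km

lapse = window_lapse_rate(precip, elev, 1, 1)
gradient = horizontal_gradient(precip, elev, slope, 1, 1)
```

Here the lapse rate is 2 mm per metre and the horizontal gradient 200 mm per kilometre. Expressing the gradient in percent per kilometre would additionally require dividing by a reference precipitation value, a step the text does not specify.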
Assessing the improvement from the cloud layers. We validate the inclusion of the cloud frequencies
from MODIS in two steps. First, we compare the global performance of the precipitation dataset with and

Fig. 3 (a) Bias in annual mean precipitation estimates for five different precipitation datasets compared to observations
from GHCN-D. PRISM shows the lowest bias, while all other three products have a dry bias in the eastern United States.
WorldClim has the highest bias of all products, with a large dry bias in the central United States. CHELSA_EarthEnv
has a wet bias mainly in mountainous regions, while MSWEP and CHIRPS have a dry bias specifically in the eastern
Rocky Mountains. (b) Changes in absolute bias between a coarse resolution (0.25°) and the native resolution of the
different datasets. Negative values indicate that the downscaling decreases the absolute bias. All datasets show a lower
absolute bias in mountainous terrain at a high resolution. In all datasets except for WorldClim, the higher resolution
shows a lower absolute bias even in the convective regimes of the central United States. WorldClim is only able to
reduce the bias in mountainous terrain, but the downscaling has no effect over most of the extent.

[Fig. 4 panels: entire extent; > 104°W; <= 104°W. x-axis: PRISM, CHELSA, MSWEP, CHIRPS, WorldClim. y-axis: kg m−2 day−1 (0 to −0.3). Elevation legend: <0 to >4000 m a.s.l. Significance letters in extraction order: b d a a c bc c a a b c d a ab b.]
Fig. 4 Difference in absolute bias between precipitation at 0.25° coarse resolution and precipitation at
high resolution. Data are shown for annual means of five different datasets, based on a comparison to
observations from GHCN-D stations. Plots are shown separately for stations covering the entire extent of the
continental United States excluding Alaska (left), stations covering only the topographically heterogeneous western
part (>104°W) (middle), and stations covering only the comparably homogeneous terrain (<=104°W) in the
eastern United States (right). The more negative the value, the lower the bias of the high resolution compared
to the coarse resolution. Letters indicate significant differences in means based on an ANOVA with a post-hoc
Tukey HSD test.
without cloud refinement globally using GHCN-D. The refinement, however, is done at the 0.0083334° resolution,
and the mesoscale patterns of the data with or without refinement are nearly identical. To compare the two datasets
with and without refinement at the scale where an effect of the cloud layer is actually expected, we use the island
of Hawai’i as an example. Here both the station density and the quality of the stations are high, and the island has
steep precipitation gradients ranging from nearly 0 to >20 kg m−2 s−1. We use 105 stations from the GHCN-D dataset
that recorded at least 25 days per month between 2003 and 2016 and compare their annual mean precipitation
to that derived at the original 0.25° resolution, from the data without cloud refinement, and from the data with
cloud refinement at 0.0083334° resolution.
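The station selection rule can be sketched as below. The sketch assumes that "at least 25 days per month" must hold for every calendar month of the period, which is one possible reading of the criterion. The function names and the toy records are hypothetical and do not come from the validation code.

```python
from collections import Counter
from datetime import date, timedelta


def months_in_period(start_year, end_year):
    """All (year, month) pairs from January of start_year to December of end_year."""
    return [(year, month)
            for year in range(start_year, end_year + 1)
            for month in range(1, 13)]


def station_is_complete(observation_dates, start_year, end_year, min_days=25):
    """True if every month of the period has at least min_days recorded days."""
    # set() guards against duplicated daily records for the same date.
    days_per_month = Counter((d.year, d.month) for d in set(observation_dates))
    return all(days_per_month[year_month] >= min_days
               for year_month in months_in_period(start_year, end_year))


# Hypothetical station with a full daily record for 2003.
full_year = [date(2003, 1, 1) + timedelta(days=n) for n in range(365)]
# The same station with only the first 20 days of January recorded.
gappy_year = [d for d in full_year if not (d.month == 1 and d.day > 20)]

complete = station_is_complete(full_year, 2003, 2003)
incomplete = station_is_complete(gappy_year, 2003, 2003)
```

For the study period the call would use `start_year=2003` and `end_year=2016`; a station passes only if none of the 168 months falls below the threshold.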
Global downscaling performance across several metrics. Kling-Gupta Efficiency and Pearson’s
r values were highest in Europe, Central Asia, and North America (Supplementary Fig. 1). The lowest values
are found within the tropics, but also in areas with very high precipitation, such as Venezuela, Colombia, or the
Congo basin, or very low precipitation, such as the Sahara or the Arabian Peninsula. There are several possible
explanations for the relatively lower performance in the tropics. We use the GHCN-D dataset for validation,
as it is one of the few datasets available for large-scale, global validation of precipitation. Gauge data such as
GHCN-D are, however, very heterogeneous in quality30,83,85,86 and, even after cleaning using the provided quality
flags, errors likely remain. The lower validation performance in these regions may therefore be partially an
artefact of poor station data quality.
Differences in KGE values between coarse and high resolution are higher in areas with large spatial
heterogeneity such as mountains (Fig. 1). This shows that the downscaling has a positive effect on the estimation
of precipitation at high spatial resolutions (Table 3). The increase in KGE values is, however, not confined to
areas with heterogeneous terrain, but also occurs in the lowlands of the United States or Europe. The
high-resolution data show improvements in KGE and all of its components compared to the coarse 0.25° data.
Performance gains are given for the root mean squared error (rmse), mean absolute error (mae), and percent bias
(pbias) (Fig. 1). The global performance gain is ΔKGE = 0.045, but it shows a strong geographical pattern (Fig. 1),
with gains especially in mountainous regions such as the Andes or the Rocky Mountains, but also in large parts
of Asia. Performance losses are most prominent in Western Indonesia, whereas the rest of Indonesia shows a gain
in KGE.
While globally the increase in the γ component of KGE is larger than the increase in the β or r component,
increases in both r and β prevail in most of the regions with the highest gain in KGE. A possible explanation for
this is that the inclusion of topography in the downscaling has the largest effect on the bias (Fig. 1).
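To make the roles of the three components concrete, the following sketch computes r, β, and γ and combines them into KGE. It assumes the modified formulation of Kling et al.82, with β as the ratio of means and γ as the ratio of coefficients of variation; the exact formula used in the study is given outside this section. Function names and toy values are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _std(values):
    """Population standard deviation."""
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def kge_components(simulated, observed):
    """Return (r, beta, gamma): correlation, bias ratio, variability ratio."""
    mean_s, mean_o = _mean(simulated), _mean(observed)
    std_s, std_o = _std(simulated), _std(observed)
    cov = sum((s - mean_s) * (o - mean_o)
              for s, o in zip(simulated, observed)) / len(observed)
    r = cov / (std_s * std_o)
    beta = mean_s / mean_o                        # ratio of means
    gamma = (std_s / mean_s) / (std_o / mean_o)   # ratio of coefficients of variation
    return r, beta, gamma


def kge(simulated, observed):
    """Kling-Gupta efficiency; 1 is a perfect score."""
    r, beta, gamma = kge_components(simulated, observed)
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)


# Hypothetical station series: the coarse data are twice as wet as observed,
# the high-resolution data 1.5 times as wet; timing and variability are unchanged.
observed = [1.0, 2.0, 3.0, 4.0]
coarse = [2.0, 4.0, 6.0, 8.0]
fine = [1.5, 3.0, 4.5, 6.0]

delta_kge = kge(fine, observed) - kge(coarse, observed)
```

In this toy case only β differs between the two series, so the whole gain (ΔKGE = 0.5) comes from the bias component, mirroring the situation in which topography mainly corrects the bias.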

Fig. 5 Comparison between mean annual precipitation rates estimated from four different products and
those of PRISM, used as a reference dataset. Annual mean of daily precipitation (left). While all products
that are compared to PRISM capture the mesoscale precipitation patterns quite well, differences exist mainly in
the eastern Rocky Mountains, where CHIRPS, MSWEP and WorldClim are drier than PRISM (right).
CHELSA is wetter than PRISM mainly in mountainous terrain.
The more evenly distributed differences in the γ component, which reflects the variability in precipitation,
are most likely due to the inclusion of the MODIS cloud cover, which adds information on the
spatio-temporal variance in precipitation to the downscaling. Although we only included monthly cloud
frequency distributions in the downscaling, this shows the potential that high-resolution cloud cover frequencies
have for improving high-resolution precipitation estimates globally.
Regional performance. The comparison of all five precipitation products for the continental United States
shows a relatively high performance of all datasets (Fig. 2), ranging from a correlation of r ~ 0.85 (PRISM) to

Fig. 6 Scatterplots comparing the long term mean over the continental United States of four precipitation
datasets with that of PRISM as benchmark dataset. Data have been aggregated to annual means.
r ~ 0.5 (CHIRPS). CHELSA_EarthEnv performs slightly worse than MSWEP in estimating daily precipitation
rates, but better than CHIRPS. PRISM performs best, with the highest correlations with GHCN-D. The
performance increases for all products when monthly climatological means are used instead of daily precipitation
values, with CHELSA, CHIRPS, and MSWEP performing almost identically. PRISM still slightly outperforms all
other models. WorldClim performs poorly relative to all other products during the period
2003–2016, with low correlations (r ~ 0.5) and a much higher standard deviation than all other products.
All precipitation products use part of the GHCN-D stations to parametrize their algorithms. PRISM uses the
daily station data directly: it takes the anomalies from long-term climatologies at the stations and interpolates
them to obtain a gap-free anomaly surface for the CAI. The achieved performance might therefore be due to
the high station density in PRISM itself. CHELSA_EarthEnv uses GPCC gridded station data at 0.25° for a
bias correction; the algorithm therefore cannot force the interpolation through each station location directly,
which might explain the difference between PRISM and CHELSA_EarthEnv. CHIRPS uses a smaller set of
stations than PRISM, so the difference in performance might partly be due to the less dense station
network. MSWEP uses a wide variety of input sources, from remotely sensed data to reanalysis data to station
data. MSWEP therefore averages out most of the errors of any single source, which leads to a relatively high
performance in the resulting precipitation estimates12. Interestingly, WorldClim does not perform well compared
to the other products, despite being parameterized on a large number of stations. This might be due to errors in
the parametrization of the predictors used for the long-term climatologies, or uncertainties from the CAI applied
to the CRU-TS data.
Downscaling performance in relation to comparable products. The bias compared to observations
at stations is heterogeneous in all precipitation datasets. PRISM shows the lowest bias compared to
GHCN-D data, while CHELSA_EarthEnv, MSWEP, and CHIRPS show similar biases (Fig. 3). WorldClim has the
largest overall bias of all five comparable products.
A similar pattern emerges when the different products are compared at the 0.25° and the highest resolution.
Comparing the absolute bias of the coarse resolution aggregations with that of the highest available resolutions
shows that all precipitation datasets have a lower absolute bias at the highest spatial resolution (Fig. 3). The
amount of bias reduction, however, varies to a large degree, with PRISM and CHELSA_EarthEnv showing the
largest bias reduction, while CHIRPS and MSWEP show a slightly lower bias reduction, and WorldClim the

[Fig. 7 panels: (a) elevation (m), 0 to 2000, with cross-section; (b) location map; (c) CHELSA-EarthEnv, (e) PRISM, (g) WRF, (i) WorldClim2: precipitation (mm), 250 to 4000; (d), (f), (h), (j): gradient (%/km), −20 to 20; (k) precipitation (mm) and (l) precipitation gradient (%/km) along the terrain cross-section for CHELSA-EarthEnv, PRISM, WRF, and WorldClim2.]
Fig. 7 Case study intercomparison of CHELSA-EarthEnv and other gridded precipitation products for the
Coast Range of British Columbia. (a) Topography of the study area and location of the terrain cross-section
featured in panels k-l. (b) location of the study area in southwestern British Columbia and North America.
(c,e,g,i) Annual precipitation climatology for (c) CHELSA-EarthEnv, (e) PRISM, (g) Weather Research and
Forecasting (WRF) simulation for North America, and (i) WorldClim2. (d,f,h,j) Horizontal gradients of
the precipitation climatology for each product; blue (red) indicates that precipitation increases (decreases)
with elevation. (k,l) precipitation climatologies and gradients along the terrain cross-section (gray polygon)
indicated by the dashed lines in the previous panels.
lowest reduction. The relatively smaller reduction of CHIRPS and MSWEP might be due to their lower native
spatial resolution compared to CHELSA_EarthEnv and PRISM (Table 2). However, the monthly WorldClim
timeseries has the same native spatial resolution as PRISM, and still has a very low difference in absolute bias
between the high and the coarse resolution, indicating poor downscaling performance.
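The comparison between coarse and native resolution can be sketched as follows. The sketch assumes that the coarse field is obtained by block-averaging the native grid (for 30 arcsec cells and a 0.25° target, 30 cells per side) and that each station is compared with the cell that contains it. The function names and the toy grid are hypothetical.

```python
def aggregate_mean(grid, factor):
    """Block-average a 2-D grid (list of rows) by an integer factor."""
    n_rows, n_cols = len(grid), len(grid[0])
    coarse = []
    for row0 in range(0, n_rows, factor):
        coarse_row = []
        for col0 in range(0, n_cols, factor):
            block = [grid[r][c]
                     for r in range(row0, min(row0 + factor, n_rows))
                     for c in range(col0, min(col0 + factor, n_cols))]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


def change_in_absolute_bias(fine_grid, factor, station_row, station_col, observed):
    """|fine - obs| minus |coarse - obs| at the cell containing the station.

    Negative values mean the high resolution has the lower absolute bias.
    """
    coarse_grid = aggregate_mean(fine_grid, factor)
    fine_value = fine_grid[station_row][station_col]
    coarse_value = coarse_grid[station_row // factor][station_col // factor]
    return abs(fine_value - observed) - abs(coarse_value - observed)


# Hypothetical 2 x 2 native grid (kg m-2 day-1) that collapses to one coarse cell.
fine_grid = [[1.0, 3.0],
             [5.0, 7.0]]
coarse_grid = aggregate_mean(fine_grid, 2)

# Station in the dry upper-left cell with an observed value of 1.5.
delta = change_in_absolute_bias(fine_grid, 2, 0, 0, observed=1.5)
```

The coarse cell averages to 4.0, so the station's absolute bias drops from 2.5 to 0.5 and the change is −2.0, the kind of negative value that marks an improvement in Figs. 3b and 4.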
Downscaling performance also varies geographically (Fig. 4). Generally, the bias reduction is higher in
mountainous regions of the western United States, and lower in the more homogeneous terrain in the east.
Comparing the stations at which the bias is reduced (Fig. 5) shows that PRISM, CHELSA_EarthEnv, MSWEP and
CHIRPS are able to reduce the absolute bias in mountainous terrain, but also in the convective regimes of the
Midwest and Southwest of the United States. WorldClim only reduces the bias in the mountainous regions, but
does not reduce the precipitation bias in convective regimes.
Comparison with PRISM. PRISM consistently shows the highest performance metrics and is therefore a
suitable benchmark for a spatially explicit comparison. Overall, all precipitation datasets show similar
mesoscale patterns of precipitation (Fig. 5). Marked differences are mainly apparent in the southwestern United
States, where all models are comparably drier than PRISM. Differences are also apparent in the eastern Rocky
Mountains, where CHIRPS, MSWEP, and WorldClim have a considerable dry bias, but CHELSA_EarthEnv

Fig. 8 Comparison of mean annual precipitation at coarse and high resolution, both with and without cloud
refinement, for the island of Hawai’i for 2003–2016. Points refer to observations at 105 stations from
GHCN-D. While the mesoscale patterns of precipitation are relatively similar, the predictions using cloud
refinement provide a stronger fit with station observations. Cloud refinement benefits prediction accuracy
especially at stations with low precipitation. The Pearson’s correlation coefficient between observed and predicted
precipitation is greater for the fine than for the coarse resolution data, and for data with cloud refinement, with
concomitant changes in mean absolute error (mae) and root mean squared error (rmse). Precipitation in
2003–2016 is particularly large in the southwest of the island, an area that is usually drier in long-term (30 year)
climatologies.
shows precipitation rates more similar to those of PRISM. Overall, CHELSA_EarthEnv shows the lowest differences
and highest correlations to PRISM (Fig. 6) (r = 0.97, mae = 0.20), followed by MSWEP (r = 0.97, mae = 0.23) and
CHIRPS (r = 0.96, mae = 0.23). WorldClim shows the highest differences with PRISM and the lowest correlation
among all products (r = 0.95, mae = 0.28).
Precipitation lapse rates. The general similarity between CHELSA-EarthEnv and PRISM (at 800 m resolu-
tion) in precipitation amount and in precipitation gradients (Fig. 7d,f) is notable, given that elevation-precipitation
relationships in CHELSA-EarthEnv are produced by the orographic wind effect algorithm, rather than by elevational
relationships to station observations as in PRISM. The WRF simulation is independent of station observations and
provides further evidence that precipitation increases with elevation in this region (Fig. 7h). Weaker gradients in
WRF are due to the coarser (4 km) grid scale, which imposes more subdued gradients of both terrain and precip-
itation. The strong negative gradients in WorldClim2 (Fig. 7j) are due to derivation of a precipitation-elevation
relationship from stations spanning the windward (low elevation stations with high precipitation) and leeward
(higher elevation stations with low precipitation) sides of the mountain range. These erroneous negative gradients
produce a strong underestimation of regional precipitation (Fig. 7i) as they are used to extrapolate station pre-
cipitation into higher elevations (Fig. 7k) that have very low station density. This case study illustrates the utility
of CHELSA-EarthEnv for mountainous regions with sparse station observations: the dynamical ERA5 reanalysis
provides a physically plausible regional distribution of precipitation while the orographic wind effects algorithm
provides credible local elevational gradients, even in the absence of station observations.
Improvement from the cloud layers. The global comparison between the predicted precipitation with
and without cloud cover refinement yielded very small differences in all test metrics, indicating no significant
differences globally (with cloud refinement: r = 0.609, mae = 2.404; without refinement: r = 0.610,
mae = 2.402). The cloud cover refinement, however, happens on a spatial scale that is not necessarily captured
well by a global comparison. The local comparison for the island of Hawai’i (Fig. 8) indicates that the cloud cover
refinement largely acts on the local scale, where it reduces the wet bias of the interpolation without cloud cover
refinement. Without the refinement, the CHELSA algorithm distributes precipitation based on wind fields and
boundary layer height alone. It does not distinguish well areas that are usually above the clouds, leading to
an overestimation of precipitation in these areas. Here the cloud cover refinement has an effect, increasing
the correlation between predicted and observed precipitation and decreasing the error in the
estimates (Fig. 8).
Validation results—Conclusions. The comparison of the coarse grid resolution with the high resolution
of CHELSA_EarthEnv shows that the applied downscaling is able to increase the accuracy of the precipitation
predictions in several aspects and generates realistic precipitation patterns in complex terrain. The downscaling
algorithm together with remotely sensed cloud cover performs as well as other high-resolution products
in predicting precipitation. The CHELSA_EarthEnv algorithm produces high-resolution precipitation
patterns similar to those of datasets that need to be informed by a high-quality, dense weather station network,
without itself relying directly on stations. With respect to the realistic simulation of precipitation gradients in
complex terrain, it also outperforms comparable high-resolution global products.
Usage Notes
Note that because of the pixel center referencing of the input GMTED2010 data, the full extent of each grid as
defined by the outside edges of the pixels differs from an integer value of latitude or longitude by 0.000138888888
degree (or 1/2 arc-second). Users of products based on the legacy GTOPO30 product should note that the
coordinate referencing of each grid (and of GMTED2010) is not the same as that of GTOPO30. In GTOPO30, the
integer lines of latitude and longitude fall directly on the edges of a 30-arc-second pixel. Thus, when overlaying
grids with products based on GTOPO30, a slight shift of 1/2 arc-second will be observed between the edges of
corresponding 30-arc-second pixels.
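The size of this offset can be verified with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, the example edge coordinate is made up, and the direction of the shift is not stated in the text and is therefore not encoded.

```python
ARCSEC_PER_DEGREE = 3600.0
PIXEL_SIZE_ARCSEC = 30.0    # grid cell size
EDGE_OFFSET_ARCSEC = 0.5    # offset of the outer grid edge from an integer degree

edge_offset_deg = EDGE_OFFSET_ARCSEC / ARCSEC_PER_DEGREE   # about 0.000138888888
shift_in_pixels = EDGE_OFFSET_ARCSEC / PIXEL_SIZE_ARCSEC   # 1/60 of a cell


def offset_from_integer_degree(coordinate_deg):
    """Absolute distance of a coordinate from the nearest integer degree."""
    return abs(coordinate_deg - round(coordinate_deg))


def edge_referencing(edge_deg, tolerance_deg=1e-9):
    """Classify a grid edge as integer-aligned or offset by half an arc-second."""
    offset = offset_from_integer_degree(edge_deg)
    if offset < tolerance_deg:
        return "integer-aligned"
    if abs(offset - edge_offset_deg) < tolerance_deg:
        return "half-arc-second offset"
    return "other"


# Hypothetical western edges of two global grids.
integer_edge = edge_referencing(-180.0)
shifted_edge = edge_referencing(-180.0 - edge_offset_deg)
```

The shift amounts to one sixtieth of a 30 arc-second cell, which is why it appears only as a slight misalignment when such grids are overlaid.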
CHELSA_EarthEnv differs in several aspects from the already available climatological data (CHELSA
V1-V2)37 and long-term downscaled CMIP5 modelled data (CHELSAcmip5ts)39. The main difference is the
increase in temporal resolution to daily, compared to the other two datasets. It is similar to CHELSA V1.x
in that both are ‘observational’ datasets, while CHELSAcmip5ts is a downscaled ‘modelled’ dataset.
A value of a climate variable for a specific day or month in CHELSA_EarthEnv or CHELSA V1.x can therefore
be seen as an event that has actually been recorded, while one in the CHELSAcmip5ts dataset is only
modelled and, like those in the forcing CMIP5 models, does not represent a real observation.
Code availability
The code calculating the bias correction on the CHELSA V2.0 precipitation data is written in Python 2.7 and
C++ (via the SAGA-GIS API). The code for the cloud cover refinement is available here:
https://gitlabext.wsl.ch/karger/chelsa_earthenv. The code for the validation is available here:
https://gitlabext.wsl.ch/karger/chelsa_earthenv_validation.
Received: 25 February 2021; Accepted: 21 October 2021;

Published: xx xx xxxx
References
1. Kucera, P. A. et al. Precipitation from Space: Advancing Earth System Science. Bull. Am. Meteorol. Soc. 94, 365–375 (2012).
2. Tapiador, F. J. et al. Global precipitation measurement: Methods, datasets and applications. Atmospheric Res. 104–105, 70–97 (2012).
3. Kirschbaum, D. B. et al. NASA’s Remotely Sensed Precipitation: A Reservoir for Applications Users. Bull. Am. Meteorol. Soc. 98,
1169–1184 (2016).
4. Sun, Q. et al. A Review of Global Precipitation Data Sets: Data Sources, Estimation, and Intercomparisons. Rev. Geophys. 56, 79–107
(2018).
5. Beck, H. E. et al. MSWEP V2 Global 3-Hourly 0.1° Precipitation: Methodology and Quantitative Assessment. Bull. Am. Meteorol.
Soc. 100, 473–500 (2019).
6. Beck, H. E. et al. Daily evaluation of 26 precipitation datasets using Stage-IV gauge-radar data for the CONUS. Hydrol. Earth Syst.
Sci. 23, 207–224 (2019).
7. Schär, C. et al. Kilometer-scale climate models: Prospects and challenges. Bull. Am. Meteorol. Soc. 101 (2019).
8. Service (C3S), C. C. C. ERA5: Fifth generation of ECMWF atmospheric reanalyses of the global climate, Copernicus Climate Change
Service Climate Data Store (CDS). (2017).
9. Hersbach, H. et al. Operational global reanalysis: progress, future directions and synergies with NWP. (2018).
10. Gelaro, R. et al. The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2). J. Clim. 30,
5419–5454 (2017).
11. Reichle, R. H. et al. Land surface precipitation in MERRA-2. J. Clim. 30, 1643–1664 (2017).
12. Beck, H. E. et al. MSWEP: 3-hourly 0.25◦ global gridded precipitation (1979–2015) by merging gauge, satellite, and reanalysis data.
Hydrol Earth Syst Sci Discuss 2016, 1–38 (2016).
13. Skamarock, W. C. Evaluating mesoscale NWP models using kinetic energy spectra. Mon. Weather Rev. 132, 3019–3032 (2004).
14. Ménégoz, M., Gallée, H. & Jacobi, H. W. Precipitation and snow cover in the Himalaya: from reanalysis to regional climate
simulations. Hydrol. Earth Syst. Sci. 17 (2013).
15. Liu, Z. et al. Evaluation of spatial and temporal performances of ERA-Interim precipitation and temperature in mainland China. J.
Clim. 31, 4347–4365 (2018).
16. Skamarock, C. et al. A Description of the Advanced Research WRF Model Version 4. OpenSky https://doi.org/10.5065/1dfh-6p97
(2019).
17. Dipankar, A. et al. Large eddy simulation using the general circulation model ICON. J. Adv. Model. Earth Syst. 7, 963–986 (2015).
18. Heinze, R. et al. Large-eddy simulations over Germany using ICON: a comprehensive evaluation. Q. J. R. Meteorol. Soc. 143, 69–100
(2017).
19. Fuhrer, O. et al. Near-global climate simulation at 1km resolution: establishing a performance baseline on 4888 GPUs with COSMO
5.0. Geosci. Model Dev. 11, 1665–1681 (2018).
20. Schulthess, T. C. et al. Reflecting on the goal and baseline for exascale computing: a roadmap based on weather and climate
simulations. Comput. Sci. Eng. 21, 30–41 (2018).

21. Neumann, P. et al. Assessing the scales in numerical weather and climate predictions: will exascale be the rescue? Philos. Trans. R.
Soc. Math. Phys. Eng. Sci. 377, 20180148 (2019).
22. Woodward, F. I., Fogg, G. E., Heber, U., Laws, R. M. & Franks, F. The impact of low temperatures in controlling the geographical
distribution of plants. Philos. Trans. R. Soc. Lond. B Biol. Sci. 326, 585–593 (1990).
23. Guisan, A. & Zimmermann, N. E. Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–186 (2000).
24. Guisan, A. & Thuiller, W. Predicting species distribution: offering more than simple habitat models. Ecol. Lett. 8, 993–1009 (2005).
25. Tabios, G. Q. & Salas, J. D. A Comparative Analysis of Techniques for Spatial Interpolation of Precipitation. JAWRA J. Am. Water
Resour. Assoc. 21, 365–380 (1985).
26. Daly, C., Taylor, G. H. & Gibson, W. P. The PRISM approach to mapping precipitation and temperature. Proc 10th AMS Conf Appl.
Climatol. 20–23 (1997).
27. Thornton, P. E., Running, S. W. & White, M. A. Generating surfaces of daily meteorological variables over large regions of complex
terrain. J. Hydrol. 190, 214–251 (1997).
28. Hijmans, R. J., Cameron, S. E., Parra, J. L., Jones, P. G. & Jarvis, A. Very high resolution interpolated climate surfaces for global land
areas. Int. J. Climatol. 25, 1965–1978 (2005).
29. Briggs, P. R. & Cogley, J. G. Topographic bias in mesoscale precipitation networks. J. Clim. 9, 205–218 (1996).
30. Schneider, U. et al. GPCC’s new land surface precipitation climatology based on quality-controlled in situ data and its role in
quantifying the global water cycle. Theor. Appl. Climatol. 115, 15–40 (2013).
31. Kidd, C. et al. So, how much of the Earth’s surface is covered by rain gauges? Bull. Am. Meteorol. Soc. 98, 69–78 (2017).
32. Berndt, C. & Haberlandt, U. Spatial interpolation of climate variables in Northern Germany—Influence of temporal resolution and
network density. J. Hydrol. Reg. Stud. 15, 184–202 (2018).
33. Groisman, P. Y. & Legates, D. R. The accuracy of United States precipitation data. Bull. Am. Meteorol. Soc. 75, 215–228 (1994).
34. Sevruk, B. Regional Dependency of Precipitation-Altitude Relationship in the Swiss Alps. in Climatic Change at High Elevation Sites
(eds. Diaz, H. F., Beniston, M. & Bradley, R. S.) 123–137, https://doi.org/10.1007/978-94-015-8905-5_7 (Springer Netherlands,
1997).
35. Rasmussen, R. et al. How well are we measuring snow: The NOAA/FAA/NCAR winter precipitation test bed. Bull. Am. Meteorol.
Soc. 93, 811–829 (2012).
36. Beck, H. E. et al. Bias Correction of Global High-Resolution Precipitation Climatologies Using Streamflow Observations from 9372
Catchments. J. Clim. 33, 1299–1315 (2020).
37. Karger, D. N. et al. Climatologies at high resolution for the earth’s land surface areas. Sci. Data 4, 170122 (2017).
38. Muñoz-Sabater, J. et al. ERA5-Land: an improved version of the ERA5 reanalysis land component. in Joint ISWG and LSA-SAF
Workshop IPMA, Lisbon 26–28 (2018).
39. Karger, D. N., Schmatz, D. R., Dettling, G. & Zimmermann, N. E. High resolution monthly precipitation and temperature timeseries
for the period 2006–2100. Sci. Data (2020).
40. Huffman, G. J. et al. The TRMM Multisatellite Precipitation Analysis (TMPA): Quasi-Global, Multiyear, Combined-Sensor
Precipitation Estimates at Fine Scales. J. Hydrometeorol. 8, 38–55 (2007).
41. Biasutti, M., Yuter, S. E., Burleyson, C. D. & Sobel, A. H. Very high resolution rainfall patterns measured by TRMM precipitation
radar: seasonal and diurnal cycles. Clim. Dyn. 39, 239–258 (2011).
42. Goddard Space Flight Center Distributed Active Archive Center (GSFC DAAC). TRMM/TMPA 3B43 TRMM and Other Sources
Monthly Rainfall Product V7. (2011).
43. Funk, C. et al. The climate hazards infrared precipitation with stations—a new environmental record for monitoring extremes. Sci.
Data 2, 150066 (2015).
44. Levizzani, V., Laviola, S. & Cattani, E. Detection and measurement of snowfall from space. Remote Sens. 3, 145–166 (2011).
45. Skofronick-Jackson, G. et al. Global precipitation measurement cold season precipitation experiment (GCPEX): for measurement’s
sake, let it snow. Bull. Am. Meteorol. Soc. 96, 1719–1741 (2015).
46. Vila, D. A., D Goncalves, L. G. G., Toll, D. L. & Rozante, J. R. Statistical evaluation of combined daily gauge observations and rainfall
satellite estimates over continental South America. J. Hydrometeorol. 10, 533–543 (2009).
47. Xie, P., Yoo, S.-H., Joyce, R. & Yarosh, Y. Bias-corrected CMORPH: A 13-year analysis of high-resolution global precipitation. In
Geophysical Research Abstracts 13, EGU2011–1809 (2011).
48. Xie, P. & Xiong, A.-Y. A conceptual model for constructing high‐resolution gauge‐satellite merged precipitation analyses. J. Geophys.
Res. Atmospheres 116 (2011).
49. Vernimmen, R. R. E., Hooijer, A., Mamenun, N. K., Aldrian, E. & Van Dijk, A. Evaluation and bias correction of satellite rainfall data
for drought monitoring in Indonesia. (2012).
50. Cannon, A. J., Sobie, S. R. & Murdock, T. Q. Bias Correction of GCM Precipitation by Quantile Mapping: How Well Do Methods
Preserve Changes in Quantiles and Extremes? J. Clim. 28, 6938–6959 (2015).
51. Richards, F. & Arkin, P. On the relationship between satellite-observed cloud cover and precipitation. Mon. Weather Rev. 109,
1081–1093 (1981).
52. Arkin, P. A. & Meisner, B. N. The relationship between large-scale convective rainfall and cold cloud over the western hemisphere
during 1982-84. Mon. Weather Rev. 115, 51–74 (1987).
53. Betts, A. K., Tawfik, A. B. & Desjardins, R. L. Revisiting Hydrometeorology Using Cloud and Climate Observations. J. Hydrometeorol.
18, 939–955 (2017).
54. Wilson, A. M. & Jetz, W. Remotely Sensed High-Resolution Global Cloud Dynamics for Predicting Ecosystem and Biodiversity
Distributions. PLOS Biol 14, e1002415 (2016).
55. Hersbach, H. et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 146, 1999–2049 (2020).
56. Hersbach, H. et al. Global reanalysis: goodbye ERA-Interim, hello ERA5. 17–24, https://doi.org/10.21957/vf291hehd7 (2019).
57. Cucchi, M. et al. WFDE5: bias-adjusted ERA5 reanalysis data for impact studies. Earth Syst. Sci. Data 12, 2097–2120 (2020).
58. Kållberg, P. Forecast drift in ERA-Interim. ERA Rep. Ser. 10, 9 (2011).
59. Ziese, M. et al. GPCC Full Data Daily Version.2018 at 1.0°: Daily Land-Surface Precipitation from Rain-Gauges built on GTS-based
and Historic Data. DWD https://doi.org/10.5676/DWD_GPCC/FD_D_V2018_100.
60. Lee, S., Wolberg, G. & Shin, S. Y. Scattered data interpolation with multilevel B-splines. IEEE Trans. Vis. Comput. Graph. 3, 228–244
(1997).
61. Press, W. H., Flannery, B. P., Teukolsky, S. A. & Vetterling, W. T. Numerical recipes. vol. 3 (Cambridge University Press Cambridge,
1989).
62. Basist, A., Bell, G. D. & Meentemeyer, V. Statistical Relationships between Topography and Precipitation Patterns. J. Clim. 7,
1305–1315 (1994).
63. Weisse, A. K. & Bois, P. Topographic Effects on Statistical Characteristics of Heavy Rainfall and Mapping in the French Alps. J. Appl.
Meteorol. 40, 720–740 (2001).
64. Marquı́nez, J., Lastra, J. & Garcı́a, P. Estimation models for precipitation in mountainous regions: the use of GIS and multivariate
analysis. J. Hydrol. 270, 1–11 (2003).
65. Smith, R. B. & Barstad, I. A Linear Theory of Orographic Precipitation. J. Atmospheric Sci. 61, 1377–1391 (2004).
66. Jiang, Q. Precipitation over multiscale terrain. Tellus Dyn. Meteorol. Oceanogr. 59, 321–335 (2007).

67. Böhner, J. Advancements and new approaches in climate spatial prediction and environmental modelling. Arbeitsberichte Geogr.
Inst. HU Zu Berl. 109, 49–90 (2005).
68. Böhner, J. General climatic controls and topoclimatic variations in Central and High Asia. Boreas 35, 279–295 (2006).
69. Böhner, J. & Antonic, O. Land-Surface Parameters Specific to Topo-Climatology. in Geomorphometry: Concepts, Software,
Applications (eds. Hengl, T. & Reuter, H. I.) 195–226 (Elsevier Science, 2009).
70. Gerlitz, L., Conrad, O. & Böhner, J. Large-scale atmospheric forcing and topographic modification of precipitation rates over High
Asia – a neural-network-based approach. Earth Syst Dynam 6, 61–81 (2015).
71. Austin, G. L. & Dirks, K. N. Topographic Effects on Precipitation. in Encyclopedia of Hydrological Sciences
https://doi.org/10.1002/0470848944.hsa033 (American Cancer Society, 2006).
72. Liu, M., Bárdossy, A. & Zehe, E. Interaction of valleys and circulation patterns (CPs) on small-scale spatial precipitation distribution
in the complex terrain of southern Germany. Hydrol. Earth Syst. Sci. Discuss. 9 (2012).
73. Vogelezang, D. H. P. & Holtslag, A. A. M. Evaluation and model impacts of alternative boundary-layer height formulations. Bound.-
Layer Meteorol. 81, 245–269 (1996).
74. von Engeln, A. & Teixeira, J. A Planetary Boundary Layer Height Climatology Derived from ECMWF Reanalysis Data. J. Clim. 26,
6575–6590 (2013).
75. Frei, C. & Schär, C. A precipitation climatology of the Alps from high-resolution rain-gauge observations. Int. J. Climatol. 18,
873–900 (1998).
76. Roger, J. C. & Vermote, E. F. A method to retrieve the reflectivity signature at 3.75 μm from AVHRR data. Remote Sens. Environ. 64,
103–114 (1998).
77. Petitcolin, F. & Vermote, E. Land surface reflectance, emissivity and temperature from MODIS middle and thermal infrared data.
Remote Sens. Environ. 83, 112–134 (2002).
78. Kalnay, E. et al. The NCEP/NCAR 40-Year Reanalysis Project. Bull. Am. Meteorol. Soc. 77, 437–471 (1996).
79. Karger, D. N., Wilson, A. M., Mahony, C., Zimmermann, N. E. & Jetz, W. Global daily 1km land surface precipitation based on cloud
cover-informed downscaling. EarthEnv, https://doi.org/10079/MOL/6f52b80d-0a41-40f7-84ec-873458ca6ee6 (2021).
80. Menne, M. J. et al. Global Historical Climatology Network - Daily (GHCN-Daily), Version 3. NOAA National Climatic Data Center.
https://doi.org/10.7289/V5D21VHZ [accessed 3.11.2018] (2018).
81. Gupta, H. V., Kling, H., Yilmaz, K. K. & Martinez, G. F. Decomposition of the mean squared error and NSE performance criteria:
Implications for improving hydrological modelling. J. Hydrol. 377, 80–91 (2009).
82. Kling, H., Fuchs, M. & Paulin, M. Runoff conditions in the upper Danube basin under an ensemble of climate change scenarios. J.
Hydrol. 424–425, 264–277 (2012).
83. Knoben, W. J. M., Freer, J. E. & Woods, R. A. Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and
Kling–Gupta efficiency scores. Hydrol. Earth Syst. Sci. 23, 4323–4331 (2019).
84. Fick, S. E. & Hijmans, R. J. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. Int. J. Climatol. 37,
4302–4315 (2017).
85. Willmott, C. J. & Robeson, S. M. Climatologically aided interpolation (CAI) of terrestrial air temperature. Int. J. Climatol. 15,
221–229 (1995).
86. Harris, I., Osborn, T. J., Jones, P. & Lister, D. Version 4 of the CRU TS monthly high-resolution gridded multivariate climate dataset.
Sci. Data 7, 1–18 (2020).
87. Liu, C. et al. Continental-scale convection-permitting modeling of the current and future climate of North America. Clim. Dyn. 49,
71–95 (2017).
88. Sorooshian, S., Duan, Q. & Gupta, V. K. Calibration of rainfall-runoff models: Application of global optimization to the Sacramento
Soil Moisture Accounting Model. Water Resour. Res. 29, 1185–1194 (1993).
Acknowledgements
D.N.K. and N.E.Z. acknowledge funding from the WSL internal grant exCHELSA; from the 2019–2020 BiodivERsA
joint call for research proposals under the BiodivClim ERA-Net COFUND program, with the funding
organisations Swiss National Science Foundation SNF (project: FeedBaCks, 193907), Agence nationale de la
recherche (ANR-20-EBI5-0001-05), the Swedish Research Council for Sustainable Development (Formas
2020–02360), the German Research Foundation (DFG BR 1698/21–1, DFG HI 1538/16–1), and the Technology Agency
of the Czech Republic (SS70010002); and from the Swiss Data Science Projects SPEEDMIND and COMECO.
D.N.K. also acknowledges funding from the ERA-Net BiodivERsA - Belmont Forum, with the national funder Swiss
National Science Foundation (20BD21_184131), as part of the 2018 joint BiodivERsA-Belmont Forum call (project
‘FutureWeb’), and from the WSL internal grant ClimEx. We thank EarthEnv project collaborators Rob Guralnick and Brian
McGill for discussions that preceded and intellectually benefitted the research presented here. W.J. acknowledges
funding from NASA grants 80NSSC17K0282, 80NSSC20K0202, and 80NSSC18K0435.
Author contributions
D.N.K., A.W. and W.J. developed the idea. A.W. produced the monthly MODIS cloud frequency layers; D.N.K.
and N.E.Z. developed and implemented the precipitation downscaling and bias correction algorithm; C.M. and
D.N.K. conducted the validation; D.N.K. wrote the first version of the manuscript, and all authors contributed
significantly to the revision.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary information The online version contains supplementary material available at
https://doi.org/10.1038/s41597-021-01084-6.
Correspondence and requests for materials should be addressed to D.N.K. or W.J.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made. The images or other third party material in this
article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons license and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/
applies to the metadata files associated with this article.
© The Author(s) 2021

Scientific Data | (2021) 8:307 | https://doi.org/10.1038/s41597-021-01084-6

Received: 25 February 2021; Accepted: 21 October 2021;
