Multi-Temporal Land Cover Classification with Sequential Recurrent Encoders
Marc Rußwurm * and Marco Körner
Chair of Remote Sensing Technology, TUM Department of Civil, Geo and Environmental Engineering, Technical
University of Munich, Arcisstraße 21, 80333 Munich, Germany; marco.koerner@tum.de
* Correspondence: marc.russwurm@tum.de; Tel.: +49-172-81-70-121
Abstract: Earth observation (EO) sensors deliver data at daily or weekly intervals. Most land
use and land cover classification (LULC) approaches, however, are designed for cloud-free and
mono-temporal observations. The increasing temporal capabilities of today’s sensors enable the
use of temporal, along with spectral and spatial features. Domains such as speech recognition or
neural machine translation work with inherently temporal data and, today, achieve impressive
results by using sequential encoder-decoder structures. Inspired by these sequence-to-sequence
models, we adapt an encoder structure with convolutional recurrent layers in order to approximate a
phenological model for vegetation classes based on a temporal sequence of Sentinel 2 (S2) images.
In our experiments, we visualize internal activations over a sequence of cloudy and non-cloudy
images and find several recurrent cells that reduce the input activity for cloudy observations.
Hence, we assume that our network has learned cloud-filtering schemes solely from input data,
which could alleviate the need for tedious cloud-filtering as a preprocessing step for many EO
approaches. Moreover, using unfiltered temporal series of top-of-atmosphere (TOA) reflectance data,
our experiments achieved state-of-the-art classification accuracies on a large number of crop classes
with minimal preprocessing, compared to other classification approaches.
Keywords: deep learning; multi-temporal classification; land use and land cover classification;
recurrent networks; sequence encoder; crop classification; sequence-to-sequence; Sentinel 2
1. Introduction
Land use and land cover classification (LULC) has been a central focus of Earth observation
(EO) since the first air- and space-borne sensors began to provide data. For this purpose, optical
sensors sample the spectral reflectivity of objects on the Earth’s surface in a spatial grid at repeated
intervals. Hence, LULC classes can be characterized by spectral, spatial and temporal features. Today,
most classification tasks focus on spatial and spectral features [1], while utilizing the temporal domain
has long proven challenging. This is mostly due to limitations on data availability, the cost of data
acquisition, infrastructural challenges regarding data storage and processing and the complexity of
model design and feature extraction over multiple time frames.
Some LULC classes, such as urban structures, are mostly invariant to temporal
changes and, hence, are suitable for mono-temporal approaches. Others, predominantly
vegetation-related classes, change their spectral reflectivity based on biochemical processes initiated
by phenological events related to the type of vegetation and to environmental conditions.
These vegetation-characteristic phenological transitions have been utilized for crop yield prediction
and, to some extent, for classification [2,3]. However, to circumvent the previously-mentioned
challenges, the dimensionality of spectral bands has often been compressed by calculating task-specific
indices, such as the normalized difference vegetation index (NDVI), the normalized difference water
index (NDWI) or the enhanced vegetation index (EVI).
Today, most of these temporal data limitations have been alleviated by technological advances.
Reasonable spatial and temporal resolution data of multi-spectral Earth observation sensors are
available at no cost. Moreover, new services inexpensively provide high temporal and spatial resolution
imagery. The cost of data storage has decreased, and data transmission has become sufficiently fast
to allow gathering and processing all available images over a large area and multiple years. Finally,
new advances in machine learning, accompanied by GPU-accelerated hardware, have made it possible
to learn complex functional relationships, solely from the data provided.
Now that data are available at high resolutions and processing is feasible, the temporal domain
should be exploited for EO approaches. However, this exploitation requires suitable processing
techniques utilizing all available temporal information at reasonable complexity. Other domains,
such as machine translation [4], text summarization [5–7] or speech recognition [8,9], handle sequential
data naturally. These domains have popularized sequence-to-sequence learning, which transforms
a variable-length input sequence to an intermediate representation. This representation is then
decoded to a variable-length output sequence. From this concept, we adopt the sequential encoder
structure and extract characteristic temporal features from a sequence of Sentinel 2 (S2) images using a
straightforward, two-layer network.
Thus, the main contributions of this work are:
(i) the adaptation of sequence encoders from the field of sequence-to-sequence learning to Earth
observation (EO),
(ii) a visualization of internal gate activations on a sequence of satellite observations and,
(iii) the application to crop classification over two growing seasons.
2. Related Work
As we aim to apply our network to vegetation classes, we first introduce common crop
classification approaches, to which we will compare our results in Section 6. Then, we motivate
data-driven learning models and cover the latest work on recurrent network structures in the
EO domain.
Many remote sensing approaches have achieved adequate classification accuracies for multi-temporal
crop data by using multiple preprocessing steps in order to improve feature separability. Common methods
are atmospheric correction [10–14], calculation of vegetation indices [10–14] or the extraction of
sophisticated phenological features [13]. Additionally, some approaches utilize expert knowledge,
for instance, by introducing additional agro-meteorological data [10], by selecting suitable observation
dates for the target crop-classes [14] or by determining rules for classification [11]. Pixel-based [10,13]
and object-based [11,12,14] approaches have been proposed. Commonly, decision trees (DTs) [10,11,14]
or random forests (RFs) [12,13] are used as classifiers, the rules of which are sometimes aided by
additional expert knowledge [11].
These traditional approaches generally trade procedural complexity and the use of region-specific
expert knowledge for good classification accuracies in the respective areas of interest (AOIs). However,
these approaches are, in general, difficult to apply to other regions. Furthermore, the processing
structure requires supervision to varying degrees (e.g., product selection, visual image inspection,
parameter tuning), which impedes application at larger scales.
Today, we are experiencing a change in paradigm: away from the design of
physically-interpretable, human-understandable models, which require task-specific expert knowledge,
towards data-driven models, which are encoded in internal weight parameters and derived solely
from observations. In that regard, hidden Markov models (HMMs) [15] and conditional random
fields (CRFs) [16] have shown promising classification accuracies with multi-temporal data. However,
the underlying Markov property limits long-term learning capabilities, as Markov-based approaches
assume that the present state only depends on the current input and one previous state.
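Formally, this first-order Markov assumption on the hidden label states s_t given inputs x_t can be written as

\[ p(s_t \mid s_{1:t-1}, x_{1:t}) = p(s_t \mid s_{t-1}, x_t), \]

so that any information from observations further in the past must be compressed into the single previous state.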
Deep learning methods have had major success in fields, such as target recognition and scene
understanding [17], and are increasingly adopted by the remote sensing community. These methods
have proven particularly beneficial for modeling physical relationships that are complicated, cannot be
generalized or are not well-understood [18]. Thus, deep learning is potentially well suited to
approximate models of phenological changes, which depend on complex internal biochemical
processes of which only the change of surface reflectivity can be observed by EO sensors. A purely
data-driven approach might alleviate the need to manually design a functional model for this complex
relationship. However, caution is required, as external and non-class-relevant factors, such as seasonal
weather or observation configurations, are potentially incorporated into the model, which might
remain undetected if these factors constantly bias the dataset.
In remote sensing, convolutional networks have gained increasing popularity for mono-temporal
observation tasks [19–22]. However, for sequential tasks, recurrent network architectures,
which provide an iterative framework to process sequential information, are generally better suited.
Recent approaches utilize recurrent architectures for change detection [23–25], identification of sea level
anomalies [26] and land cover classification [27]. For long-term dependencies, Jia et al. [24] proposed a
new cell architecture, which maintains two separate cell states for single- and multi-seasonal long-term
dependencies. However, the calculation of an additional cell state requires more weights, which may
prolong training and require more training samples.
In previous work, we have experimented with recurrent networks for crop classification [28]
and achieved promising results. Based on this, we propose a network structure using convolutional
recurrent layers and the aforementioned adaptation of a many-to-one classification scheme with
sequence encoders.
3. Methodology
Section 3.1 incrementally introduces the concepts of artificial neural networks (ANNs),
feed-forward networks (FNNs) and recurrent neural networks (RNNs) and illustrates the use of
RNNs in sequence-to-sequence learning. We then describe the details of the proposed network
structure in Section 3.3.
In standard RNNs, the repeated multiplication with the recurrent weight matrix through time
leads to vanishing and exploding gradients [32,33]. While exploding gradients can
be avoided with gradient clipping, vanishing gradients impede the extraction of long-term feature
relationships. This issue has been addressed by Hochreiter and Schmidhuber [34], who introduced
additional gates and an internal state vector ct in long short-term memory (LSTM) cells to control
the gradient propagation through time and to enable long-term learning, respectively. Analogous to
standard RNNs, the output gate ot balances the influence of the previous cell output ht−1 and the
current input x_t. In LSTMs, the cell output h_t is further augmented by an internal state vector c_t,
which is designed to contain long-term information. To avoid the aforementioned vanishing gradients,
reading and writing to the cell state is controlled by three additional gates. The forget gate f t decreases
previously-stored information by element-wise multiplication ct−1 f t . New information is added
by the product of input gate it and modulation gate jt . Illustrations of the internal calculation can be
seen in Figure 1, and the mathematical relations are shown in Table 1. Besides LSTMs, gated recurrent
units (GRUs) [35] have gained increasing popularity, as these cells achieve similar accuracies to LSTMs
with fewer trainable parameters. Instead of separate vectors for long- and short-term memory, GRUs
formulate a single, but more sophisticated, output vector.
Figure 1. Schematic illustration of long short-term memory (LSTM) and gated recurrent unit (GRU)
cells analogous to the cell definitions in Table 1. The cell output h_t is calculated via internal gates,
based on the current input x_t combined with prior context information h_{t-1}, c_{t-1}. This is realized
by a concatenation (concat.) of these tensors, as illustrated by merging arrows. LSTM cells are designed
to separately accommodate long-term context in the internal cell state c_{t-1}, apart from short-term
context h_{t-1}. GRU cells combine all context information in a single, but more sophisticated, output h_t.
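For reference, the standard LSTM [34] and GRU [35] relations, consistent with the gate names used above, read as follows (the textbook formulation, restated here since Table 1 is referenced for the exact definitions; σ denotes the logistic sigmoid, ⊙ the element-wise product, [h_{t-1}, x_t] the concatenation of previous output and current input, and one common sign convention is chosen for the GRU update gate u_t):

\begin{align*}
\text{LSTM:}\quad
& f_t = \sigma(W_f [h_{t-1}, x_t] + b_f), \qquad i_t = \sigma(W_i [h_{t-1}, x_t] + b_i), \\
& j_t = \tanh(W_j [h_{t-1}, x_t] + b_j), \qquad o_t = \sigma(W_o [h_{t-1}, x_t] + b_o), \\
& c_t = f_t \odot c_{t-1} + i_t \odot j_t, \qquad h_t = o_t \odot \tanh(c_t); \\[4pt]
\text{GRU:}\quad
& r_t = \sigma(W_r [h_{t-1}, x_t] + b_r), \qquad u_t = \sigma(W_u [h_{t-1}, x_t] + b_u), \\
& \tilde{h}_t = \tanh(W_h [r_t \odot h_{t-1}, x_t] + b_h), \qquad h_t = u_t \odot h_{t-1} + (1 - u_t) \odot \tilde{h}_t.
\end{align*}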
The common output of recurrent layers provides a many-to-many relation by generating an output
vector h_t at each observation t given previous context h_{t-1} and c_{t-1}, as shown in Figure 2a. However,
encoding information of the entire sequence in a many-to-one relation is favored in many applications.
Following this idea, sequence-to-sequence learning, illustrated in Figure 2b, has popularized the use
of the cell state vector c T at the last-processed observation T as a representation of the entire input
sequence. These encoding-decoding networks transform an input sequence of varying length to an
intermediate state representation c of fixed size. Subsequently, the decoder generates a varying length
output sequence from this intermediate representation. Further developments in this domain include
attention schemes. These provide additional intermediate connections between encoder and decoder
layers, which are beneficial for translations of longer sequences [4].
In many sequential applications, the common input form is xt ∈ Rd with a given depth d.
The output vectors ht ∈ Rr are computed by matrix multiplication with internal weights W ∈ R(r+d)×r
and r recurrent cells. However, other fields, such as image processing, commonly handle raster
data xt ∈ Rh×w×d of specific width w, height h and spectral depth d. To account for neighborhood
relationships and to circumvent the increasing complexity, convolutional variants of LSTMs [36]
and GRUs have been introduced. These variants convolve the input tensors with weights
W ∈ R^{k×k×(r+d)×r}, augmented by the convolutional kernel size k, which is a hyper-parameter
determining the receptive field.
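To make the difference in parameterization concrete, the following sketch counts the weights of a single gate in the dense and convolutional formulation (illustrative numbers only, using the hyper-parameter values that appear later in this work):

```python
# Weights of one recurrent gate in the dense vs. convolutional formulation
# described above (an LSTM holds four such weight tensors, a GRU three).
r, d, k = 256, 15, 3                  # recurrent cells, input depth, kernel size
dense_weights = (r + d) * r           # W ∈ R^{(r+d)×r} for vector inputs x_t ∈ R^d
conv_weights = k * k * (r + d) * r    # W ∈ R^{k×k×(r+d)×r} for raster inputs
print(dense_weights, conv_weights)    # 69376 624384
```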
Figure 2. Illustrations of recurrent network architectures that inspired this work. The network of
previous work [28] shown in (a) creates a prediction y_t at each observation t based on spectral input
information x_t and the previous context h_{t-1}, c_{t-1}. Sequence-to-sequence networks, as shown in (b),
aggregate sequential information to an intermediate state c_T, which is a representation of the entire
series. (a) Network structure employed in previous work [28]; (b) illustration of a sequence-to-sequence
network [8] as often used in neural translation tasks.
In this previous work [28], each observation was represented by a d-dimensional input vector.
This vector included the concatenated bottom-of-atmosphere (BOA) reflectances of nine pixels
neighboring one point-of-interest. The point-wise classification was sufficient for quantitative accuracy
evaluation, but could not produce areal classification maps. Since a class prediction was performed on
every observation, we introduced additional covered classes for cloudy pixels at single images. These
were derived from the scene classification of the Sen2Cor atmospheric correction algorithm, which
required additional preprocessing. A single representative classification for the entire time-series
would have required additional post-processing to further aggregate the predicted labels for each
observation. Finally, the mono-directional iterative processing introduced a bias towards the last
observations. With more contextual information available, later observations showed better
classification accuracies compared to observations earlier in the sequence.
A convolutional layer projects the concatenated state c_T to softmax-normalized activation maps ŷ
for n classes: c_T ∈ R^{h×w×2r} ↦ ŷ ∈ R^{h×w×n}. This layer is composed of a convolution with a kernel
size of k_class, followed by batch normalization and a rectified linear unit (ReLU) [37] or leaky
ReLU [38] non-linear activation function. At each training step, the cross-entropy loss

\[ H(y, \hat{y}) = -\sum_i y_i \log(\hat{y}_i) \tag{1} \]

between the predicted activations ŷ and a one-hot representation of the ground truth labels y evaluates
the prediction quality. Consequently, based on this loss function and the Adam optimizer [39],
gradients are back-propagated through the network layers and adjust the model weights.
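A schematic training step under these definitions might look as follows (a minimal sketch in current TensorFlow/Keras, not the released implementation; `model` stands for the encoder and classification layer described above):

```python
# One training step: cross-entropy between softmax activations and one-hot
# labels, optimized with Adam, as described above.
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.CategoricalCrossentropy()  # expects one-hot labels y

def train_step(model, x, y_onehot):
    with tf.GradientTape() as tape:
        y_hat = model(x, training=True)      # softmax-normalized activation maps
        loss = loss_fn(y_onehot, y_hat)      # H(y, ŷ) = −Σ_i y_i log(ŷ_i)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```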
Tunable hyper-parameters are the number of recurrent cells r and the sizes of the convolutional
kernel k_rnn and the classification kernel k_class.
Figure 3. Schematic illustration of our proposed bidirectional sequential encoder network. The input
sequence x ∈ {x_0, ..., x_T} of observations x_t ∈ R^{h×w×d} is encoded to a representation
c_T = [c_T^{seq} ∥ c_T^{rev}]. The observations are passed in sequence (seq) and reversed (rev) order to the
encoder to eliminate bias towards recent observations. The concatenated representation of both passes
c_T is then projected to softmax-normalized feature maps for each class using a convolutional layer.
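The following sketch outlines this architecture with Keras ConvLSTM2D cells (an assumption for illustration; the released TensorFlow code may implement the convolutional recurrence differently, whether the two passes share encoder weights is a design choice, and the ReLU/leaky ReLU activation of the classification layer is omitted for brevity):

```python
# A minimal sketch of the bidirectional sequential encoder of Figure 3,
# assuming Keras ConvLSTM2D cells and separate weights per direction.
import tensorflow as tf

T, h, w, d = 36, 24, 24, 15            # sequence length, tile size, input depth
r, k_rnn, k_class, n = 256, 3, 3, 17   # cells, kernel sizes, number of classes

x = tf.keras.Input(shape=(T, h, w, d))
x_rev = tf.keras.layers.Lambda(lambda t: tf.reverse(t, axis=[1]))(x)  # reversed pass

enc_seq = tf.keras.layers.ConvLSTM2D(r, k_rnn, padding="same", return_state=True)
enc_rev = tf.keras.layers.ConvLSTM2D(r, k_rnn, padding="same", return_state=True)
_, _, c_seq = enc_seq(x)       # final cell state of the forward pass
_, _, c_rev = enc_rev(x_rev)   # final cell state of the reversed pass

c_T = tf.keras.layers.Concatenate()([c_seq, c_rev])   # c_T ∈ R^{h×w×2r}
z = tf.keras.layers.Conv2D(n, k_class, padding="same")(c_T)
z = tf.keras.layers.BatchNormalization()(z)
y_hat = tf.keras.layers.Softmax(axis=-1)(z)           # per-class activation maps
model = tf.keras.Model(x, y_hat)
```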
4. Dataset
For the evaluation of our approach, we defined a large area of interest (AOI) of 102 km × 42 km
north of Munich, Germany. An overview of the AOI at multiple scales is shown in Figure 4. The AOI
was further subdivided into squared blocks of 3.84 km × 3.84 km (multiples of 240 m and 480 m)
to ensure dataset independence while maintaining similar class distributions. These blocks were
then randomly assigned to partitions for network training, hyper-parameter validation and model
evaluation in a ratio of 4:1:1, similar to previous work [28]. The spatial extent of single samples x is
determined by tile-grids of 240 m and 480 m. We bilinearly interpolated the 20 m and 60 m S2 bands to
10 m ground sampling distance (GSD) to harmonize the raster data dimensions. To provide additional
temporal meta information, the year and day-of-year of the individual observations were added as
matrices to the input tensor. Hence, the input feature depth d = 15 is composed of the four 10 m
(B4, B3, B2, B8), six 20 m (B5, B6, B7, B8A, B11, B12) and three 60 m (B1, B9, B10) bands combined with
year and day-of-year.
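A sketch of how such a d = 15 input tensor could be assembled (a hypothetical helper, not the released pipeline; scipy's `zoom` with spline order 1 performs the bilinear resampling described above):

```python
# Assemble one observation: 13 S2 bands resampled to 10 m GSD plus year and
# day-of-year matrices, yielding an (h, w, 15) input tensor.
import numpy as np
from scipy.ndimage import zoom

def assemble_input(b10, b20, b60, year, doy):
    """b10: (h, w, 4); b20: (h//2, w//2, 6); b60: (h//6, w//6, 3) reflectances."""
    b20_up = zoom(b20, (2, 2, 1), order=1)   # 20 m -> 10 m, bilinear
    b60_up = zoom(b60, (6, 6, 1), order=1)   # 60 m -> 10 m, bilinear
    h, w, _ = b10.shape
    meta = np.stack([np.full((h, w), year),  # year matrix
                     np.full((h, w), doy)],  # day-of-year matrix
                    axis=-1)
    return np.concatenate([b10, b20_up, b60_up, meta], axis=-1)  # (h, w, 15)
```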
With ground truth labels of two growing seasons 2016 and 2017 available, we gathered 274
(108 in 2016; 166 in 2017) Sentinel 2 products at 98 (46 in 2016; 52 in 2017) observation dates between
3 January 2016 and 15 November 2017. The obtained time series represents all available S2 products
labeled with cloud coverage less than 80%. In some S2 images, we noticed a spatial offset in the scale
of one pixel. However, we did not perform additional georeferencing and treated the spatial offset
as data-inherent observation noise. Overall, we relied on the geometrical and spectral reference as
provided by the Copernicus ground segment.
Figure 4. Area of interest (AOI) north of Munich containing 430 kha and 137 k field parcels. The AOI is
further tiled at multiple scales into datasets for training, validation and evaluation and footprints of
individual samples.
With modern agriculture centered around few predominant crops, the distribution of classes
in the AOI is not uniform, as can be observed from Figure 5a. This non-uniform class distribution is
generally not optimal for the classification evaluation, as it skews the overall accuracy metric towards
classes of high frequency. Hence, we additionally calculated kappa metrics [40] for the quantitative
evaluation in Section 5.2 to compensate for unbalanced distributions.

Figure 5. Information of the area of interest containing location, division schemes, class distributions
and dates of acquired satellite imagery. (a) Non-uniform distribution of field classes in the AOI;
(b) acquired Sentinel 2 (S2) observations of the twin satellites S2A and S2B.
5. Results
In this section, we first visualize internal state activations in Section 5.1 to gain a visual
understanding of the sequential encoding process. Further findings on internal cloud masking are
presented before the crop classification results are quantitatively and qualitatively evaluated
in Sections 5.2 and 5.3.

5.1. Internal Network Activations

In Section 3.1, we gave an overview of the functionality of recurrent layers and discussed
the property of LSTM state vectors c_t ∈ R^{h×w×r} to encode sequential information over a series of
observations. The cell state is updated by internal gates it , jt , f t ∈ Rh×w×r , which in turn are calculated
based on previous cell output ht−1 and cell state ct−1 (see Table 1). To assess prior assumptions
regarding cloud filtering and to visually assess the encoding process, we visualized internal LSTM
cell tensors for a sequence of images and show representative activations of three cells in Figure 6.
The LSTM network from which these activations were extracted was trained on 24 px × 24 px tiles with
r = 256 recurrent cells and k_rnn = k_class = 3 px. Additionally, we ran inference on tiles of
height h and width w of 48 px. Experiments with the input size of 24 px show similar results and are
included in the Supplementary Material to this work. In the first row, a 4σ band-normalized RGB image
represents the input satellite image x_t ∈ R^{h×w×d} with h = w = 48 and d = 15 at each time frame t.
The next rows show the activations of input gate i_t^{(i)}, modulation gate j_t^{(i)}, forget gate f_t^{(i)} and
cell state c_t^{(i)} at three selected recurrent cells, denoted by the raised index i ∈ {3, 22, 47}. After
iteratively processing the sequence, the final cell state c_{T=36} is used to produce activations for each
class, as described in Section 3.3.
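The 4σ band normalization of the RGB panels is only specified by its name; one plausible implementation (a hypothetical helper, assuming a clip of each band to the mean ± 4σ range before scaling) is:

```python
# Clip a single band to mean ± 4σ and scale to [0, 1] for display.
import numpy as np

def normalize_4sigma(band):
    mu, sigma = band.mean(), band.std()
    lo, hi = mu - 4 * sigma, mu + 4 * sigma
    return np.clip((band - lo) / (hi - lo), 0.0, 1.0)
```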
In the encoding process, the detail of structures at the cell state tensor increased gradually.
This may be interpreted as additional information written to the cell state. It further appeared that the
structures visible at the cell states resembled shapes, which were present in cloud-free RGB images
(e.g., c_{t=15}^{(3)} or c_{t=28}^{(22)}). Some cells (e.g., Cell 3 or Cell 22) changed their activations gradually over
the span of multiple observations, while others (e.g., Cell 47) changed more frequently. Forget gate f
activations are element-wise multiplied with the previous cell state ct−1 and range between zero and
one. Low values in this gate numerically reduce the cell state, which can be potentially interpreted
as a change of decision. The input i and modulation gate j control the degree of new information
written to the cell state. While the input gate is scaled between zero and one, the modulation gate
j ∈ [−1, 1] determines the sign of change. In general, we found the activity of a majority of cells (e.g.,
Cell 3 or Cell 22) difficult to associate with distinct events in the current input. However, we assumed
that classification-relevant features were expressed as a combination of cell activations similar to other
neural network approaches. Nevertheless, we could identify a proportionally small number of cells,
in which the shape of clouds visible in the image was projected on the internal state activations. One of
these was cell i = 47. For cloudy observations, the input gate approached zero either over the entire
tile (e.g., t = {10, 18, 19, 36}) or over patches of cloudy pixels (e.g., t = {11, 13, 31, 33}). At some
observation times (e.g., t = {13, 31, 32}), the modulation gate j_t^{(47)} additionally changed the sign.
In a similar fashion, Karpathy et al. [41] evaluated cell activations for the task of text processing.
They could associate a small number of cells with a set of distinct tasks, such as monitoring the lengths
of a sentence or maintaining a state-flag for text inside and outside of brackets.
Summarizing this experiment, the majority of cells showed increasingly detailed structures when
new information was provided in the input sequence. It is likely that the grammar of crop-characteristic
phenological changes was encoded in the network weights, and we suspect that a certain amount
of these cells was sensitive to distinct events relevant for crop identification. However, these events
may be encoded in multiple cells and were difficult to visually interpret. A small set of cells could
be visually associated with individual cloud covers and may be used for internal cloud masking.
Based on these findings, we are confident that our network has learned to internally filter clouds
without explicitly introducing cloud-related labels.
Figure 6. Internal LSTM cell activations of input gate i^{(i)}, forget gate f^{(i)}, modulation gate j^{(i)} and
cell state c^{(i)} at three (of r = 256) selected cells i ∈ {3, 22, 47} given the current input x_t over the
sequence of observations t = {1, ..., 36}. The detail of features at the cell states increased gradually,
which indicated the aggregation of information over the sequence. While most cells likely contribute
to the classification decision, only some cells are visually interpretable with regard to the current
input x_t. One visually interpretable cell i = 47 has learned to identify clouds, as input and modulation
gates show different activation patterns on cloudy and non-cloudy observations.
The diagonal of the confusion matrices in Figure 7 contains correctly-classified samples, with values
equivalent to Table 2. Structures outside the diagonal indicate systematic confusions between classes
and may give insight into the reasoning behind varying classification accuracies.
Figure 7. Confusion matrices of the trained convolutional GRU network on data of the seasons 2016 and
2017. While the confusion of some classes was consistent over both seasons (e.g., winter triticale to
winter wheat), other classes were classified at different accuracies in consecutive years (e.g., winter barley
to winter spelt).
Table 2. Pixel-wise accuracies of the trained convolutional GRU sequential encoder network after
training over 60 epochs on data of both growth seasons. The conditional kappa metrics [42] for each
class and the overall kappa [40] measure are given for both growth seasons. The best and worst metrics
are emphasized by boldface.
Class | 2016: Precision (User's Acc.) / Recall (Prod. Acc.) / f-Meas. / Kappa / # of Pixels | 2017: Precision (User's Acc.) / Recall (Prod. Acc.) / f-Meas. / Kappa / # of Pixels
sugar beet | 94.6 / 77.6 / 85.3 / 0.772 / 59 k | 89.2 / 78.5 / 83.5 / 0.779 / 94 k
oat | 86.1 / 67.8 / 75.8 / 0.675 / 36 k | 63.8 / 62.8 / 63.3 / 0.623 / 38 k
meadow | 90.8 / 85.7 / 88.2 / 0.845 / 233 k | 88.1 / 85.0 / 86.5 / 0.837 / 242 k
rapeseed | 95.4 / 90.0 / 92.6 / 0.896 / 125 k | 96.2 / 95.9 / 96.1 / 0.957 / 114 k
hop | 96.4 / 87.5 / 91.7 / 0.873 / 51 k | 92.5 / 74.7 / 82.7 / 0.743 / 53 k
spelt | 55.1 / 81.1 / 65.6 / 0.807 / 38 k | 75.3 / 46.7 / 57.6 / 0.463 / 31 k
triticale | 69.4 / 55.7 / 61.8 / 0.549 / 65 k | 62.4 / 57.2 / 59.7 / 0.563 / 64 k
beans | 92.4 / 87.1 / 89.6 / 0.869 / 27 k | 92.8 / 63.2 / 75.2 / 0.630 / 28 k
peas | 93.2 / 70.7 / 80.4 / 0.706 / 9 k | 60.9 / 41.5 / 49.3 / 0.414 / 6 k
potato | 90.9 / 88.2 / 89.5 / 0.876 / 126 k | 95.2 / 73.8 / 83.1 / 0.728 / 140 k
soybeans | 97.7 / 79.6 / 87.7 / 0.795 / 21 k | 75.9 / 79.9 / 77.8 / 0.798 / 26 k
asparagus | 89.2 / 78.8 / 83.7 / 0.787 / 20 k | 81.6 / 77.5 / 79.5 / 0.773 / 19 k
wheat | 87.7 / 93.1 / 90.3 / 0.902 / 806 k | 90.1 / 95.0 / 92.5 / 0.930 / 783 k
winter barley | 95.2 / 87.3 / 91.0 / 0.861 / 258 k | 92.5 / 92.2 / 92.4 / 0.915 / 255 k
rye | 85.6 / 47.0 / 60.7 / 0.466 / 43 k | 76.7 / 61.9 / 68.5 / 0.616 / 30 k
summer barley | 87.5 / 83.4 / 85.4 / 0.830 / 73 k | 77.9 / 88.5 / 82.9 / 0.880 / 91 k
maize | 91.6 / 96.3 / 93.9 / 0.944 / 919 k | 92.3 / 96.8 / 94.5 / 0.953 / 876 k
weight. avg | 89.9 / 89.7 / 89.5 / – / – | 89.5 / 89.5 / 89.3 / – / –
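For reference, the overall kappa [40] and the per-class conditional kappa [42] reported in Table 2 can be derived from a confusion matrix; a small sketch using standard formulas and one common definition of the conditional kappa (not the authors' evaluation code):

```python
import numpy as np

def kappa_metrics(C):
    """Overall kappa and per-class conditional kappa from a confusion
    matrix C, where C[i, j] counts reference class i predicted as class j."""
    N = C.sum()
    ref, pred = C.sum(axis=1), C.sum(axis=0)   # row and column marginals
    p_o = np.trace(C) / N                      # observed agreement
    p_e = (ref * pred).sum() / N**2            # chance agreement
    overall = (p_o - p_e) / (1 - p_e)
    # conditional kappa per class i: (N*C_ii - ref_i*pred_i) / (N*ref_i - ref_i*pred_i)
    conditional = (N * np.diag(C) - ref * pred) / (N * ref - ref * pred)
    return overall, conditional
```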
Some crops likely share common spectral or phenological characteristics. Hence, we expected
some symmetric confusion between classes, which would be expressed as diagonal symmetric
confusions consistent in both years. Examples of this were triticale and rye or oat and summer
barley. However, these relations were not frequent in the dataset, which indicates that the network
had sufficient capacity to separate the classes by provided features. In some cases, one class may
share characteristics with another class. This class may be further distinguished by additional unique
features, which would be expressed by asymmetric confusions between these two classes in both
seasons. Relations of this type were more dominantly visible in the matrices and included confusions
between barley and triticale, triticale and spelt or wheat confused with triticale and spelt. These types
of confusion were consistent over both seasons and may be explained by a spectral or phenological
similarity between individual crop-types.
More dominantly, many confusions were not consistent over the two growing seasons.
For instance, confusions occurring only in the 2017 season were soybeans with potato or peas with
meadow and potato. Since the cultivated crops are identical in these years and the class distributions
were consistent, seasonally-variable factors were likely responsible for these relations. As reported
in Table 2, peas have been classified well in 2016, but poorly in 2017, due to the aforementioned
confusions with meadow and potato. These results indicate that external and not crop-type-related
factors had a negative influence on classification accuracies, which appeared unique to one season.
One of these might be the variable onset of phenological events, which are indirectly observed by the
change of reflectances by the sensors. These events are influenced by local weather and sun exposure,
which may vary over large regional scales or multiple years.
For this region, fewer satellite images were available. The lack of temporal information likely explains
the poor classification accuracies. However, this example illustrates that the class activations give an
indication of the classification confidence independent of the ground truth information.
Figure 8. Qualitative results of the convolutional GRU sequential encoder. Examples (A–D) show good
classification results. For Example (E) the network misclassified one maize parcel with high confidence,
which is indicated by incorrect, but well-defined activations. In a second field, the class activations
reveal a confusion between wheat, meadow and maize. For Example (F), most pixels are misclassified.
However, the class activations show uncertainty in the classification decision.
6. Discussion
In this section, we compare our approach with other multi-temporal classifications. Unfortunately,
to the best of our knowledge, no multi-temporal benchmark dataset is available to compare remote
sensing approaches on equal footing. Nevertheless, we provide some perspective of the study domain
by gathering multi-temporal crop classification approaches in Table 3 and categorizing these by their
applied methodology and achieved overall accuracy. However, the heterogeneity of data sources,
the varying extents of their evaluated areas and the number of classes used in these studies impede a
numerical comparison of the achieved accuracies. Despite this, we hope that this table will provide an
overview of the state-of-the-art in multi-temporal crop identification.
Earth observation (EO) data are acquired in periodic intervals at high spatial resolutions. From an
information theoretical perspective, utilizing additional data should lead to better classification
performance. However, the large quantity of data requires methods that are able to process this
information and are robust with regard to observation noise. Optimally, these approaches are scalable with
minimal supervision so that data of multiple years can be included over large regions. Existing approaches
in multi-temporal EO tasks often use multiple separate processing steps, such as preprocessing, feature
extraction and classification, as summarized by Ünsalan and Boyer [44]. Generally, these steps require
manual supervision or the selection of additional parameters based on region-specific expert knowledge,
a process that impedes applicability at large scales. The cost of data acquisition is an additional barrier,
as multiple and potentially expensive satellite images are required. Commercial satellites, such as
RapidEye (RE), Satellite Pour l’Observation de la Terre (SPOT) or QuickBird (QB), provide images
at excellent spatial resolution. However, predominantly inexpensive sensors, such as Landsat (LS),
Sentinel 2 (S2), Moderate-resolution Imaging Spectroradiometer (MODIS) or Advanced Spaceborne
Thermal Emission and Reflection Radiometer (ASTER), can be applied at large scales, since the
decreasing information gain of additional observations must justify image acquisition costs. Many
approaches use spectral indices, such as normalized difference vegetation index (NDVI), normalized
difference water index (NDWI) or enhanced vegetation index (EVI), to extract statistical features
from vegetation-related signals and are invariant to atmospheric perturbations. Commonly, decision
trees (DTs) or random forests (RFs) are used for classification. The exclusive use of spectral indices
simplifies the task of feature extraction. However, these indices utilize only a small number of
available spectral bands (predominantly blue, red and near-infrared). Thus, methods that utilize
all reflectance measurements, either at top-of-atmosphere (TOA), or atmospherically-corrected to
bottom-of-atmosphere (BOA), are favorable, since all potential spectral information can be extracted.
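To make this concrete, such an index compresses a subset of bands into a single value per pixel; the NDVI, for instance, combines red and near-infrared reflectance (standard definition, shown for illustration):

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red + 1e-8)  # epsilon guards against division by zero
```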
Table 3. Multi-temporal crop classification approaches, categorized by sensor, preprocessing, features,
classifier and achieved overall accuracy.

Approach | Sensor | Preprocessing | Features | Classifier | Accuracy | # of Classes
Rußwurm and Körner [28], 2017 | S2 | atm. cor. (Sen2Cor) | BOA reflect. | RNN | 74 | 18
Siachalou et al. [15], 2015 | LS, RE | geometric correction, image registration | TOA reflect. | HMM | 90 | 6
Hao et al. [13], 2015 | MODIS | image reprojection, atm. cor. [45] | statistical phen. features | RF | 89 | 6
Conrad et al. [12], 2014 | SPOT, RE, QB | segmentation, atm. cor. [45] | vegetation indices | OBIA + RF | 86 | 9
Foerster et al. [10], 2012 | LS | phen. normalization, atm. cor. [45] | NDVI statistics | DT | 73 | 11
Peña-Barragán et al. [14], 2011 | ASTER | segmentation, atm. cor. [46] | vegetation indices | OBIA + DT | 79 | 13
Conrad et al. [11], 2010 | SPOT, ASTER | segmentation, atm. cor. [45] | vegetation indices | OBIA + DT | 80 | 6
In general, a direct numerical comparison of classification accuracies is difficult, since these are
dependent on the number of evaluated samples, the extent of evaluated area and the number of
classified categories. Nonetheless, we compare our method with the approaches of Siachalou et al. [15]
and Hao et al. [13] in detail since their achieved classification accuracies are on a similar level as
ours. Hao et al. [13] used an RF classifier on phenological features, which were extracted from NDVI
and NDWI time series of MODIS data. Their results demonstrate that good classification accuracies
with hand-crafted feature extraction and classification methods can be achieved if data of sufficient
temporal resolution are available. However, the large spatial resolution (500 m) of the MODIS sensor
limits the applicability of this approach to areas of large homogeneous regions. On a smaller scale,
Siachalou et al. [15] report good levels of accuracy on small fields. For this, they used hidden
Markov models (HMMs) with a temporal series of four LS images combined with one single RapidEye
(RE) image for field border delineation. Methodologically, HMMs and conditional random fields
(CRFs) [16] are closer to our approach since the phenological model is approximated with an internal
chain of hidden states. However, these methods might not be applicable for long temporal series,
since Markov-based approaches assume that only one previous state contains classification-relevant
information.
Overall, this comparison shows that our proposed network can achieve state-of-the-art
classification accuracy with a comparatively large number of classes. Furthermore, the S2 data
of non-atmospherically-corrected TOA values can be acquired easily and do not require further
preprocessing. Compared to previous work, we were able to process larger tiles by using convolutional
recurrent cells with only a single recurrent encoding layer. Moreover, we neither required atmospheric
correction, nor additional cloud classes, since one classification decision is derived from the entire
sequence of observations.
7. Conclusions
In this work, we proposed an automated end-to-end approach for multi-temporal classification,
which achieved state-of-the-art accuracies in crop classification tasks with a large number of crop
classes. Furthermore, the reported accuracies were achieved without radiometric and geometric
preprocessing. The trained and inferred data were atmospherically uncorrected and contained clouds.
In traditional approaches, multi-temporal cloud detection algorithms utilize the sudden positive
change in reflectivity of cloudy pixels and achieve better results than other traditional mono-temporal
remote sensing classifiers [47]. Results of this work indicate that cloud masking can be learned jointly
together with classification. By visualizing internal gate activations in our network in Section 5.1,
we found evidence that some recurrent cells were sensitive to cloud coverage. These cells may be used
by the network to internally mask cloudy pixels similar to an external cloud filtering algorithm.
In Sections 5.2 and 5.3, we further evaluated the classification results quantitatively and
qualitatively. Based on several findings, we derived that the network has approximated a
discriminative crop-specific phenological model based on a raw series of TOA S2 observations. Further
inspection revealed that some crops were inconsistently classified in both growing seasons. This may
be caused by seasonally-variable environmental conditions, which may have been implicitly integrated
into the encoded phenological model. We employed our network for the task of crop classification
since vegetative classes are well characterized by their inherently temporal phenology. However,
the network architecture is methodologically not limited to vegetation modeling and may be employed
for further tasks, which may benefit from the extraction of temporal features. We hope that our results
encourage the research community to utilize the temporal domain for their applications. In this regard,
we publish the TensorFlow source code of our network along with the evaluations and experiments
from this work.
Supplementary Materials: The source code of the network implementation and further material is made publicly
available at https://github.com/TUM-LMF/MTLCC.
Acknowledgments: We would like to thank the Bavarian Ministry of Food, Agriculture and Forestry (StMELF)
for providing ground truth data in excellent semantic and geometric quality. Furthermore, we thank the Leibniz
Supercomputing Centre (LRZ) for providing access to computational resources, such as the DGX-1 and P100
servers, and NVIDIA for providing one Titan X GPU.
Author Contributions: M.R. and M.K. conceived and designed the experiments. M.R. implemented the network
and performed the experiments. Both authors analyzed the data and M.R. wrote the paper. Both authors read and
approved the final manuscript.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Zhang, L.; Zhang, Q.; Du, B.; Huang, X.; Tang, Y.Y.; Tao, D. Simultaneous Spectral-Spatial Feature Selection
and Extraction for Hyperspectral Images. IEEE Trans. Cybern. 2018, 48, 16–28.
2. Odenweller, J.B.; Johnson, K.I. Crop identification using Landsat temporal-spectral profiles.
Remote Sens. Environ. 1984, 14, 39–54.
3. Reed, B.C.; Brown, J.F.; VanderZee, D.; Loveland, T.R.; Merchant, J.W.; Ohlen, D.O. Measuring Phenological
Variability from Satellite Imagery. J. Veg. Sci. 1994, 5, 703–714.
4. Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate.
arXiv 2014, arXiv:1409.0473v7.
5. Rush, A.; Chopra, S.; Weston, J. A Neural Attention Model for Sentence Summarization. arXiv 2017,
arXiv:1509.00685v2.
6. Shen, S.; Liu, Z.; Sun, M. Neural Headline Generation with Minimum Risk Training. arXiv 2016,
arXiv:1604.01904v1.
7. Nallapati, R.; Zhou, B.; dos Santos, C.N.; Gulcehre, C.; Xiang, B. Abstractive Text Summarization Using
Sequence-to-Sequence RNNs and Beyond. arXiv 2016, arXiv:1602.06023v5.
8. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to Sequence Learning with Neural Networks. arXiv 2014,
arXiv:1409.3215v3.
9. Chorowski, J.; Bahdanau, D.; Serdyuk, D.; Cho, K.; Bengio, Y. Attention-based models for speech recognition.
Adv. Neural Inf. Process. Syst. 2015, 1, 557–585.
10. Foerster, S.; Kaden, K.; Foerster, M.; Itzerott, S. Crop type mapping using spectral-temporal profiles and
phenological information. Comput. Electron. Agric. 2012, 89, 30–40.
11. Conrad, C.; Fritsch, S.; Zeidler, J.; Rücker, G.; Dech, S. Per-Field Irrigated Crop Classification in Arid Central
Asia Using SPOT and ASTER Data. Remote Sens. 2010, 2, 1035–1056.
12. Conrad, C.; Dech, S.; Dubovyk, O.; Fritsch, S.; Klein, D.; Löw, F.; Schorcht, G.; Zeidler, J. Derivation
of temporal windows for accurate crop discrimination in heterogeneous croplands of Uzbekistan using
multitemporal RapidEye images. Comput. Electron. Agric. 2014, 103, 63–74.
13. Hao, P.; Zhan, Y.; Wang, L.; Niu, Z.; Shakir, M. Feature Selection of Time Series MODIS Data for Early Crop
Classification Using Random Forest: A Case Study in Kansas, USA. Remote Sens. 2015, 7, 5347–5369.
14. Peña-Barragán, J.M.; Ngugi, M.K.; Plant, R.E.; Six, J. Object-based crop identification using multiple
vegetation indices, textural features and crop phenology. Remote Sens. Environ. 2011, 115, 1301–1316.
15. Siachalou, S.; Mallinis, G.; Tsakiri-Strati, M. A hidden markov models approach for crop classification:
Linking crop phenology to time series of multi-sensor remote sensing data. Remote Sens. 2015, 7, 3633–3650.
16. Hoberg, T.; Rottensteiner, F.; Feitosa, R.Q.; Heipke, C. Conditional random fields for multitemporal and
multiscale classification of optical satellite imagery. IEEE Trans. Geosci. Remote Sens. 2015, 53, 659–673.
17. Zhang, L.; Zhang, L.; Du, B. Deep Learning for Remote Sensing Data: A Technical Tutorial on the State of
the Art. IEEE Geosci. Remote Sens. Mag. 2016, 4, 22–40.
18. Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep Learning in Remote Sensing:
A Comprehensive Review and List of Resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36.
19. Hu, F.; Xia, G.S.; Hu, J.; Zhang, L. Transferring Deep Convolutional Neural Networks for the Scene
Classification of High-Resolution Remote Sensing Imagery. Remote Sens. 2015, 7, 14680–14707.
20. Scott, G.J.; England, M.R.; Starms, W.A.; Marcum, R.A.; Davis, C.H. Training Deep Convolutional Neural
Networks for Land-Cover Classification of High-Resolution Imagery. IEEE Geosci. Remote Sens. Lett.
2017, 14, 549–553.
21. Makantasis, K.; Karantzalos, K.; Doulamis, A.; Doulamis, N. Deep Supervised Learning for Hyperspectral
Data Classification through Convolutional Neural Networks. In Proceedings of the 2015 IEEE International
Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 4959–4962.
22. Castelluccio, M.; Poggi, G.; Sansone, C.; Verdoliva, L. Land Use Classification in Remote Sensing Images by
Convolutional Neural Networks. arXiv 2015, arXiv:1508.00092.
23. Lyu, H.; Lu, H.; Mou, L. Learning a Transferable Change Rule from a Recurrent Neural Network for Land
Cover Change Detection. Remote Sens. 2016, 8, 506.
24. Jia, X.; Khandelwal, A.; Nayak, G.; Gerber, J.; Carlson, K.; West, P.; Kumar, V. Incremental Dual-memory
LSTM in Land Cover Prediction. In Proceedings of the 23rd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 867–876.
25. Mou, L.; Bruzzone, L.; Zhu, X.X. Learning Spectral-Spatial-Temporal Features via a Recurrent Convolutional
Neural Network for Change Detection in Multispectral Imagery. arXiv 2018, arXiv:1803.02642v1.
26. Braakmann-Folgmann, A.; Roscher, R.; Wenzel, S.; Uebbing, B.; Kusche, J. Sea Level Anomaly Prediction
using Recurrent Neural Networks. arXiv 2017, arXiv:1710.07099v1.
27. Sharma, A.; Liu, X.; Yang, X. Land Cover Classification from Multi-temporal, Multi-spectral Remotely
Sensed Imagery using Patch-Based Recurrent Neural Networks. arXiv 2017, arXiv:1708.00813v1.
28. Rußwurm, M.; Körner, M. Temporal Vegetation Modelling using Long Short-Term Memory Networks
for Crop Identification from Medium-Resolution Multi-Spectral Satellite Images. In Proceedings of
the IEEE/ISPRS Workshop on Large Scale Computer Vision for Remote Sensing Imagery (EarthVision),
Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017.
29. Graves, A.; Wayne, G.; Danihelka, I. Neural Turing Machines. arXiv 2014, arXiv:1410.5401v2.
30. Siegelmann, H.; Sontag, E. On the Computational Power of Neural Nets. J. Comput. Syst. Sci.
1995, 50, 132–150.
31. Jozefowicz, R.; Zaremba, W.; Sutskever, I. An Empirical Exploration of Recurrent Network Architectures. In Proceedings
of the 32nd International Conference on International Conference on Machine Learning, Lille, France,
6–11 July 2015; Volume 7, pp. 2342–2350.
32. Hochreiter, S.; Bengio, Y.; Frasconi, P.; Schmidhuber, J. Gradient flow in recurrent nets: The difficulty
of learning long-term dependencies. In A Field Guide to Dynamical Recurrent Networks; IEEE Press:
New York, NY, USA, 2001; pp. 237–243.
33. Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult.
IEEE Trans. Neural Netw. 1994, 5, 157–166.
34. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
35. Cho, K.; van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning
Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv 2014,
arXiv:1406.1078v3.
36. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM Network: A Machine
Learning Approach for Precipitation Nowcasting. Adv. Neural Inf. Process. Syst. 2015, 1, 802–810.
37. Hahnloser, R.; Sarpeshkar, R.; Mahowald, M.A.; Douglas, R.J.; Seung, H.S. Digital selection and analogue
amplification coexist in a cortex-inspired silicon circuit. Nature 2000, 405, 947–951.
38. Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier Nonlinearities Improve Neural Network Acoustic Models.
Proc. Int. Conf. Mach. Learn. 2013, 28, 6.
39. Kingma, D.P.; Ba, J.L. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980v9.
40. Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 1960, 20, 37–46.
41. Karpathy, A.; Johnson, J.; Fei-Fei, L. Visualizing and Understanding Recurrent Networks. arXiv 2015,
arXiv:1506.02078.
42. Fung, T.; Ledrew, E. The Determination of Optimal Threshold Levels for Change Detection Using Various
Accuracy Indices. Photogramm. Eng. Remote Sens. 1988, 54, 1449–1454.
43. McHugh, M.L. Interrater reliability: the kappa statistic. Biochem. Med. 2012, 22, 276–282.
44. Ünsalan, C.; Boyer, K.L. Review on Land Use Classification. In Multispectral Satellite Image Understanding:
From Land Classification to Building and Road Detection; Springer: London, UK, 2011; pp. 49–64.
45. Richter, R. A spatially adaptive fast atmospheric correction algorithm. Int. J. Remote Sens. 1996, 17, 1201–1214.
46. Matthew, M.W.; Adler-Golden, S.M.; Berk, A.; Richtsmeier, S.C.; Levine, R.Y.; Bernstein, L.S.; Acharya, P.K.;
Anderson, G.P.; Felde, G.W.; Hoke, M.P. Status of Atmospheric Correction using a MODTRAN4-Based
Algorithm. In Proceedings of the SPIE Algorithms for Multispectral, Hyperspectral, and Ultra-Spectral
Imagery VI, Orlando, FL, USA, 16–20 April 2000; pp. 199–207.
47. Hagolle, O.; Huc, M.; Villa Pascual, D.; Dedieu, G. A multi-temporal method for cloud detection, applied to
FORMOSAT-2, VENµS, LANDSAT and SENTINEL-2 images. Remote Sens. Environ. 2010, 114, 1747–1755.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).