Logo Passei Direto
Buscar
Material
páginas com resultados encontrados.
páginas com resultados encontrados.
left-side-bubbles-backgroundright-side-bubbles-background

Crie sua conta grátis para liberar esse material. 🤩

Já tem uma conta?

Ao continuar, você aceita os Termos de Uso e Política de Privacidade

left-side-bubbles-backgroundright-side-bubbles-background

Crie sua conta grátis para liberar esse material. 🤩

Já tem uma conta?

Ao continuar, você aceita os Termos de Uso e Política de Privacidade

left-side-bubbles-backgroundright-side-bubbles-background

Crie sua conta grátis para liberar esse material. 🤩

Já tem uma conta?

Ao continuar, você aceita os Termos de Uso e Política de Privacidade

left-side-bubbles-backgroundright-side-bubbles-background

Crie sua conta grátis para liberar esse material. 🤩

Já tem uma conta?

Ao continuar, você aceita os Termos de Uso e Política de Privacidade

left-side-bubbles-backgroundright-side-bubbles-background

Crie sua conta grátis para liberar esse material. 🤩

Já tem uma conta?

Ao continuar, você aceita os Termos de Uso e Política de Privacidade

left-side-bubbles-backgroundright-side-bubbles-background

Crie sua conta grátis para liberar esse material. 🤩

Já tem uma conta?

Ao continuar, você aceita os Termos de Uso e Política de Privacidade

left-side-bubbles-backgroundright-side-bubbles-background

Crie sua conta grátis para liberar esse material. 🤩

Já tem uma conta?

Ao continuar, você aceita os Termos de Uso e Política de Privacidade

left-side-bubbles-backgroundright-side-bubbles-background

Crie sua conta grátis para liberar esse material. 🤩

Já tem uma conta?

Ao continuar, você aceita os Termos de Uso e Política de Privacidade

left-side-bubbles-backgroundright-side-bubbles-background

Crie sua conta grátis para liberar esse material. 🤩

Já tem uma conta?

Ao continuar, você aceita os Termos de Uso e Política de Privacidade

left-side-bubbles-backgroundright-side-bubbles-background

Crie sua conta grátis para liberar esse material. 🤩

Já tem uma conta?

Ao continuar, você aceita os Termos de Uso e Política de Privacidade

Prévia do material em texto

Opening the Black-Box: A Systematic Review on
Explainable AI in Remote Sensing
Adrian Höhl1∗, Ivica Obadic1,2∗, Miguel-Ángel Fernández-Torres3, Hiba Najjar4, Dario Oliveira5,
Zeynep Akata6,7, Andreas Dengel4 and Xiao Xiang Zhu1,2
1Data Science in Earth Observation, Technical University of Munich (TUM)
2Munich Center for Machine Learning, 80333 Munich, Germany
3Image Processing Laboratory (IPL), Universitat de València (UV)
4University of Kaiserslautern-Landau and German Research Center for Artificial Intelligence (DFKI)
5School of Applied Mathematics, Getulio Vargas Foundation
6Institute for Explainable Machine Learning at Helmholtz Munich
7Interpretable and Reliable Machine Learning, Technical University of Munich (TUM)
∗Shared first authorship
Abstract—In recent years, black-box machine learning ap-
proaches have become a dominant modeling paradigm for knowl-
edge extraction in Remote Sensing. Despite the potential benefits
of uncovering the inner workings of these models with explainable
AI, a comprehensive overview summarizing the used explainable
AI methods and their objectives, findings, and challenges in
Remote Sensing applications is still missing. In this paper, we
address this issue by performing a systematic review to identify
the key trends of how explainable AI is used in Remote Sensing
and shed light on novel explainable AI approaches and emerging
directions that tackle specific Remote Sensing challenges. We also
reveal the common patterns of explanation interpretation, discuss
the extracted scientific insights in Remote Sensing, and reflect on
the approaches used for explainable AI methods evaluation. Our
review provides a complete summary of the state-of-the-art in
the field. Further, we give a detailed outlook on the challenges
and promising research directions, representing a basis for novel
methodological development and a useful starting point for new
researchers in the field of explainable AI in Remote Sensing.
Index Terms—earth observation, explainable AI (xAI), explain-
ability, interpretable ML (IML), interpretability, remote sensing
I. INTRODUCTION
Machine Learning (ML) methods have shown outstanding
performance in numerous Earth Observation (EO) tasks [1, 2],
but mostly they are complex and lack the interpretability and
explanation of their decisions. In EO applications, understand-
ing the model’s functioning and visualizing the interpretations
for analysis is crucial [3], as it allows practitioners to gain
scientific insights, discover biases, trustworthiness and fairness
for policy decisions, and to debug and improve a model.
Currently, the European Parliament is adopting an Artificial
Intelligence (AI) Act to ensure that the methods developed
and used in Europe align with fundamental rights and values
such as safety, privacy, transparency, explicability, as well as
social and environmental wellbeing [4]. It is anticipated that
other governments worldwide will implement similar regula-
tions [5]. Many applications in EO could potentially violate
these values when data and AI are employed for analysis
and decision-making. Nevertheless, explainable AI (xAI) can
2017 2018 2019 2020 2021 2022 2023*
Year
5000
10000
15000
20000
25000
N
u
m
b
er
of
M
L
P
u
b
li
ca
ti
on
s
in
R
S
ML in RS
xAI in RS
50
100
150
200
250
300
350
400
N
u
m
b
er
of
x
A
I
P
u
b
li
ca
ti
on
s
in
R
S
Fig. 1. The number of publications of ML and xAI in EO, obtained by using
the search query described in Section III-B and Appendix A, differ by a factor
of ≈ 70. (*Calculated amount of publications given the first ten months of
the year and assuming a linear trend in 2023.)
contribute to aligning these practices with rights and laws.
Hence, xAI emerges as a promising research direction to tackle
the above-mentioned scientific and regulatory challenges with
observational data [6].
Despite these potential benefits, currently, there is a gap
between the usage of ML methods in EO and the works
that aim to reveal the workings of these models. This gap
is illustrated in Figure 1, where the blue curve shows that the
number of ML papers in Remote Sensing (RS) has drastically
increased in the past few years. Although the works dealing
with xAI in EO shown by the green curve have also increased
rapidly, there is still a gap by a factor of ≈ 70 compared to
the number of papers of ML in RS. This increasing number of
xAI in RS papers motivates us to summarize the existing work
in the field and provide an overview to RS practitioners about
the recent developments, which might lead to narrowing this
gap and making xAI approaches more common in the field
ar
X
iv
:2
40
2.
13
79
1v
1 
 [
cs
.L
G
] 
 2
1 
Fe
b 
20
24
for RS.
xAI methods are typically designed to work on natural
images. However, the RS imagery has different properties than
natural images [7]. First, images are captured from above.
This perspective comes with unique scales, resolutions, and
shadows. For instance, a RS image can cover whole land-
scapes, thousands of square kilometers, while natural images
can only cover a tiny fraction of it [8]. Second, RS captures
images in other electromagnetic spectra, apart from the usual
RGB channels. From hyperspectral over Synthetic Aperture
Radar (SAR) to Light Detection and Ranging (LiDAR), RS
does cover a wide range of reflectance data. Third, usual
Red-Green-Blue (RGB) cameras are primarily passive, while
RS can be active, which changes properties like the radar
shadows, foreshortening, layover, elevation displacement, and
speckle effects [9]. Besides the different image properties, the
tasks differ as well. Computer Vision (CV) primarily addresses
dynamic scenarios where objects can be spatially covered by
others. In contrast, the observed processes in RS happen on
different spectral and spatiotemporal scales, long-term and
short-term, and the systems modeled are very complex and
diverse. Often, the observed systems are not fully understood
and can not be observed completely or only indirectly. For
instance, weather events, hydrology, hazards, ecosystems, and
urban dynamics. Although there are numerous reviews for
xAI in the literature [10–12], they typically do not reflect
on the works specific for RS nor reveal how the existing
xAI approaches tackle the above challenges related to remote
sensing data. Therefore, a review of xAI tailored to the field of
RS is necessary to reveal the key trends, common objectives,
challenges, and latest developments.
While current reviews of xAI in EO are focused on social,
regulatory, and stakeholder perspectives [13, 14], specific
subtopics in RS [15], or do not provide a broad literature
database [15–17], this paper targets the applications and ap-
proaches from xAI in RS and follows a systematic approach
to gather a comprehensive literature database. To this end,
we conduct a systematic literature search for xAI in RS in
three commonly used literature databases in RS. From this
literature database, we identify the relevant papers and provide
an overview of typical usages, objectives, and new approaches
in the field. Additionally, we propose a categorization of
the employed xAI methods, which allows to get a detailed
overview and understanding of the xAI taxonomy and tech-
niques. We found that xAI methods for RS are most often used
for landcover monitoring, agriculture monitoring, and natural
hazard monitoring. They cover 40% of all 207 papers, with
34, 26, and 26 papers, respectively. Overall, SHapley Additive
exPlanations (SHAP) is the most frequent method, used in
over 38% of all publications. However, we observe distinct
variations of xAI methods in the EO tasks. For instance,
besides local approximation methods (29%), backpropagation
methods (23%), particularly Class Activation Mapping (CAM)
methods (18%), are very prominent in landcover monitoring.
Conversely, behind local approximation (22%), perturbation
methods (22%) are widely utilized in interpreting agricultural
models. It should be noted, however,temperature, river density, and other geological factors. They
predict the susceptibility based on an ensemble of machine
learning models and evaluate the factor importance with PFI,
PDP, and LIME. Their analysis reveals that land surface
temperature, fault density, earthquake peak acceleration, and
river density are among the critical indicators. In another
study, [175] explores earthquake probability prediction with
eXtreme Gradient Boosting (XGBoost) and explains the model
predictions with SHAP.
b) Distinct approaches across EO tasks: Besides the
above-described common practices of applying xAI in RS, we
also identified distinct modeling approaches applied to specific
EO tasks. For instance, intrinsic CNN architectures based on
the xAI category explanation association are introduced for
socioeconomic indicators estimation. The semantic bottleneck
models are particularly used to decompose the model pre-
diction into a linear combination of human-understandable
concepts, as described in Section IV-A. Consequently, these
models explain the complex relationship between the remote
sensing scene and the socioeconomic indicators through a
set of interpretable concepts. In [176], a semantic bottleneck
model aims to predict environment scenicness which is shown
to be an important proxy for socioeconomic indicators, such
as quality of life or health. The proposed model is jointly
optimized to first predict intermediate concepts corresponding
to landcover classes, followed by a linear layer that esti-
mates the scenicness based on the predicted landcover classes.
This architecture allows the authors to directly relate the
scenicness prediction to landcover classes which are human-
understandable concepts. Similarly, for the problem of estimat-
ing the living quality from aerial images, a semantic bottleneck
model is proposed in [177] with the interpretable concepts
being different liveability dimensions like population statistics,
building quality, physical environment, safety, and access to
amenities.
Although interpretable-by-design models provide lower per-
formances than DL architectures, they are leveraged in diverse
contexts. Linear regression is used to support more complex
methods and evaluates the linear trend for a feature [142,
178]. The application of GLM is observed for small sample
sizes [179] and Generalized Additive Models (GAMs) are
used for larger datasets [153, 161, 162, 180]. Zhou et al.
reduce the feature space with Principal Component Analysis
(PCA) in [180], while advanced versions based on EBMs
are introduced in [153, 161]. EBMs use pairwise feature
interactions within the GAM and consider gradient boosting to
train each feature function consecutively. Fang et al. follows
this idea and uses FNNs as function approximations [162].
A decision tree incorporating linear regression models at the
terminal leaves is employed in [181] to gain scientific insights
into the partitioning of precipitation into evapotranspiration
and runoff. Karmakar et al. apply LDA to SAR images.
Their bag-of-words approach uses superpixels as words to do
landcover mapping [182, 183]. Last but not least, Martı́nez-
Ferrer et al. analyze the weights of the GP to find anomalous
samples [152].
Fuzzy logic-based models are another approach that is
mainly used to evaluate the trustworthiness of ML models.
An Ordered Weighted Averaging (OWA) fusion function is
presented in [184] for burned area mapping, which allows
controlling if the fusion results are affected by more false
positives than false negatives, and vice versa. At the same
time, it foresees if there are only a few highly or many
low relevant factors when providing a particular output. The
outputs of Fuzzy Logic Systems (FLSs) for tree monitoring
[185] can also be easily validated. Last but not least, measure-,
integral- and data-centric indices based on the Choquet integral
(an aggregation function defined with respect to the fuzzy
measure) are introduced in [186–189] intending to develop
more understandable ensembles for landcover mapping.
xAI methods are not only used to gain insights about
the model behavior. Some works explore their abilities to
classify or enhance training [190, 191]. Xiong et al. [191]
leverage Grad-CAM to create masks that occlude features the
network has emphasized, encouraging the network to exploit
other features [191]. The output of three NNs is merged
through attention in another study [164]. The outputs and
the attributions of the DeepLIFT backpropagation method are
the key values of the attention layer. In contrast, Li et al.
[190] directly classify objects on top of the CAM attribution
map. Hence, their weakly supervised method does not need
bounding boxes, making the image labels sufficient. They
propose three distinct methods to segment the attribution maps
into bounding boxes, each performing differently depending
on the number, heterogeneity, and complexity of the objects.
Furthermore, the same object detection strategy is used by
[192] to evaluate the xAI methods for their localization abili-
ties.
2) Adapted xAI approaches: The xAI methods commonly
used in RS are originally designed to work on classical
computer vision datasets, such as ImageNet [193]. These are
mostly composed of natural images which, as described in
Section I, significantly differ from remote sensing acquisitions.
This raises the question of whether the utilized xAI methods
fit remote sensing data well. In this respect, we identified
several works that propose new approaches considering re-
mote sensing data properties to produce better explainability
2021 2022 2023
Year
0
1
2
3
4
5
6
N
u
m
b
er
of
P
u
b
li
ca
ti
on
s
xAI Method
Attention
CAM
FNN
GLM
Grad-CAM
PFI
ProtoPNet
Fig. 6. Number of papers adapting existing xAI methods, grouped per year.
insights. This is typically performed by adapting the existing
xAI methods or by proposing new DL architectures tailored
to RS applications.
Recently, as illustrated in Figure 6, there has been an
increase in the tailoring of xAI methods for RS properties.
50% of the publications have been published in the last year,
and no novel method was identified before 2021. A strong
focus is given to modifying the CAM and Grad-CAM methods
for different RS applications and domains. For example, Feng
et al. exploit that the target objects in SAR images occupy
only a small portion of the image to propose a new CAM
method which, instead of upsampling the feature map of
the convolutional layer to the input image, downsamples the
input image to the feature map of the last convolutional layer
[194]. This operation results in saliency maps that localize
precisely the targets in SAR images compared to the Grad-
CAM method. Additionally, a CAM method able to produce
much more fine-grained saliency maps than the prior CAM
methods is introduced by Guo et al. in [192]. Similar to Layer-
CAM [89], they use shallow layers to get more fine-grained
results but also rely on scores, following the idea of Score-
CAM [195], which are not as noisy as gradients. Another
attempt to improve current CAM methods was proposed by
Marvasti-Zadeh et al., who utilize the attribution maps from
all network layers and decrease their number through only
retaining maps which minimize the information loss according
to the Kullback-Leiber (KL)-divergence [196]. Additionally,
local attribution maps are generated by masking the image
and weighting the maps by the corresponding bounding box
and their prediction confidence. These local maps need to be
smoothed with a Gaussian kernel to avoid sharp boundaries
in the resulting CAM. This novel approach, called Crown-
CAM, is evaluated on a localization metric and outperforms
(augmented) Score-CAM and Eigen-CAM on a tree crown
localization task. CAM variants for hyperspectral images are
developed in [197]. The saliency map is now a 3D volume
instead of a 2D image, and each voxel attributes the different
channels in depth, which provides pixel-wise and spectral-
cumulative attributions.Other Grad-CAM adaptions proposed
are a median pooling [198] and a pixel-wise [199] variant.
Regarding deep learning approaches, a model prototype
approach for RS is proposed in [200], where the ProtoPNet
architecture [66] is adapted to also consider the location of
the features. The network is iteratively trained in 3 stages.
Firstly, the encoder and prototype layers are trained to produce
the prototypes. Secondly, the prototypes are replaced by the
nearest prototype of the corresponding class. Lastly, only the
output layer weights are trained to produce the final prediction.
In contrast to ProtoPNet, the prototype similarity is scaled with
a location value learned by the network. This acknowledges
the location of the prototypes in the image and makes them
location-aware. Another approach is presented in [201], where
a reconstruction objective is added to the loss function to
enable the Grad-CAM++ method to more accurately localize
multiple target objects within an aerial image scene.
Finally, we also identified one approach addressing the
model agnostic methods. Specifically, Brenning adapts the
model-agnostic PFI method to incorporate spatial distances
[202]. Analogous to the PFI method, the features are permuted,
and the mean decrease in predictive accuracy is assessed.
Notably, features are permuted across various predefined dis-
tances, revealing the spatial importance or sensitivity of the
model.
B. RQ2: Interpretation and evaluation of xAI explanations
1) Understanding and validating explanations: The proper-
ties of remotely sensed data mentioned in Section I can hinder
the intuitive understanding of the semantics of the objects or
the individual pixels in the remote sensing scene. Therefore,
an obstacle that arises when applying xAI in RS is explanation
interpretation, as the relevant features often do not have a
straightforward interpretation. We identify that this challenge
is frequently tackled by transforming the raw features into
interpretable indices used for model training [144, 203] or
by associating domain knowledge with the explanation at the
post-hoc stage [121, 146].
a) The creation of interpretable input spaces: The data
preprocessing in ML usually generates meaningful features
for models. Ensuring human-understandable features is essen-
tial for comprehending input-output relationships or gaining
knowledge of the model. For instance, understanding xAI
attributions of SAR images can pose challenges. However,
Ge et al. address this by transforming the feature space
into human-interpretable factors using a U-Net architecture
[199]. They derive three interpretable variables from the VH
polarization backscatter coefficients and the VV polarization
interferometric coherence of the Sentinel-1 images, providing
insights into the temporal variance and the temporal minimum.
While the temporal variance changes between different crops
and landforms over time, the temporal minimum is specific
for flooded rice fields due to their proximity to water. This
facilitates the understanding of the attribution of the applied
Grad-CAM method.
A well-known preprocessing technique is the creation of spec-
tral indices, like the NDVI vegetation index. This approach
leverages the known characteristics of the various spectral
channels and facilitates a more accessible representation and
interpretation of the features. Our study observes a significant
adoption of these indices.
Encoding the feature space through dimensionality reduction
approaches is commonly believed to decrease the interpretabil-
ity. However, Brenning demonstrates that a helpful interpre-
tation from a complex, correlated feature space can still be
maintained [132]. The author employs structured PCA for
feature space reduction and RF for classification. The xAI
methods ALE, PDP, SHAP, and PFI are evaluated both in
the original and the transformed feature space. The results
indicate that the behavior of the features can be identified
in the principal components and allow the extraction of the
relationship between the main feature groups.
b) Explanation interpretation with domain knowledge:
When raw inputs are used for model training, domain knowl-
edge is often utilized to reveal the semantics of the relevant
features. It is usually derived from already established indica-
tors for the task under study, by utilizing external geolocated
data sources, or based on expert knowledge.
One task where existing indicators are commonly used is
agriculture monitoring, where insights about crop phenology
are utilized to uncover the semantics of the relevant features.
Concretely, Obadic et al. associate the NDVI index with
the time points that are highly attended by the transformer
encoder model to reveal the key phenological events for
crop discrimination [146]. Similarly, agronomical knowledge
is considered in [118, 121] to reveal the phenological stages
by which the crops are classified. Further, the SHAP values
of the input features used in [138] for crop yield estimation
are aggregated according to the observation acquisition date
into the stages of crop development and crop harvest. This
aggregation reveals the criticality of the observations close to
the harvest stage for the yield prediction. Similar efforts for
explanation interpretation can also be found for other tasks,
such as for predicting landslide susceptibility. For instance,
Zhang et al. [159] relate the spatial heterogeneity of the
SHAP values to the natural characteristics and human activities
for the following factors: lithology, slope, elevation, rainfall,
and NDVI. They argue the differences in factor contributions
can be attributed to local regional characteristics such as
topography, geology, or vegetation.
Regarding geolocated data sources, we identified that land-
cover class labels are associated with the input observations to
interpret the explanation. Abitbol and Karsai identify the re-
lationship between urban topology and the average household
income [204]. For this purpose, the Grad-CAM attributions
of the image pixels are related to their landcover classes.
Commercial/residential units are characterized by low income,
while natural areas link to higher income.
In certain works, expert knowledge is used to validate or in-
terpret the findings. For example, the most important drivers of
landslides noted by the field investigation reports are compared
to the features sorted by explanation magnitudes in [164]. LDA
is leveraged in [182] for unsupervised sea ice classification,
and closely related classes are identified by KL divergence.
The LDA derived topics and probabilities, together with the
interclass distances and segmented images, enable experts to
assess the physical relationship between these classes. For
example, water bodies, melted snow, and water currents have a
similar topic distribution and a substantial physical similarity:
liquid water. In [205], a comprehensive approach is employed
for integrating expert knowledge to interpret and guide the
process of finding and validating dwelling styles and their
evolution within ethnic communities. Initially, the building
footprint is extracted from satellite imagery, and with the
help of experts, the styles are classified into different types.
Subsequently, xAI is applied, leveraging XGBoost and SHAP
to determine the importance, with the experts inferring the
semantic meaning. In the last step, the experts guide the
clustering of the proximity of the styles and the analysis of the
geographical distribution. The results reveal the emergence of
mixed ethical styles inheriting from the three traditional styles,
which can be correlated to migration records.
A
ne
cd
ot
al
Q
ua
nt
ita
tiv
e
Toy
Tas
k
U
se
r St
ud
y
0
20
40
60
80
100
120
140
160
180
N
u
m
b
er
of
P
ap
er
s
Fig. 7. The number of times different evaluation types are considered in the
literature, based on their corresponding number of papers.
2) Evaluation of xAI methods for RS: As indicated in Sec-
tion IV-C,xAI evaluation poses an open challenge. Figure 7
illustrates that the majority of the literature relies on anecdotal
evidence, often involving the visualization of arbitrarily or
cherry-picked examples. In addition to this informal evalu-
ation, some authors [200, 206, 207] evaluated their methods
on straightforward toy tasks where humans can easily identify
the sought relationships.
Paudel et al. exclusively conduct a user survey to assess
(1) the importance of the DL features by experts and (2)
judge the importance by the xAI method [145]. Five crop
modeling experts assigned important scores to the features,
subsequently compared to the post-hoc SHAP importance
scores. Afterward, the experts categorized the model expla-
nations into four categories (strong) agree, (strong) disagree,
and should provide a justification. Overall, it is demonstrated
that experts can understand the model explanations, and the
explanations enable the experts to get insights into the models.
However, the task remains challenging and has the potential
for misconceptions about the model behavior.
Regarding quantitative evaluation, we identified 16 studies
testing the xAI methods for RS problems with functional
metrics. These metrics mainly asses the explanation quality
properties described in Section IV-C and the localization
ability of the xAI methods. The backpropagation methods are
most frequently evaluated with a particular focus on the CAM
approaches.
The explanation quality properties are evaluated in [127,
194, 197, 198, 208–210]. Kakogeorgiou and Karantzalos [208]
evaluate Saliency, Input*Gradient (I*G), IG, GuidedBack-
prop, (Guided) Grad-CAM, Occlusion, DeepLift, LIME, and
Smooth Gradient (SmoothGrad) on landcover mapping tasks.
Utilized metrics are max-sensitivity, file size, computation
time, and Most Relevant First (MoRef). Max-sensitivity mea-
sures the reliability (maximum change in explanation) when
the input is slightly perturbed, the file size is used as a
proxy for explanation sparsity, and MoRef measures how fast
the classification accuracy declines when removing the most
relevant explanations. The results indicate no obvious choice
for this task. While Occlusion, Grad-CAM, and LIME were
the most reliable according to the max-sensitivity metric, they
lack high-resolution explanations. Grad-CAM also emerged as
the most computationally efficient choice. Evaluating faith-
fulness with MoRef is also conducted in [197] where the
performance of different CAM methods is compared. Also, a
class sensitivity metric is employed to measure the correlation
between the attributions of different target classes. Addi-
tionally, [198] leverages median-pooling Grad-CAM, Grad-
CAM, Grad-CAM++, and SmoothGrad-CAM++ on the metric
drop/increase in confidence when occluding important regions
from the image. Similar metrics are also evaluated in [194],
where two different occlusion tests are employed to benchmark
various CAM approaches with proposed Self-Matching CAM.
While the methods perform similarly when the most influential
pixels are perturbed, only Self-Matching CAM observes a
minor drop in the prediction difference when only the most
salient pixels are not occluded, demonstrating that its explana-
tion focuses on the target object in the image Further, [209] de-
velop attention NN and compare it to attention networks from
the literature, CAM, Grad-CAM and Layer-CAM. The metrics
max-sensitivity and average % drop/increase in confidence are
utilized. In a different study, Low faithfulness of the Grad-
CAM explanations based on similar metrics is demonstrated
in [210] for the task of real estate appraisal. This study also
conducts model and data randomization tests to find that Grad-
CAM is sensitive to changes in network weights and label
randomization. A similar evaluation for sensitivity to changes
in the network weights is conducted in [127], where the goal
is to assess how damaging the network weights affect the
inference of the CNN models. To evaluate this, the authors
compare the CAM maps of the original model against the
CAM maps of the model with disabled weights. Their results
indicate that the ResNet model exhibits the highest resilience
to network changes as the CAM map model with disabled
weights matches most closely the CAM map of the original
ResNet model.
The localization ability is evaluated in [192, 196, 211] For
instance, Su et al. [211] use various CAM approaches to
evaluate the ResNet model on the tasks of dealing with large
variance problems and its localization ability. Particularly, seg-
mentation maps were derived from CAM, Grad-CAM, Grad-
CAM++, SmoothGrad-CAM++, and Score-CAM approaches
by thresholding their attributions. Grad-CAM showed superior
performance in localization accuracy and the ability to extract
complex features in images with large variance. In [192], the
localization capabilities of the new CSG-CAM method are
compared to CAM methods, Grad-CAM, Grad-CAM++ and
Score-CAM. To this end, a weakly supervised segmentation
task driven by the xAI attributions maps is considered. If
the attributions succeed a threshold for a certain class, the
class is assumed to be contained in the scene. Similarly, [196]
evaluates the localization ability of different CAM approaches
for tree crown detection.
When it comes to the evaluation of other xAI methods,
despite their high usage, the SHAP explanations are quantita-
tively evaluated only in [212] where a FNN trained using the
features with the highest SHAP values slightly outperforms
the model trained on the complete set of features. Further,
the attention weights are evaluated in [146] by inspecting
drops in the accuracy for crop mapping when the transformer
model is trained on a subset of dates with the highest attention
values. The results verify that attention weights select the key
dates for crop discrimination as training the model with only
the top 15 dates is sufficient to approximate the accuracy of
the model trained on the complete dataset. Finally, Dantas
et al. [133] employ distinct metrics for their counterfactual
generation. These metrics aim to ensure a certain quality of
the counterfactuals. They used proximity (evaluating closeness
to the original input instance, measured by l2 distance), com-
pactness (ensuring a small number of perturbations across time
steps), stability (measuring the consistency for comparable
input samples) and plausibility (measuring adherence to the
same data distribution).
C. RQ3: xAI Objectives and Findings in RS
In this section, we analyze the motivation for applying xAI
in RS according to the common objectives specified in Section
IV-D. The frequency of the objectives for using xAI displayed
in Figure 8a reflects a similar trend as in [13]. Namely, the
objective to control is the most commonly found in 127 works.
It is followed by a large gap in the objective to discover
insights, which is met in 51 works. Next, follow the objectives
to improve and to justify which are found in 27 and 5 works,
respectively. Moreover, Figure 8b shows that the objective
to discover has a unique distribution across the EO tasks:
it frequently occurs in studies monitoring the atmosphere,
vegetation, and the human environment interaction. In contrast
to other objectives, which are often identified in the studies
related to the three main EO tasks described in Section V-A.
The observed distribution of the EO tasks with the objective
to discover indicates that these studies focus on knowledge
extraction for more recent EO tasks, which are used to deal
with monitoring and understanding the driving factors behind
extreme events related to climate changes or natural hazards.
These studies are of utmost importance as their insights can
To
Con
tr
ol
To
D
isc
ov
er
To
Im
pr
ov
e
To
Ju
st
ify
0
15
30
45
60
75
90
105
120
N
u
m
b
er
of
P
ap
er
s
(a) Number of times xAI objectives are considered in the publications.
A
gr
icu
ltu
ra
l M
on
ito
rin
g
Atm
os
ph
er
e
M
on
ito
rin
g
Bui
ld
ing
M
ap
pi
ng
Eco
sy
st
em
In
te
ra
ct
io
ns
H
um
an
Env
iro
nm
en
t In
te
ra
ct
io
n
H
yd
ro
lo
gy
M
on
ito
rin
g
Lan
dc
ov
er
M
ap
pi
ng
N
at
ur
al
H
az
ar
d
M
on
ito
rin
g
So
il
M
on
ito
rin
g
Su
rfa
ce
Tem
pe
ra
tu
re
Pre
di
ct
io
n
Tar
ge
t M
ap
pi
ng
Veg
et
at
io
n
M
on
ito
rin
g
W
ea
th
er
Clim
at
e
Pre
di
ct
io
n
O
th
er
0
5
10
15
20
25
N
u
m
b
er
of
P
u
b
li
ca
ti
on
s
EO Task
To Control
To Discover
To Improve
To Justify
(b) Number of times xAI objectives are considered in the publications per EO
task.
Fig. 8. The frequency of the objectives for using xAI in RS identified in the
analyzed works, categorized according to the scheme introduced in Section
IV-D. Papers can appear in multiple categories due to ambiguous meanings
or multiple motivations and objectives.
enable the application of early preventive measures for disaster
management. For instance, the problem of uncovering the key
drivers for wildfire susceptibility is tackled in [213]. Applying
SHAP and PDP on the trained ML model reveals that soil
moisture, humidity, temperature variables, wind speed, and
NDVI are among the most important factors associated with
wildfires. Regarding monitoring natural hazards, Biass et al.
uses SHAP to identify that volcanic deposits, terrain proper-
ties, and vegetation types are strongly linked to vegetation
vulnerability after volcanic eruptions. Concretely, increased
vegetation vulnerability is associated with higher lapilli ac-
cumulations, being crops and forests the most and the least
susceptible vegetation types, respectively [214].
Extracting novel scientific insights can also be valuable to
gain new knowledge about ambiguously defined concepts. One
such concept is wilderness. Although monitoring wilderness
areas can be an important indicator of sustainable devel-
opment, there is no uniform definition in the literature for
wilderness, which is usually described with several philo-
sophical reflections. Stomberg et al. aim to infer wilderness
characteristics from Sentinel-2 images. By analyzing occlusion
sensitivity maps, they reveal that wilderness is characterized
by large areas containing natural undisrupted soils, in contrast
to anthropogenic areas that have specific edge shapes and lie
close to impervious structures [215]. Besides discovering new
knowledge, scientific insights can also be used to identify the
reasons for inaccuracies in Earth system models. Silva et al.
provide insights into a lightning model’s structural deficits
by predicting its error with a gradient boosting algorithm
and interpreting it with SHAP. The error is computed as
the difference between the output of the model and satellite
observational data. Their analysis reveals potential deficits in
high convective precipitation and landcover heterogeneities
[216].
On the other hand, the studies with the objective to control
are mainly comparing the inference mechanisms of various
established ML models used for EO. Consequently, they are
mostly conducted in landcover mapping. One example of such
study is presented in [217], where SHAP feature importance
rankings are compared across several ML models trained on
Sentinel-2 images, topography, phenology, and texture fea-
tures. This study illustrates that the used models consistently
rank the relevance of the red edge bands for predicting
the agriculture class and the chlorophyll-sensitive bands for
identifying deciduous trees. Conversely, inconsistencies among
the models are found when analyzing the importance of the
texture features.
Agricultural monitoring is the second most common task
where this objective is considered. These studies typically
focus on assessing the reliability of the proposed models by
quantifying the relevance of the multitemporal information.
For instance, Rußwurm and Körner use the gradient method
to measure the temporal importance assigned by various DL
models for crop classification. Their analysis reveals that the
transformer and the LSTM approaches ignore the observations
obscured by clouds and focus on a relatively small number of
observations when compared to the CNN models [147]. Fol-
lowing a similar approach, Xu et al. evaluate the generalization
capabilities of these models when inference is performed in
different years than the ones used for model training [121].
Their experiments indicate that the LSTM model adapts better
to changes in crop phenology induced by late plantation
compared to the transformer model.
The objective to control also supports different types of
studies that anticipate the model decisions in scenarios that
can occur in practical applications. For instance, Gawlikowski
et al. investigate the impact of exposing a DNN, initially
trained on cloud-free images, to cloudy images [218]. They
apply Grad-CAM to identify crucial regions for the classifier
in both image types. Their findings reveal different factors
why the network misclassifies the OOD examples, including
the coverage of structures through cloud cover and shadows, as
well as the homogeneity or heterogeneity induced by different
cloud types. Forecasting weather extremes constitutes another
application where verifying that ML models generalize well
enough to predict unlikely and rare events is essential. Rampal
et al. provide a showcase for extreme rainfall downscaling
[219]. By interpreting a CNN with Grad-CAM they show that
the model is able to learn the most relevant meteorological
features.
The studies with the objective to justify usually discuss why
the insights about the model workings are relevant for the pol-
icymakers. For instance, Campos-Taberner et al. argue that the
interpretability of the DL models in agricultural applications is
critical to ensure fair payouts to the farmers according to the
EU Common Agricultural Policy (CAP) [143]. By applying
a perturbation approach, the authors find that the summer
acquisitions and the red and near-infrared Sentinel-2 spectral
bands carry essential information for the decisions of the
used RNN model. Further, the human footprint index, which
represents the human pressure on the landscape and can be a
valuable metric for environmental assessments, is predicted
from Landsat imagery in [220]. LRP is then leveraged to
visually highlight the relevant features in the images. Besides
policy-makers, the approach presented in [134] can also assist
individual users during the production phase of a ML model,
as it justifies the model’s validity for inference by providing
example-based explanations. If the explanation example does
not fit the input instance, the model is considered unreliable
for the RS image classification task.
Lastly, a large part of the works with the objective to
improve focus on adapting the existing xAI methods to RS
tasks and are already described in Section V-A2. Concerning
the other techniques for model improvement based on xAI
insights, we identified that data augmentation and model
augmentation are commonly performed.
Data augmentation is applied by Beker et al., who simulate
synthetic data for training a CNN model for the detection of
volcanic deformations [173]. By conducting an explainability
analysis with Grad-CAM on real data, the authors identify
that the model wrongly predicts volcanic deformations on,
e.g., salt lakes and slope-induced signals, which are patterns
that were not considered in the simulated data. These insights
are used to improve the prediction performance by fine-tuning
the last layer of the CNN model on a hybrid synthetic-real
dataset that accounts for these patterns. An iterative model
improvement for satellite onboard processing through a weakly
supervised human-in-the-loop is proposed by Kim et al. in
[209]. In their scenario, the satellite has a convolutional at-
tention network for object classification onboard. Initially, the
uncertain explanations and images are identified by assessing
the similarity of the attention maps across the attention blocks.
An inconsistency metricis introduced to measure the similarity
of the attribution maps emphasizing commonly highlighted
regions. If a sample performs badly on that metric, experts
on the ground refine it by labeling the incorrect pixels in the
attention map. In the last step, a local model is retrained with
the refined data, and the onboard model gets updated. A last
example of data augmentation is the work by Xiong et al.
[191] where, for every image, a mask based on the Grad-CAM
output is used during training to occlude those regions for
which the model provides the highest activation, attempting to
encourage the network to explore other features in the image.
On the other hand, model augmentation is performed in
[222], where xAI is initially used to compare the work-
ings of the Vision Transformer (ViT) and CNN models for
monocular height estimation. Upsampling the feature maps
of the penultimate layer to the input image unveils that even
though the ViT learns more disentangled representations than
the CNN, a neuron in the penultimate layer of the ViT still
encodes semantics of multiple objects. This is resolved by
proposing a new transformer model that learns distinct latent
representations per semantic class.
TABLE I: A complete list of all relevant papers in this review, aggregated by EO Task and xAI
Category. (†xy indicates a new method which was derived from xy; a full list of all used acronyms
can be found in Appendix C.)
EO Task xAI Category Paper, xAI Methods Model Evaluation Objective
To
y
Ta
sk
A
ne
cd
ot
al
Q
ua
nt
ita
tiv
e
C
on
tr
ol
Im
pr
ov
e
D
is
co
ve
r
Ju
st
if
y
Agricultural Monitoring Backpropagation [199] PWGrad-CAM†Grad-CAM CNN ✓ ✓
[149] RAM CNN ✓
Backpropagation, Embedding Space [147] Gradient, Attention, Activation
Assessment
Transformer,
CNN,
ConvLSTM,
LSTM
✓ ✓
Backpropagation, Embedding Space,
Feature Selection
[121] MDI, Gradient, Attention, Activa-
tion Assessment
RF,
Transformer,
aLSTM
✓ ✓
Backpropagation, Joint Training [135] Explanation Association, LRP GAN ✓ ✓ ✓
Backpropagation, Local Approximation [144] SHAP, IG LSTM ✓ ✓
[145] SHAP, IG LSTM ✓ ✓
Embedding Space [148] Attention Transformer ✓ ✓
Feature Selection [118] MDI, GFFS RF ✓ ✓
Interpretable by Design [153] EBM EBM ✓ ✓
Interpretable by Design, Perturbation [152] GP, Occlusion GP ✓ ✓
[142] GLM, PFI, ALE, PDP GLM, RF ✓
Joint Training [150] Prototype CNN ✓ ✓
Local Approximation [137] SHAP GB ✓ ✓
[136] SHAP GB ✓ ✓
[138] SHAP GB ✓ ✓
[223] SHAP RF ✓
[139] SHAP GB ✓ ✓
[140] SHAP GLM, SVM,
FNN, RF,
kNN, GB
✓ ✓
Model Translation, Feature Selection [117] MDI, Rule Extraction RF, DT ✓ ✓
Perturbation [202] Spatial Variable Importance
Profiles†PFI
LDA, kNN
LDA, RF
✓ ✓
[141] PFI RF ✓ ✓
[151] GP ✓ ✓
Perturbation, Embedding Space [143] Occlusion, Activation Assessment LSTM ✓ ✓
[146] Occlusion, Attention Transformer ✓ ✓
Atmosphere Monitoring Backpropagation [224] IG NN Ensemble ✓
Backpropagation, Local Approximation,
Perturbation
[225] SHAP, XRAI, Occlusion CNN ✓
Embedding Space [226] Activation Assessment FNN ✓
[227] Attention FNN ✓
Embedding Space, Joint Training, Fea-
ture Selection
[122] Model Association, MDI, Activa-
tion Assessment
RF, NN+ML
Ensemble,
FNN
✓ ✓
Feature Selection [123] MDI Tree Ensemble ✓ ✓
[124] MDI Tree Ensemble ✓ ✓
[125] MDI Tree Ensemble ✓ ✓
Interpretable by Design [228] GLM†GLM GLM ✓ ✓
Local Approximation [229] SHAP FNN ✓
[230] SHAP GLM, GB ✓
[231] SHAP GB ✓ ✓
[232] SHAP GB ✓
Local Approximation, Feature Selection [115] MDI, SHAP RF ✓ ✓
Perturbation [233] PFI aCNN ✓ ✓
[234] PFI, ALE RF ✓ ✓ ✓
Perturbation, Feature Selection [120] MDI, PDP RF ✓
Perturbation, Local Approximation [235] SHAP, LIME, Ceteris Paribus Pro-
files, PFI
RF ✓
[236] SHAP, PDP GB ✓ ✓
Building Mapping Backpropagation [237] Grad-CAM CNN ✓ ✓
[238] A*G, DeepLift CNN ✓ ✓
[239] Grad-CAM CoAtNet ✓ ✓
[211] CAM, Score-CAM, Grad-
CAM++, SmoothGrad-CAM++, Grad-
CAM
CNN ✓ ✓
Backpropagation, Embedding Space [240] Attention, Grad-CAM CNN ✓ ✓
Backpropagation, Embedding Space,
Perturbation
[222] IG, Occlusion, Activation Assess-
ment
Transformer,
CNN
✓ ✓ ✓
Local Approximation [205] SHAP GB ✓
Local Approximation, Feature Selection [114] MDI, SHAP RF, FNN ✓ ✓
Ecosystem Interactions Interpretable by Design [180] GAM GAM ✓
Interpretable by Design, Local Approx-
imation
[178] GLM, SHAP GLM, GB ✓
Local Approximation [241] SHAP GB ✓ ✓
[242] SHAP GB ✓
Perturbation, Local Approximation [243] SHAP, PFI GB ✓ ✓
TABLE I: A complete list of all relevant papers in this review, aggregated by EO Task and xAI
Category. (†xy indicates a new method which was derived from xy; a full list of all used acronyms
can be found in Appendix C.)
EO Task xAI Category Paper, xAI Methods Model Evaluation Objective
To
y
Ta
sk
A
ne
cd
ot
al
Q
ua
nt
ita
tiv
e
C
on
tr
ol
Im
pr
ov
e
D
is
co
ve
r
Ju
st
if
y
Human Environment Interac-
tion
Backpropagation [204] GuidedGrad-CAM CNN ✓
[220] LRP CNN ✓ ✓ ✓
Backpropagation, Embedding Space,
Perturbation
[210] Occlusion, Grad-CAM, Activation
Assessment
GLM, NN En-
semble
✓ ✓
Interpretable by Design [244] GLM GLM ✓ ✓
Joint Training [177] Explanation Association CNN ✓ ✓
[176] Explanation Association CNN ✓
Local Approximation [245] SHAP RF ✓
[246] SHAP GB ✓
[247] SHAP GB ✓ ✓
[248] SHAP GB ✓
[249] SHAP, LIME RF ✓
Perturbation [250] ALE RF ✓
[251] PFI CNN ✓ ✓
Perturbation, Embedding Space [215] Occlusion, Activation Assessment CNN ✓
Perturbation, Local Approximation [252] SHAP, LIME, Occlusion FNN ✓
Hydrology Monitoring Backpropagation [206] Gradient GP ✓ ✓ ✓
[253] Gradient GP ✓ ✓
[254] Grad-CAM CNN ✓ ✓ ✓ ✓
[255] Grad-CAM CNN ✓ ✓
[256] Gradient CNN ✓
[257] Grad-CAM CNN ✓ ✓
Backpropagation, Local Approximation [258] SHAP, IG CNN ✓ ✓
Backpropagation, Perturbation [259] PDP, IG, DeepLift, Expected Gra-
dients
LSTM ✓ ✓
Embedding Space [260] Attention CNN ✓ ✓
[207] smlp†Attention FNN ✓ ✓ ✓
Interpretable by Design, Local Approx-
imation, Perturbation
[181] Cubist, ALE, LIME, PFI Cubist, GB ✓
Local Approximation [261] SHAP FNN ✓ ✓
[262] SHAP RF, GB, DT,
GB
✓ ✓
Landcover Mapping Backpropagation [197] 3DGrad-CAM†Grad-CAM CNN ✓ ✓ ✓ ✓
[218] Grad-CAM CNN ✓ ✓
[192] CSG-CAM†Grad-CAM, Score-
CAM, Grad-CAM++, Grad-CAM
CNN ✓ ✓ ✓
[201] ERC-CAM†CAM CNN ✓ ✓
[263] XRAI CNN ✓ ✓
[264] Grad-CAM CNN ✓ ✓ ✓
[126] Grad-CAM CNN ✓ ✓
[198] MPGrad-CAM†Grad-CAM, Grad-
CAM++, SmoothGrad-CAM++, Grad-
CAM
CNN ✓ ✓ ✓
[127] CAM CNN ✓ ✓ ✓
Backpropagation, Embedding Space [209] CAM, Attention, Grad-CAM,
Layer-CAM, Attention, Attention
CNN ✓ ✓
Backpropagation, Local Approximation,
Perturbation
[208] IG, DeepLift, Gradient, Guid-
edBackprop, LIME, GuidedGrad-CAM,
Grad-CAM, I*G, Occlusion
CNN ✓ ✓ ✓
Counterfactuals [133] GAN CNN ✓ ✓
Embedding Space [265] Activation Assessment CNN, Capsu-
leNet
✓ ✓
[266] Activation Assessment CNN ✓ ✓
[267] Attention CNN ✓ ✓
[268] Attention, Activation Assessment CNN ✓ ✓
Embedding Space, Local Approxima-
tion
[189] SHAP, Activation Assessment NN Ensemble ✓ ✓
[129] LIME, Activation Assessment CNN ✓ ✓
[187] SHAP, Activation Assessment NN Ensemble ✓ ✓ ✓
[188] SHAP, Activation Assessment NN Ensemble ✓ ✓
Example-based [134] WIK CNN ✓ ✓ ✓ ✓
Interpretable by Design [269] Rules NN+Tree En-
semble
✓ ✓
[183] LDA LDA ✓ ✓
Interpretable by Design, Embedding
Space
[270] LDA, Activation Assessment NN Ensemble ✓ ✓
Joint Training [271] Prototype CNN ✓ ✓
Local Approximation [131] SHAP RF ✓ ✓
[130] SHAP, LIME CNN ✓ ✓
[217] SHAP SVM, RF, GB,
CNN
✓ ✓
[128] SHAP CNN ✓ ✓
TABLE I: A complete list of all relevant papers in this review, aggregated by EO Task and xAI
Category. (†xy indicates a new method which was derived from xy; a full list of all used acronyms
can be found in Appendix C.)
EO Task xAI Category Paper, xAI Methods Model Evaluation Objective
To
y
Ta
sk
A
ne
cd
ot
al
Q
ua
nt
ita
tiv
e
C
on
tr
ol
Im
pr
ov
e
D
is
co
ve
r
Ju
st
if
y
[272] LIME CNN ✓ ✓
Local Approximation, Feature Selection [111] MDI, SHAP RF, CNN ✓ ✓
[112] MDI, LIME RF ✓ ✓Perturbation, Local Approximation [132] SHAP, ALE, PDP, PFI RF ✓ ✓
Perturbation, Local Approximation,
Feature Selection
[110] MDI, SHAP, PFI RF ✓ ✓
Natural Hazard Monitoring Backpropagation, Embedding Space [164] DeepLift, Attention NN Ensemble ✓ ✓
Backpropagation, Embedding Space,
Perturbation
[221] Occlusion, Grad-CAM, Activation
Assessment
CNN ✓ ✓ ✓
Backpropagation, Local Approximation,
Perturbation
[213] SHAP, IG, PDP LSTM ✓
Embedding Space, Local Approxima-
tion
[162] SHAP, Activation Assessment GAMI-Net ✓ ✓
Interpretable by Design [161] EBM EBM ✓ ✓
[184] Fuzzy Rules Rules ✓ ✓ ✓
[163] SNN†FNN GAM ✓ ✓
Joint Training [168] Prototype CNN ✓ ✓
Local Approximation [169] SHAP FNN ✓
[155] SHAP SVM, RF ✓ ✓
[165] SHAP RF, GB ✓ ✓
[157] SHAP FNN ✓ ✓
[273] SHAP GB ✓ ✓
[158] SHAP GB ✓ ✓
[175] SHAP NN+Tree En-
semble
✓ ✓
[166] SHAP CNN ✓ ✓
[172] SHAP, LIME CNN ✓ ✓
[154] SHAP RF ✓ ✓
[159] SHAP GB ✓
Perturbation [171] PFI, PDP RF ✓
Perturbation, Feature Selection [119] MDI, PFI GLM,
SVM, FNN,
RF, kNN,
GLM, Tree
Ensemble, NB
✓ ✓
Perturbation, Local Approximation [156] SHAP, PDP, PFI CNN, NN En-
semble, FNN
✓ ✓
[174] LIME, Occlusion, PDP Tree Ensemble ✓ ✓
[170] SHAP, PFI RF ✓ ✓
[160] LIME, PDP RF ✓ ✓
[167] SHAP, PDP GB ✓ ✓
Soil Monitoring Backpropagation, Perturbation [274] SquareGrad, VarGrad, Gradient,
PFI, SmoothGrad, I*G, IG
ConvLSTM ✓ ✓
Local Approximation [275] SHAP GB ✓ ✓
[276] SHAP CNN ✓ ✓
[277] SHAP SVM ✓ ✓
Perturbation [278] PFI RF ✓ ✓
Surface Temperature Predic-
tion
Backpropagation [279] LRP FNN ✓ ✓
Local Approximation [280] SHAP GB ✓
[281] SHAP GB ✓ ✓
[282] SHAP GB ✓
[283] SHAP GB ✓
Local Approximation, Feature Selection [115] MDI, SHAP RF ✓ ✓
Target Mapping Backpropagation [194] Self-Matching-CAM†Grad-CAM CNN ✓ ✓
[284] CAM CNN ✓ ✓
[190] CAM CNN ✓
[285] IG, Score-CAM NN Ensemble ✓ ✓
[286] Grad-CAM CNN ✓ ✓
Backpropagation, Embedding Space [287] CAM, Activation Assessment BagNet ✓ ✓
Backpropagation, Embedding Space,
Perturbation
[191] Attention, Empirical Receptive
Field, Grad-CAM, Activation Assess-
ment
CNN ✓ ✓ ✓
Backpropagation, Local Approximation [288] GuidedBackprop, SHAP CNN ✓ ✓
[289] CAM, LIME CNN ✓ ✓
Embedding Space [290] Attention CNN ✓
[291] Activation Assessment VAE ✓ ✓
[292] Activation Assessment BagNet ✓ ✓
[293] Attention aCNN ✓ ✓
Local Approximation [294] SHAP CNN ✓ ✓
[295] LIME CNN ✓ ✓
[296] LIME CNN ✓ ✓
Perturbation [297] Occlusion NN Ensemble ✓ ✓
TABLE I: A complete list of all relevant papers in this review, aggregated by EO Task and xAI
Category. (†xy indicates a new method which was derived from xy; a full list of all used acronyms
can be found in Appendix C.)
EO Task xAI Category Paper, xAI Methods Model Evaluation Objective
To
y
Ta
sk
A
ne
cd
ot
al
Q
ua
nt
ita
tiv
e
C
on
tr
ol
Im
pr
ov
e
D
is
co
ve
r
Ju
st
if
y
Vegetation Monitoring Backpropagation [196] Eigen-CAM, AugScore-CAM,
Crown-CAM†Grad-CAM, Score-CAM
CNN ✓ ✓
[298] GuidedGrad-CAM CNN ✓ ✓
Backpropagation, Local Approximation [299] Deconvolution, Gradient, SHAP,
GuidedGrad-CAM
CNN ✓ ✓
Interpretable by Design [179] GLM GLM ✓
[185] Fuzzy Rules FLS ✓ ✓
Joint Training [300] Explanation Association CNN ✓ ✓
Local Approximation [212] SHAP FNN ✓ ✓ ✓
[203] SHAP SVM, RF, GB ✓ ✓
[301] SHAP Extra Tree ✓ ✓
[302] SHAP RF ✓
Local Approximation, Feature Selection [109] MDI, LIME GB ✓ ✓
Model Translation [303] Rule Extraction RF ✓
[304] Rule Extraction RF ✓ ✓
Perturbation [305] PFI, PDP RF ✓
Perturbation, Local Approximation [214] SHAP, PFI GB ✓
[306] SHAP, PFI RF ✓
Weather Climate Prediction Backpropagation [307] LRP CNN ✓ ✓
[308] LRP FNN ✓ ✓
[309] LRP FNN ✓
[219] Grad-CAM CNN ✓ ✓
[310] GuidedBackprop CNN ✓ ✓
Backpropagation, Feature Selection [113] MDI, Grad-CAM RF, CNN ✓ ✓
Feature Selection [311] TreeInterpreter RF ✓
Joint Training [200] Prototype†ProtoPNet CNN ✓ ✓ ✓
Local Approximation [312] SHAP GB ✓ ✓
[216] SHAP GB ✓ ✓
Perturbation, Local Approximation [313] SHAP, PFI GLM, RF,
SVM, FNN
✓ ✓
Other Backpropagation [314] Gradient FNN ✓ ✓
Feature Selection [116] MDI RF ✓ ✓
Interpretable by Design [182] LDA LDA ✓
Interpretable by Design, Embedding
Space
[270] LDA, Activation Assessment NN Ensemble ✓ ✓
Joint Training [315] Explanation Association CNN ✓ ✓
Local Approximation [316] SHAP RF ✓ ✓
[317] SHAP RF ✓ ✓
[318] SHAP GB ✓
Perturbation [319] PFI, PDP RF ✓
VI. DISCUSSION
In this section, we address research questions RQ4 and RQ5
and discuss the usage of xAI methods in RS, the adaptation of
xAI methods to RS problems, as well as the evaluation of such
methods (RQ4). Furthermore, we highlight the challenges,
limitations, and emerging research directions in the field
(RQ5).
Agr
icu
ltu
ra
l M
on
ito
rin
g
Atm
os
ph
er
e
M
on
ito
rin
g
Bui
ld
in
g
M
ap
pi
ng
Eco
sy
st
em
In
te
ra
ct
io
ns
Hum
an
Env
iro
nm
en
t In
te
ra
ct
io
n
Hyd
ro
lo
gy
M
on
ito
rin
g
Lan
dc
ov
er
M
ap
pi
ng
Nat
ur
al
Haz
ar
d
M
on
ito
rin
g
So
il
M
on
ito
rin
g
Su
rfa
ce
Tem
per
at
ur
e
Pre
di
ct
io
n
Tar
ge
t M
ap
pi
ng
Veg
et
at
io
n
M
on
ito
rin
g
W
ea
th
er
Clim
at
e
Pre
di
ct
io
n
O
th
er
0
2
4
6
8
10
12
14
16
N
u
m
b
er
of
P
u
b
lic
at
io
n
s
Year
2017
2018
2020
2021
2022
2023
(a) EO tasks over time.
Bac
kp
ro
pa
ga
tio
n
Cou
nt
er
fa
ct
ua
ls
Em
bed
di
ng
Sp
ac
e
Exa
m
pl
e-
ba
se
d
Fe
at
ur
e
Se
lec
tio
n
In
te
rp
re
ta
bl
e
by
Des
ig
n
Jo
in
t Tra
in
in
g
Loc
al
App
ro
xi
m
at
io
n
M
od
el
Tra
ns
la
tio
n
Per
tu
rb
at
io
n
0
10
20
30
40
N
u
m
b
er
of
P
u
b
lic
at
io
n
s
Year
2017
2018
2020
2021
2022
2023
(b) xAI categories over time.
Fig. 9. The number of publications in the EO tasks and xAI categories over
time.
xAI in RS is still a very young and dynamic field. Figure 9
should give a broad overview of the development and evolution
of the field in the last years and provide the reader with
the context for the usage as well as the faced challenges of
xAI in RS. Figure 9a illustrates varying trends in the EO
tasks, with vegetation, atmosphere, or natural hazard monitor-
ing recently getting more attention. Meanwhile, agricultural
monitoring and target mapping show a constant trend, while
landcover mapping publications are decreasing. This suggests
a varying time lag before new approaches from ML are
widely applied in RS. Indeed, landcover mapping is one of
the most established EO tasks, with a variety of benchmark
datasets and models. While many approaches have already
been applied to landcover mapping, other tasks are currently
being explored, creating a trend in these areas. Similarly,
Figure 9b reveals increasing trends for local approximation and
perturbation methods while backpropagation methods stag-
nate. This stagnation may be attributed to an interconnection
between EO tasks. Initial assessments of xAI often involve
simple and operable methods, specifically, local approximation
and perturbation methods, which are model-agnostic. Hence,
reflecting the high usage of xAI in RS while these xAI methods
are extensively explored.
A. RQ4: Recommended practices in xAI for EO
1) The usage of xAI: In order to get meaningful and
reliable explanations, xAI methods should be only applied to
ML models that have achieved a good generalization [320].
Furthermore, the robustness of ML models and xAI methods
should be quantified, as it is common for them to disagree
[321]. To maintain consistent outcomes, it is advisable to con-
sider a set of initializations of xAI methods and different ML
models, as it is done when benchmarking the performance of a
ML model. For instance, in the works identified in our review,
different model types [121, 155, 156, 165, 203, 254, 259],
model seeds or configurations [145, 259], and xAI method
seeds [233] were used. Furthermore, the outcomes of different
xAI methods can be compared, as it is done in [117, 142,
144, 155, 178, 181, 192, 196, 198, 209, 211, 238, 259, 282].
For example, Jing et al. apply IG, ExpectedGradients (EG),
and DeepLift to exhaustively interpret different RNN-based
architectures (RNN, LSTM, GRU), also varying their numbers
of neurons [259]. Other related works are [178], where Zhu
et al. compare the results of gradient boosting and SHAP
applied to a LR model, or [238], where Hasanpour Zaryabi
et al. compare the use of I*G and DeepLift in several layers.
As can be derived from the results of this review, the
high and only usage of the SHAP model-agnostic method,
or similar, suggests that this method is used regardless of
the problem. We want to draw attention to this trend since
there is no single method that fits every problem [320]. The
practitioner should keep in mind that each method makes cer-
tain assumptions (e.g., SHAP assumes feature independence
that simplifies the problem’s definition). These assumptions
can often be violated, for example, with time series data, and
harm the interpretation results. Therefore, the assumptions of
the specific method need to be considered and aligned with the
data structure, task, and underlying model. Additionally, each
model-agnostic method represents one approach or paradigm
to generate explanations, regardless of the trained model and
input dataset. For specific tasks, specialized methods exist that
fit their need and a method should not only be used because it
is popular and easily accessible [322]. Furthermore, the choice
of the xAI method and its evaluation should be discussed in
each paper. This should be as much a part of the scientific
dialogue as the model selection and evaluation. Numerous
papers in this review did not mention their arguments for
selecting their xAI methods. The question arises if the authors
have considered the specificities of the used methods. Many
of the existing xAI methods, like SHAP, IG, or LRP, are only
an approximation of the underlying model which comes with
simplifying assumptions. We encourage the authors to address
this issue to lay the basis for the understanding and scientific
evaluation of the chosen approach. Good perspectives in this
respect are provided by Li et al. in [242].
Some of the modified Backpropagation (BP) methods, like
Guided BP, Deep Taylor Decomposition (DTD), and variants
of LRP, have been proven to be unfaithful to the model
[99, 323]. Adebayo et al. [99] demonstrate that perturbing
the weights of the neural network does not significantly
change the resulting explanations of the network. Another
work builds a foundational theoretical framework, showing
that many methods fail to provide class-sensitive explanations
and highlight only low-level features [323]. Nevertheless, these
methods are used in many works, as seen in Section V-A.
If such a method is used, it is advisable to use one of the
sanity checks mentioned in Section IV-C to verify the strategy
applied. But, as pointed out in several comparisons of different
methods [208, 324, 325], there exists no single xAI technique
that satisfies all evaluation metrics and is an obvious choice
for all EO problems. Rather, the methods should be chosen for
the specific task and goal, taking into account specifications
such as the need for global or local explanations, their model-
agnostic or model-specific nature, their input type, the com-
putational time required for the generation of explanations,
the quality or detail of these explanations, or the suitability
for the end-user, among others. A short discussion on how to
evaluate these methods can be found in the following section.
When using methods where the input is altered, like IG or
SHAP, the baseline is the neutral input, which should capture
the absence of any meaningful feature, and all outcomes are
compared to the output for this input. Hence, the choice of the
baseline is crucial and some methods have a built-in baseline
by construction. This is especially important when different
methods need to be compared [326, 327].
Often, the practitioner is interested in the causal relation-
ships and the explanation results of black-box models might
lead to unjustified causal claims [320]. xAI methods do not
provide the cause. Even when the explanation supports the
hypothesis from the research objective, it does not mean that
this hypothesis has been proven. Standard ML approaches
approximate correlations from data; therefore, the model can-
not assess the causal structure of the data. However, it can
help experts find unknown confounders and correlations by
giving them a tool to investigate the learned correlations. The
user must be very careful when making such assumptions to
discover underlying structures and be aware that observational
data often lacks common confounders, has strong feature
correlations and its causal structure is usually unknown. Causal
inference is a complementary research field to xAI and is
not covered in this review. Nevertheless, we identified several
works discussing the causality problem in xAI [136, 214, 241,
279].
2) Evaluation of xAI: As mentioned in Section V-B2, anec-
dotal evidence is frequently reported as the evaluation of the
applied approaches. Cherry-picking and qualitative evaluation
of explanations represent a challenge to humans. Because
human perception is mainly visual, humans are biased toward
certain types of xAI explanations. Besides others, humans
introduce cognitive distortion by drawing more attention to
negative examples and looking for simple but complete ex-
planations [328]. That is accompanied by other human biases,
like the confirmation bias, which is well-known in psychology
and favors explanations fitting the expectations, while con-
trary explanations are ignored [329]. These self-introduced
biases can lead to wrong reasoning about the explanations
and promoting a particular type of visualization. Hence, it
is hard to quantify the results objectively through anecdotal
evaluation. Therefore, a quantitative evaluation of xAI methods
is recommended. In addition, qualitative evaluation can be
done, but it should not be the only type of evaluation. When
users are expected to interact with explanations, attention
should be paid to creating understandable and appropriate
explanations for the specific end user, their requirements, and
their level of knowledge [330]. Then, keeping in mind these
biases is essential. Nevertheless, cherry-picking tends to be
insufficient for evaluating all types of explanations and, as
the only assessment criteria, it does not lead to a trustworthy
evaluation of xAI.
Beyond evaluation based on anecdotal evidence, there also
exist quantitative evaluation metrics (see Section IV-C) to
assess the explanation methods. However, only a few works
listed in Section V-B2 evaluate their methods against these
metrics. In contrast to these studies, it has been observed
that a more detailed quantitative evaluation of xAI is con-
ducted in climate science, given climate models. For instance,
Bommer et al. [325] evaluate seven Backpropagation methods,
among them three CAM-based methods. They use a climate
model to get reliable ground truth data for their explanations
and compare the xAI methods with functional explanation
evaluation by computing the following metrics: robustness,
faithfulness, randomization, complexity, and localization. For
their classification task, LRP and I*G are the best-performing
methods for the evaluated metrics. Some of these backpropa-
gation methods are also compared for their fidelity in [324].
First, the methods are evaluated on a synthetically created
dataset. Then, the results are related to those on a climate
simulation classification dataset. While Gradient, SmoothGrad,
I*G, IG, and LRPz are very noisy and may not provide insights
into the model, LRPcomp and SHAP seem to highlight the
attributions in a more meaningful way. IG and SHAP are also
evaluated with different baselines in [327]. The results confirm
that the baseline needs to be chosen carefully because a wrong
baseline can lead to different interpretations, both in attribution
magnitude and highlighted feature. Also,the authors suggest
that all methods should be evaluated on the same baseline for
comparison purposes.
There is no standard way to evaluate xAI until now, and
discussions on how to properly do this are still ongoing.
Nonetheless, different toolboxes have been developed for
comparing xAI methods using quantitative metrics [101, 331,
332]. Last but not least, only one survey [145] evaluates
the usefulness of the explanations to experts. However, with
the emphasis on human-centered AI in the current research
landscape [333], user studies are becoming significant to
quantify the benefit of the explanations and the understanding
of the end-users. Because they have been largely unexplored
in the context of xAI in RS, they pose a promising research
direction.
3) xAI benchmark datasets: Real-world datasets lack a
controlled ground truth for evaluating xAI methods and a
regulated environment is needed to lay the foundation for fully
transparent datasets. Similar to [334], there are efforts to create
synthetic datasets for EO tasks. Mamalakis et al. generated
a fully synthetic dataset where they leverage local piece-
wise linear functions to create a non-linear response to the
input drawn from a Gaussian distribution [335]. This method
allows for creating a regional climate prediction task from Sea
Surface Temperature (SST). They show that a simple FNN can
approximate this function and evaluate different post-hoc xAI
methods. To get reliable ground truth data, it is also possible
to use a simulation. For example, climate models [336] can
be leveraged. Other approaches to generate fully-synthetic RS
images include [337–339]. While Berkson et al. simulate dif-
ferent RS sensors, atmospheres, and scenes, including different
terrains, materials, and weather conditions [337], Shermeyer
et al. add different compositions of the vehicles [338] and Han
et al. show that the motions of vehicles can be included in the
simulation [339]. A framework for generating synthetic EO
datasets is presented in [340], demonstrating the efficiency of
these datasets for ML on off-shore wind farm detection. The
framework aims to extract expert knowledge about the objects
to be modeled in a machine-readable format (e.g., structure,
relationships, etc.), which can be combined to create new
datasets. However, there is no work comparing xAI methods
on these synthetic RS datasets.
B. RQ5: Challenges, limitations, and future directions
We identified several challenges that emerge from xAI, RS,
or the combination of them. Figure 10 gives an overview of
these challenges, which are categorized by color according to
the area from which they emerge, i.e., those emerging from
RS and xAI are in green and blue, respectively.
1) Combination of xAI with related fields:
a) Physics-aware ML: Physics-aware ML and xAI share
the objective of enhancing model reliability, trustworthiness,
and transparency. Both rely on domain expertise: xAI enables
experts to discover new insights into physical processes. This
is illustrated by studies such as [181], which explores runoff
and evapotranspiration parameterization, and [216], where the
errors of an Earth system model are predicted to hypothesize
about wrong model assumptions using xAI. Conversely, inte-
grating expert knowledge from physics helps to understand the
models because the incorporated prior knowledge is human-
understandable. The combination of both fields can yield
physically sound and interpretable models. We have identified
several approaches relying on physics-aware features applied
for knowledge extraction from SAR data. For instance, Zhao
et al. introduce a CNN operating in the complex domain which
aims to predict physical scattering properties from Polarimetric
SAR (PolSAR) images [341]. These properties are derived
with the H−A−α target decomposition method and describe
various urban, vegetation, and ocean surfaces. Huang et al.
inject similar physics-aware concepts into a CNN model for
sea-ice classification and conducts a xAI analysis to show
that the fusion provides a better separation of the different
classes based on their physical properties [270]. Given the
vehicle classification task, Feng et al. extract vehicle parts
with the attribute scattering center (ASC) model and construct
images for the extracted parts based on their scattering centers
selected with k-means clustering [297]. The image parts are
used as filters for the convolution of the original image. With
occlusion analysis, the authors demonstrate that incorporat-
ing the vehicle parts in the model inputs helps to improve
model robustness, enabling the practitioners to obtain intuitive
explanations and validate the model workings with domain
knowledge. In more recent work, Feng et al. also proposes
an intrinsically interpretable CNN model that first processes
the vehicle parts and the original image with a CNN encoder
[290]. Next, an attention module uses the encoder output given
the original image as queries, while the keys and the values are
derived from the encoder output for the target parts. Finally, a
convolution operation is applied to factor the contribution of
the vehicle parts into the class logits, thus explaining the model
prediction based on the different parts. All these works support
that physics-aware ML and xAI could complement each other.
Further design of these approaches and their extension to other
types of RS data, such as optical or hyperspectral images, is
a promising research direction for model improvement and
may also constitute an important step towards consolidating
the usage of explainable architectures.
b) Uncertainty: The uncertainty intervals give insight
into the generalization and training process of the model,
which is especially useful when dealing with extrapolation and
anomaly detection. While explanations can give more detailed
information about the models’ internal representations, they
lack an important metric: model confidence or explanation
reliability. Explanations which have a high variability or are
inaccurate may lead to misinterpretations. The combination
of uncertainty and xAI can enhance the information we
can get from only one approach. For instance, Marx et al.
[342] and Slack et al. [343] propose methods to provide
uncertainty intervals around the explanations. RS has not been
excluded from the convergence of these two disciplines [344,
345]. Through the adaptation of a perturbation method for
object detection (D-RISE) and its combination with Deep
Gaussian models, Huang et al. get attribution maps for model
uncertainty [344]. They show the efficiency of their approach
on SAR object detection, where trustworthy predictions are
especially needed since SAR images are hard to interpret for
humans. An uncertainty aware, interpretable-by-design model
frequently applied here is GPs [151, 152, 206, 253]. Further,
Blix et al. analytically derive the input features’ sensitivity to
the variance estimate of a GP model that can be later used for
uncertainty evaluation and selection of relevant features [206].
Hence, complementing xAI methods with uncertainty con-
stitutes an interesting research direction to ensure more reliable
RQ5: Challenges
and Limitations
of xAI in RS
Challenges
from RS
properties
Scale
Topology
Temporal
depen-
dencies
Restricted
availability of
labeled data
Combination
of xAI with
related fields
Physics-
based ML
Causal
inference
Uncertainty
Lack of
standardized
and objective
evaluation
of xAI
Towards
interpretable
DNNs
Concept
discovery
Intrinsic
interpretable
DNNs
Graph
Neural
Networks
(GNN)
Fig. 10. Challenges of xAI in RS. Those emerging from RS and xAI are in green and blue, respectively.
explanations and better interpretation results.
c) Causal inference: Understanding causality in xAI is
fundamental for uncovering the cause-and-effect relationships
within the model’s decision-making process. Techniques like
causal inference, counterfactual reasoning, and causal graph-
ical models are employed to trace the causalrelationships
between input features and model predictions [346]. These
methods aim to not just highlight correlations but elucidate
the direct causal links, enabling better comprehension of how
the AI system arrives at its decisions. Moreover, the integration
of causal reasoning into interpretable ML models, like causal
Bayesian networks or causal decision trees, has shown promise
in elucidating causal links between input features and predic-
tions [347]. In RS, causality involves understanding the cause-
and-effect dynamics between environmental processes and
the measurements acquired by RS instruments, with the aim
of monitoring natural resources and ecosystems. Approaches
such as structural equation modeling, directed acyclic graphs,
and Granger causality have been utilized to untangle causal
relationships within RS datasets [24]. These methods aim
to identify causal pathways between environmental variables,
allowing scientists to comprehend how changes in one vari-
able may cause alterations in others. However, causality for
explaining trained models in RS is yet to be extensively
explored, with many opportunities ahead in environmental
studies, land-use planning, disaster management, and climate
change research.
2) Challenges from RS properties: xAI in CV applications
does not account for RS image properties such as the existence
of different sources, scales, geographic relationships, and tem-
poral dependencies. The scale in RS describes the resolution
and scope of input data [8]. While the scope determines the
geographic extent, the resolution specifies to which degree
information can be captured, defining the shape, granularity,
and boundaries of the objects, among others. Additionally,
some concepts or objects, like landcover or mountains, have
no distinct boundaries or areas. Instead, changes are more
continuous and have irregular shapes. Hence, spatial resolution
automatically has semantic implications since the presence or
absence of information in the data influences the ability to
distinguish and interpret specific features or objects within the
scene. This remains a challenge for RS and xAI [16]: most
methods have problems with the high granularity of features,
highlight an increased spatial extent, or are very noisy. New
methods are constantly being developed to overcome these dis-
advantages. For instance, the well-known Grad-CAM has this
problem, and many studies want to enhance its capabilities.
Recent works leverage the lower-level feature representations
to create much more fine-grained saliency maps [192, 196].
Brenning addresses the challenge of scale by decoupling the
xAI method from spatial scale, yielding in a permutation
method which attributes across various distances [202]. The
model can also already tackle the scaling problem: a scale-
invariant autoencoder has been recently proposed in [348].
The scale of RS observations also determines the spectral
resolution. Every sensor captures reflectance data across differ-
ent wavelengths, extracting different ground-level information.
Until now, a comprehensive xAI evaluation for multispec-
tral and hyperspectral RS images is missing, taking into
account the spectral property that distinguishes them from
conventional images. Additionally, the current literature does
not consider the varying spectra and spectral resolutions be-
tween the platforms. De Lucia et al. use a 3D visualization
technique originating from medical imaging, together with a
CAM variant adapted for hyperspectral images, to show the
importance of different spectral bands per pixel [197]. This
provides insights both into the spatial and spectral attribution,
representing an advantage over the current CAM variants.
Current CAM methods only visualize the spatial attribution
despite the importance of the different spectral bands for
specific objects and environments.
Another challenge is the topology of the data [16]. Geo-
graphic relations can be a hidden confounder not captured
in the data. For example, social structures are strongly tied
to the location or have symbiotic relationships that exist
in and between the biosphere, lithosphere, cryosphere, and
hydrosphere. That makes analyzing and using xAI in RS
difficult. Therefore, a proper set of training features is very
important. One approach to tackling this challenge focuses on
location-awareness [200], where the prototypes are specific
to locations in the input. This assumes a static location in the
input image and considers the relative location from the input.
While this approach works for global images or images for the
same location, it does not take into account changing locations.
For example, there is a trade-off between the complexity or
size of a model and the resolution of the RS images. Working
with static locations and high-resolution images would require
huge resources for the model. Moreover, the flexibility of
using varying locations in the input and being location-aware
could not be achieved. A different strategy that facilitates
the learning of spatial relations based on a U-Net model is
presented in [224] for monitoring the exposure to surface
ozone. A grid cell is used as input to the U-Net model and
consists of high-level features for a geographical area. Hence,
this setup allows the usage of the IG attribution method to
uncover the contributions of the neighboring areas for surface
ozone pollution.
Currently, there is an increasing interest in the integration of
topological data analysis (TDA) [349] with ML [350, 351].
Duponchel attempts to discover the topology of the data
in an unsupervised way by mapping each pixel through a
lens function, dividing it into subsets, and clustering within
these subsets [352]. Edges are drawn between the subsets if
they share at least one pixel. Different groups are generated
according to the number of pixels in each subset and the
connectivity, which classify the pixels into different categories.
This approach is also applied in [353] to classify and hierar-
chically visualize the data in a dendrogram.
Furthermore, RS is often composed of sequential data.
Since most xAI methods are developed for natural images,
they do not account for temporal dependencies [354]. How-
ever, besides the standard methods, several possibilities de-
signed for sequential data exist in the literature [355], one
of them is the attention mechanism which we found to be
frequently utilized for such problems The only other approach
identified in this review that explicitly tackles the problem of
time dependencies is presented by [228]. They learn regression
coefficients per feature, time point, and location. Addition-
ally, the spatiotemporal dependencies are disentangled with a
random effect component where the latent variables follow a
temporal Markovian process. This approach is evaluated for air
quality estimation where the regression coefficients are used
to reveal the temporal importance of various meteorological
and landcover features. Besides these approaches, the other
works typically rely on approaches like backpropagation that
do not explicitly consider temporal relations. For instance, Liu
et al. aggregate saliency maps for individual time steps into
cyclical saliency maps to identify sea surface temperature areas
relevant for river flow prediction [256]. These are computed
as an average of individual saliency maps over periods of
months, seasons, or calendar years. The authors argue that
such aggregation is meaningful in the climate context as it
allows summarizing the spatial importance across different
time frames while improving the robustness towards gradient
fluctuation and noise. A similar strategy is also followed in
[149], where a CNN is used on the temporal dimension and
CAM is applied for attribution purposes.
3) Towards interpretable Deep Neural Networks: The ini-
tial research for interpreting NNs focused on developing
saliency map methods. In Section V-A, we have seen that this
trend is also prominent in RS, as methods like Grad-CAM
are amongthe most utilized xAI approaches for explaining
DL models. The saliency maps highlight the relevant features
without providing additional insights into how these features
are used by the model [356]. Moreover, these highlighted
features usually do not correspond to high-level concepts
that humans can easily interpret [61]. In recent years, these
limitations have been addressed with alternative explanation
paradigms such as concept-based explanations and intrinsically
interpretable DNNs [357].
The concept-based explanations described in Section
IV-A3b enable global interpretation of the model workings
in terms of intuitive high-level concepts understandable to
humans. Amid the growing number of these approaches in
CV [61, 62, 356, 358, 359], we did not identify any works
in RS. The lack of usage of these approaches originates from
the properties of RS data described in Section VI-B2. First,
the concepts defined in CV range from simple primitives
like colors and textures to more complex visual patterns like
object parts [61, 62]. While simple visual primitives can also
be used as concepts in RS, defining complex visual patterns
requires modeling the topological relations inherent in RS data.
In addition, concepts in CV are typically extracted from an
external probe dataset and a recent study [360] recommends
using a probe dataset with a similar distribution as the dataset
used for the learning task. Hence, to ensure faithful concept
explanations in RS, the probe dataset should conform to the
topological relations of the dataset on which the ML model
was trained.
Even though the concept-based explanations can offer more
intuitive explanations than the saliency maps, they are still
post-hoc methods, and as already described in Section IV-C,
they are not always faithful to the model workings. More-
over, these methods often approximate the underlying model
behavior and do not make the model more transparent. While
mainly simple and easily understandable ML models are meant
with interpretable models, we want to emphasize the efforts
of trying to make DNNs inherently interpretable. Current
xAI literature is going in that direction, either by introducing
layers in the neural network, like the attention mechanism that
can offer insights into the relevant features, or by designing
intrinsically interpretable DNNs.
While transformers were developed for one-dimensional
sequential inputs, their CV counterpart, ViTs [92], adapt their
structure for the two dimensions of an image by splitting
the image into smaller parts. Spatio-temporal inputs are of-
ten encountered in the EO domain. The Earthformer [361]
tackles spatiotemporal inputs through cuboid attention, i.e.,
by applying self-attention on input tensors decomposed into
local cuboids, and then attending to global cuboids which
summarize the overall status of the system. Another ViT
adaption to the RS domain is shown in [362], where a temporal
transformer encoder followed by a spatial one is used to
process all patches composing the time series. Additional ideas
for ViTs relying on related concepts and suitable for RS space-
time data can be found in [363–366]. Further exploration of the
attention mechanisms in these new transformer architectures is
still missing in the EO literature and could be a promising di-
rection for latent space analysis-based xAI. It should be noted
that attention mechanisms are not solely used in transformers
but also in CNNs and RNNs [121, 191, 209, 240, 260, 267,
268, 290].
Other intrinsically interpretable DNN approaches for gen-
erating ante-hoc explanations recently explored in RS are
prototype networks, BagNets, and Graph Neural Networks.
As described in Section IV-A3c, prototype neural networks
enforce a reasoning process that classifies input examples
based on their similarity to prototypical parts of images of
a given class. In this review, we have identified the following
approaches [168, 200] that have already been described in
Sections V-A2 and V-A1, respectively. A recent work intro-
duces a more interpretable CNN model, the BagNet [87],
which, in contrast to the traditional CNN, is only applied to
a small and local receptive field. Therefore, the activations
for each class in each receptive field are combined to provide
a prediction by addition at the image level. The magnitude
of the receptive fields’ activations corresponds to the impor-
tance of the features, and an attribution map can be directly
compiled from the activations. Examples of this methodology
applied to SAR vehicle detection are in [287, 292]. A similar
approach integrates NNs in GAMs for landslide susceptibility
prediction, where the NNs approximates the GAM’s spline
functions. The model enhances its interpretability but leverages
the capabilities of a NN [162]. Another study equipped a
DL model with an interpretable-by-design component, a linear
regression model is jointly training onto the same targets via
backpropagation [122].
In contrast to the NN approaches, graphs provide an in-
tuitive representation of many real-world problems regard-
ing objects and their relations. GNNs emerged as a pop-
ular paradigm for learning high-level object representation
by message-passing from the object neighbors [367]. This
reasoning process enables GNNs to represent complex patterns
of relationships between objects and generalizes their applica-
bility to arbitrary geometric structures as opposed to CNN,
which operate on two-dimensional grids (images) and one-
dimensional sequences (text) [368]. Moreover, Graph Convo-
lutional Networkss (GCNs) [369], which perform message-
passing with the convolution operation, enable efficient semi-
supervised learning in scenarios where not all objects in the
graph are labeled. These properties have fostered the usage of
GNNs in RS, especially in hyperspectral image classification
tasks. Concretely, to overcome the limitation of the popular
CNN approaches to capture the topological relations and the
irregular object shapes inherent to hyperspectral data, vari-
ous GCN approaches have been proposed [370–372], which
demonstrate the potential of using GNNs for improving hyper-
spectral image classification results. Recent works have also
tackled the scarcity of labels challenge with approaches based
on GCN, such as [373, 374]. Although GNNs produce an
intuitive representation of the problem, they are still considered
black-box models [375]. A promising research direction is
adopting the novel xAI methods for interpreting GNNs [376]
to the approaches used in RS.
4) Restricted availability of labeled data: The high number
of active airborne platforms like Sentinel-2 generates vast
amounts of data. However, only a small portion of this data
can be labeled. The lack of labels represents a challenge in the
typical knowledge extraction pipeline in RS, which is based on
training supervised ML algorithms on a labeled dataset. Hence,
one recently popular approach to tackle this issue is to apply
the Self-Supervised Learning (SSL) paradigm. SSL allows
learning latent representations from the unlabeled data. These
are useful in efficiently solving the downstream learning task
for which only limited labels are available. SSL is achieved
by optimizing a NN to solve a pretext task with pseudo-
labels [377]. We have identified a promising research direction
for improving model interpretability: combining the traditional
ML modeling approaches with the recent DL approaches. A
pretext task for the problem of SAR sea ice classification is
proposed in [270], where the pseudo-labels are derived from a
mixture of LDA topics. Topics are based on physical scattering
properties extracted from unlabeled SAR images. Thus, the
model is able to provide an interpretable representation of
the sea-ice types in terms of their physical properties (e.g.,
water bodies and sea floating ice are represented with a similar
set of properties that match the actual semantic definition
of these classes). SSL pretraining also preserves the physics
consistencythat most explanation
outcomes are not evaluated quantitatively and over 90% of
the authors only provide anecdotal evidence. Our work also
includes a table with concise information about each paper
included in our study. Furthermore, we discuss the alignment
of the usage of xAI in RS with standard practices in xAI
and the evaluation of xAI in RS. As such, this review aims to
assist users in the field of EO for the usage of xAI. Finally, the
combination of xAI with related fields, the unique properties
of RS data, and the interpretability of Deep Learning (DL)
and their implication into the lack of labels in RS are the
challenges and limitations we identified while conducting this
review.
In summary, this review provides the following main con-
tributions:
• First, we present an overview and categorization of cur-
rent xAI methods.
• Second, we summarize the state-of-the-art (SOTA) xAI
approaches in RS through the analysis of a comprehensive
literature database.
• Third, we identify objectives and practices for evaluating
xAI methods in RS.
• Finally, we discuss challenges, limitations, and future
directions for xAI in RS.
The remainder of this review is structured as follows. We
discuss the related work and draw the distinction between
existing xAI reviews and ours in Section II. The method-
ological approach for our systematic review and our research
questions are introduced in Section III. Then, in Section IV,
the taxonomy of xAI is clarified, a categorization of the
methods is developed, and the most common xAI methods are
explained. The results of this review are presented in Section
V. They show the current state of xAI in the RS literature,
trends, and novel approaches for RS tasks. Last but not least,
Section VI discusses these results and how they align with
the recommended practices in xAI. Furthermore, connections
to current challenges and limitations are drawn. In the end,
we summarize the main takeaways and conclusions from our
review and present an outlook.
II. RELATED WORK
Numerous resources for either the field of xAI [10, 18, 19] or
ML for EO applications [1, 20] are available in the literature.
In contrast, to the best of our knowledge, in the overlapping
area of these two fields, there are only two reviews [13, 17].
Gevaert aims to summarize the existing works of xAI in EO
and addresses the xAI usage from a regulatory and societal
perspective, discussing the requirements and type of xAI that
is needed in EO from policy, regulation, and politics [13].
However, only a small literature database is used in this study
and the author relies on a high-level categorization of the xAI
methods. As such, this work does not provide a comprehensive
overview of current xAI approaches in EO. Further, the work
of Roscher et al. categorizes the identified works in xAI
according to the general challenges in the bio- and geosciences
[17]. Their categorization of xAI properties and their emphasis
on considering expert knowledge constitute two highlights of
this review. Furthermore, the presented challenges are still
faced by researchers today. Yet, they do not use a broad
literature database to extensively cover the xAI approaches
used in EO. There also exist xAI reviews for certain EO tasks
or perspectives [14–16]. Still, they do not comprehensively
summarize the work done in the broad research field of
RS in EO because they focus only on specific parts of the
literature. Leluschko and Tholen conduct a review on the
stakeholders and goals within human-centered xAI applied in
RS [14]. Their findings indicate an underrepresentation of non-
developer stakeholders in this area. Hall et al. [15] review
DL methods and investigate to which degree the methods can
explain human wealth or poverty from satellite imagery. Xing
and Sieber [16] focus on xAI in conjunction with Deep Neural
Networks (DNNs) that incorporate geographic structures and
knowledge. They discuss three challenges when applying xAI
to geo-referenced data: challenges from xAI, geospatial AI,
and geosocial applications. Based on a short use case on land
use classification and relying on SHAP explanations, they
show that the geometry, topology, scale, and localization are of
great importance. Finally, a group at Colorado State University
published a survey on their work using xAI for climate and
weather forecasting [21].
To overcome these shortcomings, we approach xAI in EO
systematically and provide an extensive literature database,
resulting in a comprehensive summary of the current literature.
This work is characterized by our interest in the application
and usage of xAI, putting a special emphasis on RS sensors,
while others focus on the application of geological features,
natural sciences, or social implications. Not only do we present
an overview of the current challenges in the field, but we
also highlight the state-of-the-art methods for tackling these
challenges. We believe this could provide valuable insight into
current limitations faced in the field. Compared to the existing
literature, we look at the topic from a technical perspective
without reflecting on regulatory or ethical implications orig-
inating from integrating xAI into the field of EO. Because
of the contrasting approaches relative to xAI, this review
excludes works from related and overlapping domains, such
as physics-aware ML, uncertainty quantification, or causal
inference. Instead, we refer the reader to overviews in their
fields: [22], [23], and [24, 25], respectively.
III. RESEARCH METHOD
In this review, we aim for transparency and reproducibility of
our work by following the PRISMA scheme [26]. We apply the
appropriate PRISMA requirements for our field to provide a
transparent, complete, and trustworthy review. Since literature
search engines, such as Google Scholar, do not provide the
same search results for all users (the results depend on the
geographic location and time) [27], we avoided the use of
such search engines and relied on databases where the results
can be reproduced. In addition to the PRISMA guidelines, we
provide the full search results and the exclusion criteria in the
supplementary material to further improve transparency and
reproducibility.
A. Research Objective
The final goal of this survey is to summarize and compare
the work conducted in the field of xAI and EO with a focus
on RS. Furthermore, we want to show where and how the xAI
methods are adopted, why they are used, and which problems
arise when applying them. In particular, we will attempt to
answer the following research questions:
• RQ1 - Which explainable AI approaches have been used
and which methods have been developed in the literature
for EO tasks?
• RQ2 - How are xAI explanations analyzed, interpreted,
and evaluated?
• RQ3 - What are the objectives and findings of using xAI
in RS?
• RQ4 - How do the utilized xAI approaches in RS align
with the recommended practices in the field of xAI?
• RQ5 - What are the limitations, challenges, and new
developments of xAI in RS?
B. Search procedure
The search query consists of two major parts: keywords related
to xAI and keywords related to EO. All the nested keywords
are connected via an OR operator, while the two parts are
connected via the AND operator. Due to the interchangeably
used taxonomy in both fields, we added additional keywords
to the generally known terms to receive as many relevant
papers as possible, attempting not to excessively increase the
false positive rate. For example, we included specific types of
remote sensing sensors and used wildcards to cover different
ways authors might refer to xAI.[
Earth observation OR remote sensing OR earth
science OR
(
(satellite OR aerial OR airborne
OR spaceborne OR radar) AND (image OR data))
OR LiDAR OR SAR OR UAV OR Sentinel OR
Landsat OR MODIS OR gaofen OR ceres
]
AND[
xai OR
(
(interpret* OR explain*) AND (deep
learning OR machine learning OR artificial
intelligence OR dl OR ml OR ai OR model)
)]
This general search query was adapted to the differentof the features throughout the NN layers. As
a result, the different-looking images of the same class are
positioned close in the latent space, leading to improved
classification results compared to a supervised CNN. Ma
et al. leverage contrastive learning to create prototypes for
landcover classification [271]. Their online learning approach
fuses the images with their enhanced counterparts at different
resolutions. After a ResNet encodes them into a feature space,
they get mapped into a unit sphere where the prototypes
are clustered. Then, the resulting codes of the prototypes
corresponding to the differently enhanced images are swapped
to compute the loss. Another unsupervised prototype approach
is provided in [150].
5) Lack of standardized and objective evaluation of xAI:
As mentioned in Section VI-A, the qualification of the in-
terpretability is essential for xAI approaches [84]. Contrary
to this, there is no definition of a sufficient explanation
or interpretation. Furthermore, no standardized and objective
evaluation for xAI methods has been established, posing a
challenge for potential partitioners outside of the xAI domain.
Although metrics and tools for the evaluation exist, incor-
porating xAI into a ML pipeline would mean acquiring the
domain knowledge from xAI to provide a good evaluation for
the methods used by a RS partitioner. An alternative approach
to the evaluation metrics is the conduction of user studies.
These studies evaluate how useful the interpretation appears
to the participants. However, these user studies can introduce
subjectivity because preferences for interpretability can vary
between different groups. Hence, conducting a well-designed
user study involves a lot of effort, costs, and domain knowl-
edge, which again would need to be acquired by the partitioner.
Even though frameworks for xAI have been developed recently
[378, 379], they are tailored towards a specific type of method
or problem [378]. Others do focus more on the life-cycle of the
systems than the evaluation [379]. In summary, the evaluation
of xAI lacks a standardized methodology, potentially limiting
non-experts applying the methods. This circumstance might
have contributed to the large number of anecdotal evidence
we encountered in this review.
VII. CONCLUSION
This paper provides a detailed overview of the state-of-the-
art of xAI in RS by conducting a systematic review of
the existing work in the field. First, we collected a large
set of publications by executing extensive search queries in
the established literature databases for the last six years.
Subsequently, we introduced a categorization for the existing
xAI methods to structure these works. Our analysis reveals that
a substantial amount of work is motivated by the assessment
of the trustworthiness of the ML models for traditional EO
tasks like landcover mapping and agricultural monitoring.
More recently, xAI has been increasingly utilized for the
discovery of scientific insights for critical EO problems related
to climate change, extreme events, or urbanization. Although
dominantly established xAI methods like SHAP or Grad-CAM
are frequently used, we observe the increased development
of adapted xAI methods to capture the specifics of RS data.
Further, we summarize the works combining xAI with other
fields, like physics and uncertainty, that produce scientifically
sound ML models, ultimately enhancing the quality of the
extracted explanations.
These highlights clearly illustrate that xAI in RS is a young
field with high potential to augment the ML knowledge extrac-
tion process. At the same time, comparing how the identified
practices in this review relate to the latest developments in
xAI and the current limitations in EO gives a hint to the
fundamental challenges that need to be addressed in future
work. One research direction of paramount importance is the
development of interpretable DNNs to address the shortcom-
ings of the widely-used saliency methods in RS. Further,
as most studies conduct only anecdotal evaluation, verifying
the reliability of the xAI outputs supports future work on
quantitative evaluation and user studies. Moreover, the adapted
methods indicate that the traditional xAI approaches do not
conform to the properties of RS data. Therefore, we encourage
future work on developing models to address those drawbacks
related to scale, topology, and temporal dependencies in RS
data. Another challenge is the lack of labeled data, which
is currently tackled by combining SSL with xAI to design
approaches that outperform supervised ML models, addition-
ally offering intuitive model interpretability. Summarizing our
contributions, we hope that the insights provided in this review
enable the researchers to better understand the state-of-the-art
in this field and promote the development of novel methods
by tackling the research directions proposed.
APPENDIX
A. Search queries
Due to the implementation of the search in the databases,
they cannot handle deep and nested search queries. For this
reason, we repeated different versions of different words, like
AI in ”explain* AI” and ”interpret* AI”.
1) IEEE: In the IEEE search, abstract, title, and keywords
have the search terms ”Abstract”, ”Document Title”, and
”Index Terms”, respectively. Unfortunately, it is not possible to
search with more than 8 wildcards (”*”) in one search query.
Hence, we split the queries into different parts and merge the
results in a second step. The search query cannot include the
year or the language, so the results were filtered in the masks
of their website.
("Abstract":earth observation OR "Abstract":
remote sensing OR "Abstract":earth science OR "
Abstract":satellite data OR "Abstract":satellite
image OR "Abstract":aerial image OR "Abstract":
aerial data OR "Abstract":airborne image OR "
Abstract":airborne data OR "Abstract":radar
image OR "Abstract":radar data OR "Abstract":
spaceborne image OR "Abstract":spaceborne data
OR "Abstract":LiDAR OR "Abstract":SAR OR "
Abstract":UAV OR "Abstract":Sentinel OR "
Abstract":Landsat OR "Abstract":MODIS OR "
Abstract":gaofen OR "Abstract":ceres) AND ("
Abstract":"xai" OR "Abstract":"interpret* model"
OR "Abstract":"interpret* deep learning" OR "
Abstract":"interpret* machine learning" OR "
Abstract":"interpret* artificial intelligence"
OR "Abstract":"interpret* dl" OR "Abstract":"
interpret* ml")
("Abstract":earth observation OR "Abstract":
remote sensing OR "Abstract":earth science OR "
Abstract":satellite data OR "Abstract":satellite
image OR "Abstract":aerial image OR "Abstract":
aerial data OR "Abstract":airborne image OR "
Abstract":airborne data OR "Abstract":radar
image OR "Abstract":radar data OR "Abstract":
spaceborne image OR "Abstract":spaceborne data
OR "Abstract":LiDAR OR "Abstract":SAR OR "
Abstract":UAV OR "Abstract":Sentinel OR "
Abstract":Landsat OR "Abstract":MODIS OR "
Abstract":gaofen OR "Abstract":ceres) AND ("
Abstract":"interpret* AI" OR "Abstract":"explain
* model" OR "Abstract":"explain* deep learning"
OR "Abstract":"explain* machine learning" OR "
Abstract":"explain* artificial intelligence" OR
"Abstract":"explain* dl" OR "Abstract":"explain*
ml" OR "Abstract":"explain* AI")
("Index Terms":earth observation OR "Index Terms
":remote sensing OR "Index Terms":earth science
OR "Index Terms":satellite data OR "Index Terms
":satellite image OR "Index Terms":aerial image
OR "Index Terms":aerial data OR "Index Terms":
airborne image OR "Index Terms":airborne data OR
"Index Terms":radar image OR "Index Terms":
radar data OR "Index Terms":spaceborne image OR
"Index Terms":spaceborne data OR "Index Terms":
LiDAR OR "Index Terms":SAR OR "Index Terms":UAV
OR "Index Terms":Sentinel OR "Index Terms":
Landsat OR "Index Terms":MODIS OR "Index Terms":
gaofen OR "Index Terms":ceres) AND ("Index Terms
":"xai" OR "Index Terms":"interpret* model" OR "
Index Terms":"interpret* deep learning" OR "
Index Terms":"interpret* machine learning" OR "
Index Terms":"interpret* artificial intelligence
" OR"Index Terms":"interpret* dl" OR "Index
Terms":"interpret* ml")
("Index Terms":earth observation OR "Index Terms
":remote sensing OR "Index Terms":earth science
OR "Index Terms":satellite data OR "Index Terms
":satellite image OR "Index Terms":aerial image
OR "Index Terms":aerial data OR "Index Terms":
airborne image OR "Index Terms":airborne data OR
"Index Terms":radar image OR "Index Terms":
radar data OR "Index Terms":spaceborne image OR
"Index Terms":spaceborne data OR "Index Terms":
LiDAR OR "Index Terms":SAR OR "Index Terms":UAV
OR "Index Terms":Sentinel OR "Index Terms":
Landsat OR "Index Terms":MODIS OR "Index Terms":
gaofen OR "Index Terms":ceres) AND ("Index Terms
":"interpret* AI" OR "Index Terms":"explain*
model" OR "Index Terms":"explain* deep learning"
OR "Index Terms":"explain* machine learning" OR
"Index Terms":"explain* artificial intelligence
" OR "Index Terms":"explain* dl" OR "Index Terms
":"explain* ml" OR "Index Terms":"explain* AI")
("Document Title":earth observation OR "Document
Title":remote sensing OR "Document Title":earth
science OR "Document Title":satellite data OR "
Document Title":satellite image OR "Document
Title":aerial image OR "Document Title":aerial
data OR "Document Title":airborne image OR "
Document Title":airborne data OR "Document Title
":radar image OR "Document Title":radar data OR
"Document Title":spaceborne image OR "Document
Title":spaceborne data OR "Document Title":LiDAR
OR "Document Title":SAR OR "Document Title":UAV
OR "Document Title":Sentinel OR "Document Title
":Landsat OR "Document Title":MODIS OR "Document
Title":gaofen OR "Document Title":ceres) AND ("
Document Title":"xai" OR "Document Title":"
interpret* model" OR "Document Title":"interpret
* deep learning" OR "Document Title":"interpret*
machine learning" OR "Document Title":"
interpret* artificial intelligence" OR "Document
Title":"interpret* dl" OR "Document Title":"
interpret* ml")
("Document Title":earth observation OR "Document
Title":remote sensing OR "Document Title":earth
science OR "Document Title":satellite data OR "
Document Title":satellite image OR "Document
Title":aerial image OR "Document Title":aerial
data OR "Document Title":airborne image OR "
Document Title":airborne data OR "Document Title
":radar image OR "Document Title":radar data OR
"Document Title":spaceborne image OR "Document
Title":spaceborne data OR "Document Title":LiDAR
OR "Document Title":SAR OR "Document Title":UAV
OR "Document Title":Sentinel OR "Document Title
":Landsat OR "Document Title":MODIS OR "Document
Title":gaofen OR "Document Title":ceres) AND ("
Document Title":"interpret* AI" OR "Document
Title":"explain* model" OR "Document Title":"
explain* deep learning" OR "Document Title":"
explain* machine learning" OR "Document Title":"
explain* artificial intelligence" OR "Document
Title":"explain* dl" OR "Document Title":"
explain* ml" OR "Document Title":"explain* AI")
2) Scopus: Scopus has the most powerful search engine for
its database. Therefore, it was possible to specify the year of
the publication and the language directly in the search query.
The search term for abstract, title, and keywords is ”TITLE-
ABS-KEY”.
TITLE-ABS-KEY((earth observation) OR (remote
sensing) OR (earth science) OR (satellite data)
OR (satellite image) OR (aerial image) OR (
aerial data) OR (airborne image) OR (airborne
data) OR (radar image) OR (radar data) OR (
spaceborne image) OR (spaceborne data) OR (LiDAR
) OR (SAR) OR (UAV) OR (Sentinel) OR (Landsat)
OR (MODIS) OR (gaofen) OR (ceres)) AND TITLE-ABS
-KEY("xai" OR "interpret* model" OR "interpret*
deep learning" OR "interpret* machine learning"
OR "interpret* artificial intelligence" OR "
interpret* dl" OR "interpret* ml" OR "interpret*
AI" OR "explain* model" OR "explain* deep
learning" OR "explain* machine learning" OR "
explain* artificial intelligence" OR "explain*
dl" OR "explain* ml" OR "explain* AI") AND (
PUBYEAR > 2016 ) AND ( LIMIT-TO ( LANGUAGE,"
English" ) )
3) Springer: The Springer link search could not be done
only on abstract, title, and keywords. Hence, a full-text search
was done. The search query cannot include the year or the
language, so the results were filtered in the masks of their
website.
(earth observation OR remote sensing OR earth
science OR satellite data OR satellite image OR
aerial image OR aerial data OR airborne image OR
airborne data OR radar image OR radar data OR
spaceborne image OR spaceborne data OR LiDAR OR
SAR OR UAV OR Sentinel OR Landsat OR MODIS OR
gaofen OR ceres) AND ("xai" OR "interpret* model
" OR "interpret* deep learning" OR "interpret*
machine learning" OR "interpret* artificial
intelligence" OR "interpret* dl" OR "interpret*
ml" OR "interpret* AI" OR "explain* model" OR "
explain* deep learning" OR "explain* machine
learning" OR "explain* artificial intelligence"
OR "explain* dl" OR "explain* ml" OR "explain*
AI")
4) Search query, ML in EO: This search query was exe-
cuted in all databases for comparison purposes, its results are
shown in Figure 1.[
earth observation OR remote sensing OR earth
science OR
(
(satellite OR aerial OR airborne OR
spaceborne OR radar) AND (image OR data)
)
OR
LiDAR OR SAR OR UAV OR Sentinel OR Landsat OR
MODIS OR gaofen OR ceres
]
AND[
deep learning OR machine learning OR artificial
intelligence OR dl OR ml OR ai
]
B. Models, xAI categories and methods
Figure 11 shows the number of times xAI categories,
methods, and models are mentioned throughout this review.
C. Glossary
We have categorized related EO tasks into groups to provide
a better overview. Here, we provide a glossary of these groups
and the tasks they include.
Agricultural Monitoring: All crop-related tasks, like crop
yield prediction, crop type classification, irrigation
scheme classification, and crop lodging detection.
Atmosphere Monitoring: The prediction of atmospheric
phenomena, like air quality, aerosol optical depth, and
dust storm indices.
GB (44)
GAM (2)
GAN (2)
FNN (23)
Ensemble (32)
Rules (2)
Cubist (1)
LDA (4)
DT (3)
RF (69)
Transformer (12)
GLM (11)
CapsuleNet (1)
VAE (1)
NB (1)
CNN (137)
kNN (2)
SVM (8)
GP (5)
LSTM (25)
EBM (2)
GAN (1)
Gradient-variant (25)
LIME (16)
LRP (6)
Other (13)
Model Translation (3)
Backpropagation (104)
ALE (5)
Deconvolution (2)
Model Association (1)
Occlusion (15)
Rules (3)
LDA (3)
GP (1)
Attention (18)
Interpretable by Design (17)
Counterfactuals (1)
Example-based (1)
Rule Extraction (3)
Explanation Association (5)
IG (12)
CAM-variant (52)
GLM (5)
Joint Training (10)
Feature Selection (20)
MDI (18)
PDP (15)
Prototype (4)
PFI (30)
WIK (1)
EBM (2)
Perturbation (71)
SHAP (101)
Activation Assessment (25)
Local Approximation (117)
DeepLift (4)
Embedding Space (43)
GAM (1)
Model xAI Category xAI Method
Fig. 11. Number of times xAI categories, methods, and models are mentioned
in the identified literature.
Building Mapping: All tasks related to buildings and urban
structures, like building footprint classification and build-
ing damage mapping.
Ecosystem Interactions: All interactions of the ecosystem
with other systems, e.g., the atmosphere and the hydro-
sphere. This includes the ecosystem CO2 exchange or the
sun-induced fluorescence prediction.
Human Environment Interaction: The monitoring of hu-
man structures and the environment, like human footprint
estimation, socioeconomic status estimation, or well-
being prediction.
Hydrology Monitoring: Tasks related to hydrology, like
runoff forecasting, water quality, streamflow prediction,
and water segmentation, but excluding floods.
Landcover Mapping: The most common EO task includes
mainly landcover classification but also related tasks like
slum mapping.
Natural Hazard Monitoring: Monitoring of natural hazards,
like landslides, wildfires, floods, earthquakes, and vol-
canos.
Soil Monitoring: Monitoring soil properties, like soil texture,
respiration, moisture, and salinity.
Surface Temperature Prediction: The prediction of the
Earth’s surface temperature.Target Mapping: Tasks related to the mapping of specific
targets, like vehicles and objects.
Vegetation Monitoring: Monitoring of vegetation, excluding
crops, like vegetation regeneration, tree monitoring, tree
classification, and tree mapping.
Weather Climate Prediction: Forecasting of weather and
climate variables, like precipitation, temperature, and
drought.
Other: All the tasks which did not fit into the other groups.
This includes change detection, urban mobility, sea ice
classification, mosquito modeling, and satellite product
quality tasks.
LIST OF ACRONYMS
aCNN CNN with attention.
AI Artificial Intelligence.
ALE Accumulated Local Effects.
aLSTM LSTM with attention.
ASC attribute scattering center.
BP Backpropagation.
CAM Class Activation Mapping.
CAP Common Agricultural Policy.
CNN Convolutional Neural Network.
CV Computer Vision.
DeepLIFT Deep Learning Important FeaTures.
DEM Digital Elevation Model.
DL Deep Learning.
DNN Deep Neural Network.
DSM Digital Surface Model.
DTD Deep Taylor Decomposition.
EBM Explainable Boosting Machines.
EG Expected Gradients.
EO Earth Observation.
FLS Fuzzy Logic System.
FNN Feed Forward Neural Network.
GAM Generalized Additive Model.
GAN Generative Adversarial Network.
GB Gradient Boosting.
GCN Graph Convolutional Networks.
GFFS Grouped Forward Feature Selection.
GLM Generalized Linear Model.
GNN Graph Neural Networks.
GP Gaussian Process.
Grad-CAM Gradient-weighted Class Activation Mapping.
GRU Gated Recurrent Unit.
I*G Input*Gradient.
IG Integrated Gradients.
IML interpretable Machine Learning.
KL Kullback-Leiber.
kNN k-Nearest-Neighbor.
LDA Latent Dirichlet Allocation.
LiDAR Light Detection and Ranging.
LIME Local Interpretable Model-agnostic Explanation.
LR Linear Regression.
LRP Layer-wise Relevance Propagation.
LSTM Long Short-Term Memory neural network.
MDI Mean Decrease in Impurity.
ML Machine Learning.
MLP Multilayer Perceptron.
MoRef Most Relevant First.
NDVI Normalized Difference Vegetation Index.
NN Neural Network.
OOD Out-Of-Distribution.
OWA Ordered Weighted Averaging.
PCA Principal Component Analysis.
PDP Partial Dependence Plot.
PFI Permutation Feature Importance.
PolSAR Polarimetric SAR.
RAM Regression Activation Mapping.
RF Random Forest.
RGB Red-Green-Blue.
RNN Recurrent Neural Network.
RS Remote Sensing.
SAR Synthetic Aperture Radar.
SHAP SHapley Additive exPlanations.
SMLP MLP with a presingle-connection layer.
SmoothGrad Smooth Gradient.
SOTA state-of-the-art.
SSL Self-Supervised Learning.
SST Sea Surface Temperature.
SVM Support Vector Machine.
t-SNE t-distributed Stochastic Neighbor Embedding.
TCAV Testing with Concept Activation Vectors.
TDA topological data analysis.
UAV Unmanned Aerial Vehicles.
UMAP Uniform Manifold Approximation and Projection.
VAE Variational Autoencoder.
ViT Vision Transformer.
xAI explainable AI.
XGBoost eXtreme Gradient Boosting.
REFERENCES
[1] G. Camps-Valls, D. Tuia, X. X. Zhu, and M. Re-
ichstein, Deep Learning for the Earth Sciences: A
Comprehensive Approach to Remote Sensing, Climate
Science, and Geosciences. 2021, vol. 1, ISBN: 1-119-
64614-6.
[2] X. X. Zhu et al., “Deep Learning in Remote Sensing:
A Comprehensive Review and List of Resources,”
IEEE Geoscience and Remote Sensing Magazine,
vol. 5, no. 4, pp. 8–36, Dec. 2017, ISSN: 21686831.
DOI: 10.1109/MGRS.2017.2762307.
[3] M. Reichstein et al., “Deep learning and process
understanding for data-driven Earth system science,”
Nature, vol. 566, no. 7743, pp. 195–204, Feb. 2019,
ISSN: 1476-4687. DOI: 10.1038/s41586-019-0912-1.
(visited on 03/14/2023).
[4] H. Ruschemeier, “Ai as a challenge for legal
regulation–the scope of application of the artificial
intelligence act proposal,” in ERA Forum, Springer,
2023, pp. 1–16.
[5] S. P. TerKonda and E. M. Fish, “Artificial intelli-
gence viewed through the lens of state regulation,”
Intelligence-Based Medicine, p. 100 088, 2023.
[6] D. Tuia, R. Roscher, J. D. Wegner, N. Jacobs, X. Zhu,
and G. Camps-Valls, “Toward a Collective Agenda
on AI for Earth Science Data Analysis,” IEEE Geo-
science and Remote Sensing Magazine, vol. 9, no. 2,
pp. 88–104, Jun. 2021, ISSN: 2168-6831. DOI: 10 .
1109/MGRS.2020.3043504.
[7] E. Rolf, K. Klemmer, C. Robinson, and H. Kerner,
Mission Critical – Satellite Data is a Distinct Modality
in Machine Learning, Feb. 2024. arXiv: 2402.01444
[cs]. (visited on 02/07/2024).
[8] M. F. Goodchild, “Scale in GIS: An overview,” Geo-
morphology, Scale Issues in Geomorphology, vol. 130,
no. 1, pp. 5–9, Jul. 2011, ISSN: 0169-555X. DOI:
10 . 1016 / j . geomorph . 2010 . 10 . 004. (visited on
05/31/2023).
[9] T. Lillesand, R. W. Kiefer, and J. Chipman, Remote
sensing and image interpretation. John Wiley & Sons,
2015.
[10] G. Ras, N. Xie, M. van Gerven, and D. Doran, “Ex-
plainable Deep Learning: A Field Guide for the Unini-
tiated,” Journal of Artificial Intelligence Research,
vol. 73, pp. 329–396, Jan. 2022, ISSN: 1076-9757.
DOI: 10.1613/jair.1.13200. (visited on 02/14/2022).
[11] D. Minh, H. X. Wang, Y. F. Li, and T. N. Nguyen,
“Explainable artificial intelligence: A comprehensive
review,” Artificial Intelligence Review, pp. 1–66, 2022.
[12] T. Speith, “A review of taxonomies of explainable
artificial intelligence (xai) methods,” in 2022 ACM
Conference on Fairness, Accountability, and Trans-
parency, 2022, pp. 2239–2250.
[13] C. M. Gevaert, “Explainable AI for earth observation:
A review including societal and regulatory perspec-
tives,” International Journal of Applied Earth Obser-
vation and Geoinformation, vol. 112, p. 102 869, Aug.
2022, ISSN: 1569-8432. DOI: 10 . 1016 / j . jag . 2022 .
102869. (visited on 07/01/2022).
[14] C. Leluschko and C. Tholen, “Goals and Stake-
holder Involvement in XAI for Remote Sensing: A
Structured Literature Review,” in Artificial Intelli-
gence XL, ser. Lecture Notes in Computer Science,
Cham: Springer Nature Switzerland, 2023, pp. 519–
525, ISBN: 978-3-031-47994-6. DOI: 10.1007/978-3-
031-47994-6 47.
[15] O. Hall, M. Ohlsson, and T. Rögnvaldsson, “A review
of explainable AI in the satellite data, deep machine
learning, and human poverty domain,” Patterns, vol. 3,
no. 10, p. 100 600, Oct. 2022, ISSN: 2666-3899. DOI:
10.1016/j.patter.2022.100600.
[16] J. Xing and R. Sieber, “The challenges of integrating
explainable artificial intelligence into GeoAI,” Trans-
actions in GIS, vol. 27, no. 3, pp. 626–645, 2023, ISSN:
1467-9671. DOI: 10.1111/tgis.13045.
[17] R. Roscher, B. Bohn, M. Duarte, and J. Garcke,
“Explain it to me-facing remote sensing challenges
in the bio-and geosciences with explainable machine
learning,” ser. ISPRS Annals of the Photogrammetry,
Remote Sensing and Spatial Information Sciences,
vol. 5, 2020, pp. 817–824. DOI: 10.5194/isprs-Annals-
V-3-2020-817-2020.
[18] R. Roscher, B. Bohn, M. F. Duarte, and J. Garcke, “Ex-
plainable Machine Learning for Scientific Insights and
Discoveries,” IEEE Access, vol. 8, pp. 42 200–42 216,
2020, ISSN: 21693536. DOI: 10.1109/ACCESS.2020.
2976199.
[19] C. Molnar, Interpretable Machine Learning. A Guide
for Making Black Box Models Explainable. 2019,
247 pp. [Online]. Available: https://christophm.github.
io/interpretable-ml-book.
[20] A. Karpatne, I. Ebert-Uphoff, S. Ravela, H. A. Babaie,
and V. Kumar, “Machine Learning for the Geo-
sciences: Challenges and Opportunities,” IEEE Trans-
actions on Knowledge and Data Engineering, vol. 31,
no. 8, pp. 1544–1554, Aug. 1, 2019, ISSN: 15582191.
DOI: 10 . 1109 / TKDE . 2018 . 2861006. arXiv: 1711 .
04708.
[21] A. Mamalakis, I. Ebert-Uphoff, and E. Barnes, “Ex-
plainable artificial intelligence in meteorology and
climate science: Model fine-tuning, calibrating trust
and learning new science,” Lecture Notes in Computer
Science (including subseries Lecture Notes in Artifi-
cial Intelligence and Lecture Notes in Bioinformatics),
vol. 13200 LNAI, pp. 315–339, 2022. DOI: 10.1007/
978-3-031-04083-2 16.
[22] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P.
Perdikaris, S. Wang, and L. Yang, “Physics-informed
machine learning,” NatureReviews Physics, vol. 3,
no. 6, pp. 422–440, May 24, 2021, ISSN: 2522-5820.
DOI: 10.1038/s42254-021-00314-5.
https://doi.org/10.1109/MGRS.2017.2762307
https://doi.org/10.1038/s41586-019-0912-1
https://doi.org/10.1109/MGRS.2020.3043504
https://doi.org/10.1109/MGRS.2020.3043504
https://arxiv.org/abs/2402.01444
https://arxiv.org/abs/2402.01444
https://doi.org/10.1016/j.geomorph.2010.10.004
https://doi.org/10.1613/jair.1.13200
https://doi.org/10.1016/j.jag.2022.102869
https://doi.org/10.1016/j.jag.2022.102869
https://doi.org/10.1007/978-3-031-47994-6_47
https://doi.org/10.1007/978-3-031-47994-6_47
https://doi.org/10.1016/j.patter.2022.100600
https://doi.org/10.1111/tgis.13045
https://doi.org/10.5194/isprs-Annals-V-3-2020-817-2020
https://doi.org/10.5194/isprs-Annals-V-3-2020-817-2020
https://doi.org/10.1109/ACCESS.2020.2976199
https://doi.org/10.1109/ACCESS.2020.2976199
https://christophm.github.io/interpretable-ml-book
https://christophm.github.io/interpretable-ml-book
https://doi.org/10.1109/TKDE.2018.2861006
https://arxiv.org/abs/1711.04708
https://arxiv.org/abs/1711.04708
https://doi.org/10.1007/978-3-031-04083-2_16
https://doi.org/10.1007/978-3-031-04083-2_16
https://doi.org/10.1038/s42254-021-00314-5
[23] J. Gawlikowski et al., A Survey of Uncertainty in
Deep Neural Networks, Jan. 18, 2022. DOI: 10.48550/
arXiv.2107.03342. arXiv: 2107.03342 [cs, stat].
[Online]. Available: http://arxiv.org/abs/2107.03342
(visited on 02/01/2023).
[24] A. Perez-Suay and G. Camps-Valls, “Causal Inference
in Geoscience and Remote Sensing from Observational
Data,” IEEE Transactions on Geoscience and Remote
Sensing, vol. 57, no. 3, pp. 1502–1513, Mar. 2019,
ISSN: 01962892. DOI: 10.1109/TGRS.2018.2867002.
arXiv: 2012.05150. (visited on 03/16/2022).
[25] J. Runge et al., “Inferring causation from time series
in Earth system sciences,” Nature Communications,
vol. 10, no. 1, p. 2553, Jun. 2019, ISSN: 2041-1723.
DOI: 10 . 1038 / s41467 - 019 - 10105 - 3. (visited on
02/23/2023).
[26] M. J. Page et al., “PRISMA 2020 explanation and
elaboration: Updated guidance and exemplars for re-
porting systematic reviews,” BMJ, vol. 372, n160,
Mar. 29, 2021, ISSN: 1756-1833. DOI: 10.1136/bmj.
n160. pmid: 33781993.
[27] M. Gusenbauer and N. Haddaway, “Which Aca-
demic Search Systems are Suitable for Systematic Re-
views or Meta-Analyses? Evaluating Retrieval Qual-
ities of Google Scholar, PubMed and 26 other Re-
sources [OPEN ACCESS],” Research Synthesis Meth-
ods, vol. 11, pp. 181–217, Mar. 2020. DOI: 10.1002/
jrsm.1378.
[28] G. Vilone and L. Longo, Explainable Artificial Intelli-
gence: A Systematic Review, Oct. 2020. arXiv: 2006.
00093 [cs]. (visited on 05/04/2023).
[29] H. Hersbach et al., “The ERA5 global reanalysis,”
Quarterly Journal of the Royal Meteorological Society,
vol. 146, no. 730, pp. 1999–2049, 2020, ISSN: 1477-
870X. DOI: 10.1002/qj.3803.
[30] A. Adadi and M. Berrada, “Peeking Inside the Black-
Box: A Survey on Explainable Artificial Intelligence
(XAI),” IEEE Access, vol. 6, pp. 52 138–52 160, 2018,
ISSN: 2169-3536. DOI: 10 . 1109 / ACCESS . 2018 .
2870052.
[31] A. Holzinger, A. Saranti, C. Molnar, P. Biecek, and
W. Samek, “Explainable ai methods-a brief overview,”
in xxAI-Beyond Explainable AI: International Work-
shop, Held in Conjunction with ICML 2020, July 18,
2020, Vienna, Austria, Revised and Extended Papers,
Springer, 2022, pp. 13–38.
[32] W. Samek, G. Montavon, S. Lapuschkin, C. J. Anders,
and K.-R. Müller, “Explaining deep neural networks
and beyond: A review of methods and applications,”
Proceedings of the IEEE, vol. 109, no. 3, pp. 247–278,
2021.
[33] S. M. Lundberg et al., “From local explanations to
global understanding with explainable AI for trees,”
Nature Machine Intelligence, vol. 2, no. 1, pp. 56–67,
Jan. 2020, ISSN: 2522-5839. DOI: 10 .1038/s42256-
019-0138-9.
[34] M. Setzu, R. Guidotti, A. Monreale, F. Turini, D.
Pedreschi, and F. Giannotti, “GLocalX - From Local
to Global Explanations of Black Box AI Models,”
Artificial Intelligence, vol. 294, p. 103 457, May 2021,
ISSN: 0004-3702. DOI: 10.1016/j.artint.2021.103457.
[35] H. Liu, “Feature Selection,” in Encyclopedia of Ma-
chine Learning, Boston, MA: Springer US, 2010,
pp. 402–406, ISBN: 978-0-387-30164-8. DOI: 10.1007/
978-0-387-30164-8\ 306. (visited on 08/09/2023).
[36] D. Erhan, A. Courville, Y. Bengio, and P. O. Box,
“Understanding Representations Learned in Deep Ar-
chitectures,” 2010.
[37] K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep
inside convolutional networks: Visualising image clas-
sification models and saliency maps,” in Proceedings
of the International Conference on Learning Repre-
sentations (ICLR), ICLR, 2014.
[38] M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic
Attribution for Deep Networks,” in Proceedings of the
34th International Conference on Machine Learning,
PMLR, Jul. 2017, pp. 3319–3328.
[39] M. D. Zeiler, G. W. Taylor, and R. Fergus, “Adap-
tive deconvolutional networks for mid and high level
feature learning,” in 2011 International Conference on
Computer Vision, Nov. 2011, pp. 2018–2025. DOI: 10.
1109/ICCV.2011.6126474.
[40] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A.
Torralba, “Learning deep features for discriminative
localization,” in Proceedings of the IEEE conference
on computer vision and pattern recognition, 2016,
pp. 2921–2929.
[41] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D.
Parikh, and D. Batra, “Grad-CAM: Visual explanations
from deep networks via gradient-based localization,”
in Proceedings of the IEEE International Conference
on Computer Vision (ICCV), Oct. 2017.
[42] S. Bach, A. Binder, G. Montavon, F. Klauschen,
K.-R. Müller, and W. Samek, “On pixel-wise expla-
nations for non-linear classifier decisions by layer-
wise relevance propagation,” PloS one, vol. 10, no. 7,
e0130140, 2015.
[43] M. D. Zeiler and R. Fergus, “Visualizing and under-
standing convolutional networks,” in Computer Vision–
ECCV 2014, Springer, 2014, pp. 818–833.
[44] J. H. Friedman, “Greedy function approximation:
A gradient boosting machine,” Annals of statistics,
pp. 1189–1232, 2001.
[45] D. W. Apley and J. Zhu, “Visualizing the effects of
predictor variables in black box supervised learning
models,” Journal of the Royal Statistical Society Series
B: Statistical Methodology, vol. 82, no. 4, pp. 1059–
1086, 2020.
[46] M. T. Ribeiro, S. Singh, and C. Guestrin, “”Why
Should I Trust You?”: Explaining the Predictions of
Any Classifier,” in Proceedings of the 22nd ACM
SIGKDD International Conference on Knowledge Dis-
https://doi.org/10.48550/arXiv.2107.03342
https://doi.org/10.48550/arXiv.2107.03342
https://arxiv.org/abs/2107.03342
http://arxiv.org/abs/2107.03342
https://doi.org/10.1109/TGRS.2018.2867002
https://arxiv.org/abs/2012.05150
https://doi.org/10.1038/s41467-019-10105-3
https://doi.org/10.1136/bmj.n160
https://doi.org/10.1136/bmj.n160
33781993
https://doi.org/10.1002/jrsm.1378
https://doi.org/10.1002/jrsm.1378
https://arxiv.org/abs/2006.00093
https://arxiv.org/abs/2006.00093
https://doi.org/10.1002/qj.3803
https://doi.org/10.1109/ACCESS.2018.2870052
https://doi.org/10.1109/ACCESS.2018.2870052
https://doi.org/10.1038/s42256-019-0138-9
https://doi.org/10.1038/s42256-019-0138-9
https://doi.org/10.1016/j.artint.2021.103457
https://doi.org/10.1007/978-0-387-30164-8\_306
https://doi.org/10.1007/978-0-387-30164-8\_306
https://doi.org/10.1109/ICCV.2011.6126474
https://doi.org/10.1109/ICCV.2011.6126474
covery and Data Mining, San Francisco California
USA: ACM, Aug. 2016, pp. 1135–1144, ISBN: 978-1-
4503-4232-2. DOI: 10.1145/2939672.2939778.
[47] S. M. Lundberg and S.-I. Lee, “A Unified Approach
to Interpreting Model Predictions,” vol. 30, 2017.
[48] W. J. Murdoch and A. Szlam, “Automatic rule extrac-
tion from long short term memory networks,” in In-
ternational Conference on Learning Representations,
2017. [Online]. Available: https : / / openreview . net /
forum?id=SJvYgH9xe.
[49] M. Harradon, J. Druce, and B. Ruttenberg, “Causal
learning and explanation of deep neural net-
works viaautoencoded activations,” arXiv preprint
arXiv:1802.00541, 2018.
[50] N. Frosst and G. Hinton, “Distilling a neural net-
work into a soft decision tree,” arXiv preprint
arXiv:1711.09784, 2017.
[51] S. Tan, G. Hooker, P. Koch, A. Gordo, and R.
Caruana, “Considerations when learning additive ex-
planations for black-box models,” arXiv preprint
arXiv:1801.08640, 2018.
[52] Q. Zhang, Y. Yang, H. Ma, and Y. N. Wu, “Inter-
preting cnns via decision trees,” in Proceedings of the
IEEE/CVF conference on computer vision and pattern
recognition, 2019, pp. 6261–6270.
[53] Q. Zhang, R. Cao, F. Shi, Y. N. Wu, and S.-C.
Zhu, “Interpreting cnn knowledge via an explanatory
graph,” in Proceedings of the AAAI conference on
artificial intelligence, vol. 32, 2018.
[54] Q. Zhang, R. Cao, Y. N. Wu, and S.-C. Zhu, “Growing
interpretable part graphs on convnets via multi-shot
learning,” in Proceedings of the AAAI Conference on
Artificial Intelligence, vol. 31, 2017.
[55] T. Hastie, R. Tibshirani, J. H. Friedman, and J. H.
Friedman, The elements of statistical learning: data
mining, inference, and prediction. Springer, 2009,
vol. 2.
[56] J. D. Annette, Introduction to generalized linear mod-
els. Chapman & Hall CRC, 2018.
[57] T. J. Hastie, “Generalized additive models,” in Statis-
tical models in S, Routledge, 1992, pp. 249–307.
[58] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirich-
let allocation,” Journal of machine Learning research,
vol. 3, no. Jan, pp. 993–1022, 2003.
[59] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine
translation by jointly learning to align and translate,”
arXiv preprint arXiv:1409.0473, 2014.
[60] A. Vaswani et al., “Attention is all you need,”
Advances in neural information processing systems,
vol. 30, 2017.
[61] B. Kim, M. Wattenberg, J. Gilmer, C. Cai, J. Wexler,
F. Viegas, et al., “Interpretability beyond feature at-
tribution: Quantitative testing with concept activation
vectors (tcav),” in International conference on machine
learning, PMLR, 2018, pp. 2668–2677.
[62] A. Ghorbani, J. Wexler, J. Y. Zou, and B. Kim,
“Towards automatic concept-based explanations,” Ad-
vances in Neural Information Processing Systems,
vol. 32, 2019.
[63] P. W. Koh et al., “Concept bottleneck models,” in
International conference on machine learning, PMLR,
2020, pp. 5338–5348.
[64] D. Marcos, S. Lobry, and D. Tuia, “Semantically
interpretable activation maps: What-where-how expla-
nations within cnns,” in 2019 IEEE/CVF International
Conference on Computer Vision Workshop (ICCVW),
IEEE, 2019, pp. 4207–4215.
[65] J. Bien and R. Tibshirani, “Prototype selection for
interpretable classification,” 2011.
[66] C. Chen, O. Li, D. Tao, A. Barnett, C. Rudin, and
J. K. Su, “This looks like that: Deep learning for
interpretable image recognition,” Advances in neural
information processing systems, vol. 32, 2019.
[67] M. Nauta, R. Van Bree, and C. Seifert, “Neural proto-
type trees for interpretable fine-grained image recog-
nition,” in Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, 2021,
pp. 14 933–14 943.
[68] J. Kim, A. Rohrbach, T. Darrell, J. Canny, and Z.
Akata, “Textual explanations for self-driving vehicles,”
in Proceedings of the European conference on com-
puter vision (ECCV), 2018, pp. 563–578.
[69] S. Wachter, B. Mittelstadt, and C. Russell, “Counter-
factual Explanations without Opening the Black Box:
Automated Decisions and the GDPR,” Harvard Jour-
nal of Law & Technology (Harvard JOLT), vol. 31,
p. 841, 2017.
[70] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado,
and J. Dean, “Distributed Representations of Words
and Phrases and their Compositionality,” in Advances
in Neural Information Processing Systems, vol. 26,
Curran Associates, Inc., 2013. (visited on 11/30/2023).
[71] M. Kohlbrenner, A. Bauer, S. Nakajima, A. Binder, W.
Samek, and S. Lapuschkin, “Towards best practice in
explaining neural network decisions with lrp,” in 2020
International Joint Conference on Neural Networks
(IJCNN), IEEE, 2020, pp. 1–7.
[72] G. Montavon, A. Binder, S. Lapuschkin, W. Samek,
and K.-R. Müller, “Layer-wise relevance propagation:
An overview,” Explainable AI: interpreting, explaining
and visualizing deep learning, pp. 193–209, 2019.
[73] L. Arras et al., “Explaining and Interpreting LSTMs,”
Lecture Notes in Computer Science (including sub-
series Lecture Notes in Artificial Intelligence and
Lecture Notes in Bioinformatics), vol. 11700 LNCS,
pp. 211–238, 2019, ISSN: 16113349. DOI: 10.1007/
978-3-030-28954-6\ 11/FIGURES/7. arXiv: 1909.
12114. (visited on 06/27/2022).
[74] A. Fisher, C. Rudin, and F. Dominici, “All models
are wrong, but many are useful: Learning a variable’s
importance by studying an entire class of prediction
https://doi.org/10.1145/2939672.2939778
https://openreview.net/forum?id=SJvYgH9xe
https://openreview.net/forum?id=SJvYgH9xe
https://doi.org/10.1007/978-3-030-28954-6\_11/FIGURES/7
https://doi.org/10.1007/978-3-030-28954-6\_11/FIGURES/7
https://arxiv.org/abs/1909.12114
https://arxiv.org/abs/1909.12114
models simultaneously,” Journal of Machine Learning
Research, vol. 20, no. 177, pp. 1–81, 2019.
[75] L. S. Shapley, “A value for n-person games,” in Con-
tributions to the Theory of Games II, H. W. Kuhn and
A. W. Tucker, Eds., Princeton: Princeton University
Press, 1953, pp. 307–317.
[76] S. M. Lundberg, G. G. Erion, and S.-I. Lee, “Consis-
tent individualized feature attribution for tree ensem-
bles,” arXiv preprint arXiv:1802.03888, 2018.
[77] M. G. Augasta and T. Kathirvalavakumar, “Reverse
Engineering the Neural Networks for Rule Extraction
in Classification Problems,” Neural Processing Letters,
vol. 35, no. 2, pp. 131–150, Apr. 2012, ISSN: 1573-
773X. DOI: 10.1007/s11063-011-9207-8. (visited on
08/16/2023).
[78] Z.-H. Zhou and Y. Jiang, “Medical diagnosis with
C4.5 rule preceded by artificial neural network ensem-
ble,” IEEE Transactions on Information Technology in
Biomedicine, vol. 7, no. 1, pp. 37–42, Mar. 2003, ISSN:
1558-0032. DOI: 10.1109/TITB.2003.808498.
[79] X. Liu, X. Wang, and S. Matwin, “Improving the Inter-
pretability of Deep Neural Networks with Knowledge
Distillation,” in 2018 IEEE International Conference
on Data Mining Workshops (ICDMW), Nov. 2018,
pp. 905–912. DOI: 10.1109/ICDMW.2018.00132.
[80] D. Alvarez-Melis and T. S. Jaakkola, A causal frame-
work for explaining the predictions of black-box
sequence-to-sequence models, Nov. 2017. arXiv: 1707.
01943 [cs]. (visited on 08/16/2023).
[81] B. Bouchon-Meunier and C. Marsala, “Learning Fuzzy
Decision Rules,” in Fuzzy Sets in Approximate Rea-
soning and Information Systems, ser. The Handbooks
of Fuzzy Sets Series, Boston, MA: Springer US, 1999,
pp. 279–304, ISBN: 978-1-4615-5243-7. DOI: 10.1007/
978-1-4615-5243-7\ 5. (visited on 08/03/2023).
[82] L. A. Zadeh, “Fuzzy sets,” Information and control,
vol. 8, no. 3, pp. 338–353, 1965.
[83] L. Breiman, J. H. Friedman, R. A. Olshen, and
C. J. Stone, “Classification and regression trees,”
Wadsworth, Belmont, CA, 1984.
[84] Z. C. Lipton, “The mythos of model interpretability:
In machine learning, the concept of interpretability is
both important and slippery.,” Queue, vol. 16, no. 3,
pp. 31–57, 2018.
[85] L. Van der Maaten and G. Hinton, “Visualizing data
using t-sne.,” Journal of machine learning research,
vol. 9, no. 11, 2008.
[86] L. McInnes, J. Healy, N. Saul, and L. Großberger,
“Umap: Uniform manifold approximation and projec-
tion,” Journal of Open Source Software, vol. 3, no. 29,
2018.
[87] W. Brendel and M. Bethge, “Approximating CNNs
with Bag-of-local-Features models works surprisingly
well on ImageNet,” International Conference on
Learning Representations, 2019.
[88] T. Miller, “Contrastive explanation: A structural-
model approach,” The Knowledge Engineering Re-
view, vol. 36, e14, Jan. 2021, ISSN: 0269-8889, 1469-
8005. DOI: 10.1017/S0269888921000102. (visited on
10/17/2023).
[89] P.-T. Jiang, C.-B. Zhang, Q. Hou, M.-M. Cheng, and
Y. Wei, “LayerCAM: Exploring Hierarchical Class
Activation Maps forLocalization,” IEEE Transactions
on Image Processing, vol. 30, pp. 5875–5888, 2021,
ISSN: 1941-0042. DOI: 10.1109/TIP.2021.3089943.
[90] A. Shrikumar, P. Greenside, and A. Kundaje, “Learn-
ing important features through propagating activation
differences,” in International conference on machine
learning, PMLR, 2017, pp. 3145–3153.
[91] G. Brauwers and F. Frasincar, “A general survey on
attention mechanisms in deep learning,” IEEE Trans-
actions on Knowledge and Data Engineering, 2021.
[92] A. Dosovitskiy et al., “An image is worth 16x16
words: Transformers for image recognition at scale,”
in 9th International Conference on Learning Repre-
sentations, ICLR 2021, Virtual Event, Austria, May 3-
7, 2021, OpenReview.net, 2021. [Online]. Available:
https://openreview.net/forum?id=YicbFdNTTy.
[93] P. Veličković, G. Cucurull, A. Casanova, A. Romero,
P. Liò, and Y. Bengio, “Graph attention networks,”
6th International Conference on Learning Represen-
tations, 2017.
[94] K. Clark, U. Khandelwal, O. Levy, and C. D. Manning,
“What does BERT look at? an analysis of BERT’s
attention,” in Proceedings of the 2019 ACL Work-
shop BlackboxNLP: Analyzing and Interpreting Neural
Networks for NLP, Florence, Italy: Association for
Computational Linguistics, Aug. 2019, pp. 276–286.
DOI: 10 . 18653 / v1 / W19 - 4828. [Online]. Available:
https://aclanthology.org/W19-4828.
[95] M. Caron et al., “Emerging properties in self-
supervised vision transformers,” in Proceedings of
the IEEE/CVF international conference on computer
vision, 2021, pp. 9650–9660.
[96] S. Chaudhari, V. Mithal, G. Polatkan, and R. Ra-
manath, “An Attentive Survey of Attention Models,”
ACM Transactions on Intelligent Systems and Technol-
ogy (TIST), vol. 12, no. 5, pp. 1–32, 2021.
[97] F. Doshi-Velez and B. Kim, “Towards a rigorous sci-
ence of interpretable machine learning,” arXiv preprint
arXiv:1702.08608, 2017.
[98] M. Nauta et al., “From anecdotal evidence to quan-
titative evaluation methods: A systematic review on
evaluating explainable ai,” ACM Comput. Surv., Feb.
2023, Just Accepted, ISSN: 0360-0300. DOI: 10.1145/
3583558. [Online]. Available: https://doi.org/10.1145/
3583558.
[99] J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow,
M. Hardt, and B. Kim, “Sanity Checks for Saliency
Maps,” in Advances in Neural Information Processing
https://doi.org/10.1007/s11063-011-9207-8
https://doi.org/10.1109/TITB.2003.808498
https://doi.org/10.1109/ICDMW.2018.00132
https://arxiv.org/abs/1707.01943
https://arxiv.org/abs/1707.01943
https://doi.org/10.1007/978-1-4615-5243-7\_5
https://doi.org/10.1007/978-1-4615-5243-7\_5
https://doi.org/10.1017/S0269888921000102
https://doi.org/10.1109/TIP.2021.3089943
https://openreview.net/forum?id=YicbFdNTTy
https://doi.org/10.18653/v1/W19-4828
https://aclanthology.org/W19-4828
https://doi.org/10.1145/3583558
https://doi.org/10.1145/3583558
https://doi.org/10.1145/3583558
https://doi.org/10.1145/3583558
Systems, vol. 31, Curran Associates, Inc., 2018. (vis-
ited on 03/16/2023).
[100] U. Bhatt, A. Weller, and J. M. F. Moura, “Evaluating
and aggregating feature-based model explanations,” in
Proceedings of the Twenty-Ninth International Joint
Conference on Artificial Intelligence, IJCAI-20, C.
Bessiere, Ed., Main track, International Joint Con-
ferences on Artificial Intelligence Organization, Jul.
2020, pp. 3016–3022. DOI: 10.24963/ijcai.2020/417.
[Online]. Available: https : / /doi .org /10 .24963 / ijcai .
2020/417.
[101] A. Hedström et al., “Quantus: An Explainable AI
Toolkit for Responsible Evaluation of Neural Network
Explanations and Beyond,” Journal of Machine Learn-
ing Research, vol. 24, no. 34, pp. 1–11, 2023, ISSN:
1533-7928.
[102] C.-K. Yeh, C.-Y. Hsieh, A. Suggala, D. I. Inouye, and
P. K. Ravikumar, “On the (in) fidelity and sensitiv-
ity of explanations,” Advances in Neural Information
Processing Systems, vol. 32, 2019.
[103] S. Hooker, D. Erhan, P.-J. Kindermans, and B. Kim,
“A benchmark for interpretability methods in deep
neural networks,” Advances in neural information pro-
cessing systems, vol. 32, 2019.
[104] Y. Rong, T. Leemann, V. Borisov, G. Kasneci,
and E. Kasneci, “A consistent and efficient evalua-
tion strategy for attribution methods,” arXiv preprint
arXiv:2202.00449, 2022.
[105] H. Suresh, K. M. Lewis, J. Guttag, and A. Satya-
narayan, “Intuitively assessing ml model reliability
through example-based explanations and editing model
inputs,” in 27th International Conference on Intelligent
User Interfaces, 2022, pp. 767–781.
[106] A. Alqaraawi, M. Schuessler, P. Weiß, E. Costanza,
and N. Berthouze, “Evaluating saliency map explana-
tions for convolutional neural networks: A user study,”
in Proceedings of the 25th International Conference on
Intelligent User Interfaces, 2020, pp. 275–285.
[107] Y. Rong et al., “Towards Human-Centered Explainable
AI: A Survey of User Studies for Model Explanations,”
IEEE Transactions on Pattern Analysis and Machine
Intelligence, pp. 1–20, 2023, ISSN: 0162-8828, 2160-
9292, 1939-3539. DOI: 10 . 1109 / TPAMI . 2023 .
3331846. (visited on 12/29/2023).
[108] L. Weber, S. Lapuschkin, A. Binder, and W. Samek,
“Beyond explaining: Opportunities and challenges of
xai-based model improvement,” Information Fusion,
vol. 92, pp. 154–176, 2023.
[109] L. N. Sotomayor, M. J. Cracknell, and R. Musk,
“Supervised machine learning for predicting and in-
terpreting dynamic drivers of plantation forest pro-
ductivity in northern Tasmania, Australia,” Computers
and Electronics in Agriculture, vol. 209, 2023, ISSN:
01681699. DOI: 10.1016/j.compag.2023.107804.
[110] H. Chen, L. Yang, and Q. Wu, “Enhancing land
cover mapping and monitoring: An interactive and
explainable machine learning approach using google
earth engine,” Remote Sensing, vol. 15, no. 18, 2023,
ISSN: 20724292. DOI: 10.3390/rs15184585.
[111] T. Fisher et al., “Uncertainty-aware interpretable deep
learning for slum mapping and monitoring,” Remote
Sensing, vol. 14, no. 13, 2022. DOI: 10 . 3390 /
rs14133072.
[112] S. L. Jayasinghe and L. Kumar, “Causes of tea land
dynamics in Sri Lanka between 1995 and 2030,”
Regional Environmental Change, vol. 23, no. 4, p. 127,
Oct. 2023, ISSN: 1436-378X. DOI: 10.1007/s10113-
023-02123-1.
[113] Z. Li, Y. Wen, M. Schreier, A. Behrangi, Y. Hong,
and B. Lambrigtsen, “Advancing satellite precipitation
retrievals with data driven approaches: Is black box
model explainable?” Earth and Space Science, vol. 8,
no. 2, 2021. DOI: 10.1029/2020EA001423.
[114] S. Matin and B. Pradhan, “Earthquake-induced
building-damage mapping using explainable ai (Xai),”
Sensors, vol. 21, no. 13, 2021. DOI: 10 . 3390 /
s21134489.
[115] C. Milà, J. Ballester, X. Basagaña, M. J. Nieuwen-
huijsen, and C. Tonne, “Estimating daily air temper-
ature and pollution in Catalonia: A comprehensive
spatiotemporal modelling of multiple exposures,” En-
vironmental Pollution, vol. 337, 2023, ISSN: 02697491.
DOI: 10.1016/j.envpol.2023.122501.
[116] O. Mudele, F. M. Bayer, L. F. R. Zanandrez, A. E.
Eiras, and P. Gamba, “Modeling the temporal pop-
ulation distribution of Ae. aegypti mosquito using
big earth observation data,” IEEE access : practical
innovations, open solutions, vol. 8, pp. 14 182–14 194,
2020, ISSN: 2169-3536. DOI: 10.1109/ACCESS.2020.
2966080.
[117] S. Newman and R. Furbank, “Explainable machine
learning models of major crop traits from satellite-
monitored continent-wide field trial data,” Nature
Plants, vol. 7, no. 10, pp. 1354–1363, 2021. DOI: 10.
1038/s41477-021-01001-0.
[118] A. Orynbaikyzy, U. Gessner, B. Mack, and C. Conrad,
“Crop type classification using fusion of sentinel-1
and sentinel-2 data: Assessing the impact of feature
selection, optical data availability, and parcel sizes on
the accuracies,” Remote Sensing, vol. 12, no. 17, 2020.
DOI: 10.3390/RS12172779.
[119] J. N. S. Rubı́ and P. R. L. Gondim, “A performance
comparison of machine learning models for wildfire
occurrence risk prediction in the Brazilian Federal
District region,” Environment Systems and Decisions,
Aug.2023, ISSN: 2194-5411. DOI: 10.1007/s10669-
023-09921-2.
[120] Y. Wu, H. Liu, S. Liu, and C. Lou, “Estimate of near-
surface NO2 concentrations in Fenwei Plain, China,
based on TROPOMI data and random forest model,”
Environmental Monitoring and Assessment, vol. 195,
https://doi.org/10.24963/ijcai.2020/417
https://doi.org/10.24963/ijcai.2020/417
https://doi.org/10.24963/ijcai.2020/417
https://doi.org/10.1109/TPAMI.2023.3331846
https://doi.org/10.1109/TPAMI.2023.3331846
https://doi.org/10.1016/j.compag.2023.107804
https://doi.org/10.3390/rs15184585
https://doi.org/10.3390/rs14133072
https://doi.org/10.3390/rs14133072
https://doi.org/10.1007/s10113-023-02123-1
https://doi.org/10.1007/s10113-023-02123-1
https://doi.org/10.1029/2020EA001423
https://doi.org/10.3390/s21134489
https://doi.org/10.3390/s21134489
https://doi.org/10.1016/j.envpol.2023.122501
https://doi.org/10.1109/ACCESS.2020.2966080
https://doi.org/10.1109/ACCESS.2020.2966080
https://doi.org/10.1038/s41477-021-01001-0
https://doi.org/10.1038/s41477-021-01001-0
https://doi.org/10.3390/RS12172779
https://doi.org/10.1007/s10669-023-09921-2
https://doi.org/10.1007/s10669-023-09921-2
no. 11, p. 1379, Oct. 2023, ISSN: 1573-2959. DOI:
10.1007/s10661-023-11993-1.
[121] J. Xu et al., “Towards interpreting multi-temporal deep
learning models in crop mapping,” Remote Sensing of
Environment, vol. 264, 2021. DOI: 10.1016/j.rse.2021.
112599.
[122] X. Yan et al., “A Spatial-Temporal Interpretable Deep
Learning Model for improving interpretability and
predictive accuracy of satellite-based PM2.5,” Envi-
ronmental Pollution, vol. 273, 2021. DOI: 10 . 1016 /
j.envpol.2021.116459.
[123] B. Chen et al., “Estimation of atmospheric PM10 con-
centration in china using an interpretable deep learning
model and top-of-the-atmosphere reflectance data from
china’s new generation geostationary meteorological
satellite, FY-4A,” Journal of Geophysical Research:
Atmospheres, vol. 127, no. 9, 2022. DOI: 10 . 1029 /
2021JD036393.
[124] B. Chen et al., “Estimation of near-surface ozone
concentration and analysis of main weather situa-
tion in China based on machine learning model and
Himawari-8 TOAR data,” Science of the Total Envi-
ronment, vol. 864, 2023, ISSN: 00489697. DOI: 10 .
1016/j.scitotenv.2022.160928.
[125] B. Chen et al., “Exploring high-resolution near-
surface CO concentrations based on Himawari-8 top-
of-atmosphere radiation data: Assessing the distribu-
tion of city-level CO hotspots in China,” Atmospheric
Environment, vol. 312, 2023, ISSN: 13522310. DOI:
10.1016/j.atmosenv.2023.120021.
[126] O. Sen and H. Y. Keles, “On the evaluation of CNN
models in remote-sensing scene classification domain,”
PFG – Journal of Photogrammetry, Remote Sensing
and Geoinformation Science, vol. 88, no. 6, pp. 477–
492, Dec. 2020, ISSN: 2512-2819. DOI: 10 . 1007 /
s41064-020-00129-6.
[127] B. Vasu and A. Savakis, “Resilience and plasticity of
deep network interpretations for aerial imagery,” IEEE
access : practical innovations, open solutions, vol. 8,
pp. 127 491–127 506, 2020, ISSN: 2169-3536. DOI: 10.
1109/ACCESS.2020.3008323.
[128] A. Temenos, N. Temenos, M. Kaselimi, A. Doulamis,
and N. Doulamis, “Interpretable deep learning frame-
work for land use and land cover classification in re-
mote sensing using SHAP,” IEEE Geoscience and Re-
mote Sensing Letters, pp. 1–1, 2023, ISSN: 1545598X.
DOI: 10.1109/LGRS.2023.3251652.
[129] S.-C. Hung, H.-C. Wu, and M.-H. Tseng, “Remote
sensing scene classification and explanation using
RSSCNet and LIME,” Applied Sciences (Switzerland),
vol. 10, no. 18, 2020. DOI: 10.3390/app10186151.
[130] S. Elliott, A. Shields, E. Klaehn, and I. Tien, “Iden-
tifying critical infrastructure in imagery data using
explainable convolutional neural networks,” Remote
Sensing, vol. 14, no. 21, 2022. DOI: 10 . 3390 /
rs14215331.
[131] A. M. Burgueño et al., “Scalable approach for high-
resolution land cover: A case study in the Mediter-
ranean Basin,” Journal of Big Data, vol. 10, no. 1,
2023, ISSN: 21961115. DOI: 10 . 1186 / s40537 - 023 -
00770-z.
[132] A. Brenning, “Interpreting machine-learning models
in transformed feature space with an application
to remote-sensing classification,” Machine Learning,
vol. 112, no. 9, pp. 3455–3471, 2023, ISSN: 08856125.
DOI: 10.1007/s10994-023-06327-8.
[133] C. F. Dantas, T. F. Drumond, D. Marcos, and D.
Ienco, “Counterfactual Explanations for Remote Sens-
ing Time Series Data: An Application to Land Cover
Classification,” in Machine Learning and Knowledge
Discovery in Databases: Applied Data Science and
Demo Track, ser. Lecture Notes in Computer Science,
Cham: Springer Nature Switzerland, 2023, pp. 20–36,
ISBN: 978-3-031-43430-3. DOI: 10.1007/978-3-031-
43430-3 2.
[134] S.-n. Ishikawa et al., “Example-based explainable AI
and its application for remote sensing image classi-
fication,” International Journal of Applied Earth Ob-
servation and Geoinformation, vol. 118, 2023, ISSN:
15698432. DOI: 10.1016/j.jag.2023.103215.
[135] P. V. Arun and A. Karnieli, “Learning of physi-
cally significant features from earth observation data:
An illustration for crop classification and irrigation
scheme detection,” Neural Computing and Applica-
tions, vol. 34, no. 13, pp. 10 929–10 948, Jul. 2022,
ISSN: 1433-3058. DOI: 10.1007/s00521-022-07019-5.
[136] L. Han et al., “An explainable XGBoost model im-
proved by SMOTE-ENN technique for maize lodging
detection based on multi-source unmanned aerial vehi-
cle images,” Computers and Electronics in Agriculture,
vol. 194, 2022. DOI: 10.1016/j.compag.2022.106804.
[137] C. Broms, M. Nilsson, A. Oxenstierna, A. Sopasakis,
and K. Åström, “Combined analysis of satellite and
ground data for winter wheat yield forecasting,” Smart
Agricultural Technology, vol. 3, p. 100 107, Feb. 2023,
ISSN: 2772-3755. DOI: 10.1016/j.atech.2022.100107.
(visited on 08/27/2022).
[138] F. Huber, A. Yushchenko, B. Stratmann, and V. Stein-
hage, “Extreme Gradient Boosting for yield esti-
mation compared with Deep Learning approaches,”
Computers and Electronics in Agriculture, vol. 202,
p. 107 346, Nov. 2022, ISSN: 0168-1699. DOI: 10 .
1016/j.compag.2022.107346. (visited on 11/10/2023).
[139] E. J. Jones, T. F. Bishop, B. P. Malone, P. J. Hulme,
B. M. Whelan, and P. Filippi, “Identifying causes
of crop yield variability with interpretive machine
learning,” Computers and Electronics in Agriculture,
vol. 192, p. 106 632, Jan. 2022, ISSN: 01681699.
DOI: 10 . 1016 / j . compag . 2021 . 106632. (visited on
08/29/2022).
[140] H. Singh, A. Roy, R. Setia, and B. Pateriya, “Sim-
ulation of multispectral data using hyperspectral data
https://doi.org/10.1007/s10661-023-11993-1
https://doi.org/10.1016/j.rse.2021.112599
https://doi.org/10.1016/j.rse.2021.112599
https://doi.org/10.1016/j.envpol.2021.116459
https://doi.org/10.1016/j.envpol.2021.116459
https://doi.org/10.1029/2021JD036393
https://doi.org/10.1029/2021JD036393
https://doi.org/10.1016/j.scitotenv.2022.160928
https://doi.org/10.1016/j.scitotenv.2022.160928
https://doi.org/10.1016/j.atmosenv.2023.120021
https://doi.org/10.1007/s41064-020-00129-6
https://doi.org/10.1007/s41064-020-00129-6
https://doi.org/10.1109/ACCESS.2020.3008323
https://doi.org/10.1109/ACCESS.2020.3008323
https://doi.org/10.1109/LGRS.2023.3251652
https://doi.org/10.3390/app10186151
https://doi.org/10.3390/rs14215331
https://doi.org/10.3390/rs14215331
https://doi.org/10.1186/s40537-023-00770-z
https://doi.org/10.1186/s40537-023-00770-z
https://doi.org/10.1007/s10994-023-06327-8
https://doi.org/10.1007/978-3-031-43430-3_2
https://doi.org/10.1007/978-3-031-43430-3_2
https://doi.org/10.1016/j.jag.2023.103215
https://doi.org/10.1007/s00521-022-07019-5
https://doi.org/10.1016/j.compag.2022.106804
https://doi.org/10.1016/j.atech.2022.100107
https://doi.org/10.1016/j.compag.2022.107346
https://doi.org/10.1016/j.compag.2022.107346
https://doi.org/10.1016/j.compag.2021.106632
for crop stress studies,” in Lecture Notes in Electrical
Engineering, vol. 970, Springer Science and Business
Media DeutschlandGmbH, 2023, pp. 43–52, ISBN:
978-981-19769-7-1. DOI: 10.1007/978-981-19-7698-
8 5.
[141] P. Filippi, B. M. Whelan, R. W. Vervoort, and T. F. A.
Bishop, “Identifying crop yield gaps with site- and
season-specific data-driven models of yield potential,”
Precision Agriculture, vol. 23, no. 2, pp. 578–601, Apr.
2022, ISSN: 1573-1618. DOI: 10.1007/s11119- 021-
09850-7.
[142] H. S. Nayak et al., “Interpretable machine learning
methods to explain on-farm yield variability of high
productivity wheat in Northwest India,” Field Crops
Research, vol. 287, p. 108 640, Oct. 2022, ISSN: 0378-
4290. DOI: 10 .1016 / j . fcr. 2022 .108640. (visited on
08/27/2022).
[143] M. Campos-Taberner et al., “Understanding deep
learning in land use classification based on Sentinel-
2 time series,” Scientific Reports, vol. 10, no. 1,
p. 17 188, Oct. 2020, ISSN: 2045-2322. DOI: 10.1038/
s41598-020-74215-5.
[144] A. Mateo-Sanchis, J. E. Adsuara, M. Piles, J. Muñoz-
Marı́, A. Pérez-Suay, and G. Camps-Valls, “Inter-
pretable Long-Short Term Memory Networks for Crop
Yield Estimation,” IEEE Geoscience and Remote Sens-
ing Letters, pp. 1–1, 2023, ISSN: 1558-0571. DOI: 10.
1109/LGRS.2023.3244064.
[145] D. Paudel, A. de Wit, H. Boogaard, D. Marcos,
S. Osinga, and I. N. Athanasiadis, “Interpretability
of deep learning models for crop yield forecasting,”
Computers and Electronics in Agriculture, vol. 206,
p. 107 663, Mar. 2023, ISSN: 0168-1699. DOI: 10 .
1016/j.compag.2023.107663. (visited on 02/06/2023).
[146] I. Obadic, R. Roscher, D. A. B. Oliveira, and X. X.
Zhu, Exploring Self-Attention for Crop-type Classifica-
tion Explainability, Oct. 2022. DOI: 10.48550/arXiv.
2210 . 13167. arXiv: 2210 . 13167 [cs]. (visited on
12/11/2022).
[147] M. Rußwurm and M. Körner, “Self-attention for
raw optical Satellite Time Series Classification,” IS-
PRS Journal of Photogrammetry and Remote Sensing,
vol. 169, pp. 421–435, Nov. 2020, ISSN: 09242716.
DOI: 10 . 1016 / j . isprsjprs . 2020 . 06 . 006. arXiv:
1910 . 10536 [cs, eess, stat]. (visited on
08/27/2022).
[148] V. Sainte Fare Garnot, L. Landrieu, S. Giordano, and
N. Chehata, “Satellite Image Time Series Classifi-
cation With Pixel-Set Encoders and Temporal Self-
Attention,” in 2020 IEEE/CVF Conference on Com-
puter Vision and Pattern Recognition (CVPR), Jun.
2020, pp. 12 322–12 331. DOI: 10.1109/CVPR42600.
2020.01234.
[149] A. Wolanin et al., “Estimating and understanding crop
yields with explainable deep learning in the Indian
Wheat Belt,” Environmental Research Letters, vol. 15,
no. 2, 2020. DOI: 10.1088/1748-9326/ab68ac.
[150] L. E. C. L. Rosa, D. A. B. Oliveira, and P. Ghamisi,
“Learning crop type mapping from regional label
proportions in large-scale SAR and optical imagery,”
IEEE Transactions on Geoscience and Remote Sens-
ing, pp. 1–1, 2023, ISSN: 1558-0644. DOI: 10.1109/
TGRS.2023.3321156. (visited on 10/11/2023).
[151] A. Mateo-Sanchis et al., “Learning main drivers of
crop progress and failure in Europe with interpretable
machine learning,” International Journal of Applied
Earth Observation and Geoinformation, vol. 104,
2021. DOI: 10.1016/j.jag.2021.102574.
[152] L. Martı́nez-Ferrer, M. Piles, and G. Camps-Valls,
“Crop Yield Estimation and Interpretability With
Gaussian Processes,” IEEE Geoscience and Remote
Sensing Letters, vol. 18, no. 12, pp. 2043–2047, Dec.
2021, ISSN: 1558-0571. DOI: 10 .1109/LGRS.2020.
3016140.
[153] M. F. Celik, M. S. Isik, G. Taskin, E. Erten, and
G. Camps-Valls, “Explainable artificial intelligence
for cotton yield prediction with multisource data,”
IEEE Geoscience and Remote Sensing Letters, vol. 20,
pp. 1–5, 2023, ISSN: 1558-0571. DOI: 10.1109/LGRS.
2023.3303643.
[154] J. Vega, F. H. Sepúlveda-Murillo, and M. Parra, “Land-
slide Modeling in a Tropical Mountain Basin Using
Machine Learning Algorithms and Shapley Additive
Explanations,” Air, Soil and Water Research, vol. 16,
p. 11 786 221 231 195 824, Jan. 2023, ISSN: 1178-
6221, 1178-6221. DOI: 10.1177/11786221231195824.
(visited on 09/04/2023).
[155] H. Al-Najjar, B. Pradhan, G. Beydoun, R. Sarkar,
H.-J. Park, and A. Alamri, “A novel method using
explainable artificial intelligence (XAI)-based Shapley
Additive Explanations for spatial landslide prediction
using Time-Series SAR dataset,” Gondwana Research,
2022. DOI: 10.1016/j.gr.2022.08.004.
[156] S. Alqadhi, J. Mallick, and M. Alkahtani, “Integrated
deep learning with explainable artificial intelligence
for enhanced landslide management,” Natural Haz-
ards, 2023, ISSN: 0921030X. DOI: 10.1007/s11069-
023-06260-y.
[157] A. Dahal and L. Lombardo, Explainable artificial
intelligence in geoscience: A glimpse into the future
of landslide susceptibility modeling, Aug. 2022. DOI:
10.1002/essoar.10512130.1. (visited on 10/04/2022).
[158] M. S. K. Inan and I. Rahman, “Explainable AI in-
tegrated feature selection for landslide susceptibility
mapping using TreeSHAP,” SN Computer Science,
vol. 4, no. 5, p. 482, Jun. 2023, ISSN: 2661-8907. DOI:
10.1007/s42979-023-01960-5.
[159] J. Zhang et al., “Insights into geospatial heterogene-
ity of landslide susceptibility based on the SHAP-
XGBoost model,” Journal of environmental manage-
https://doi.org/10.1007/978-981-19-7698-8_5
https://doi.org/10.1007/978-981-19-7698-8_5
https://doi.org/10.1007/s11119-021-09850-7
https://doi.org/10.1007/s11119-021-09850-7
https://doi.org/10.1016/j.fcr.2022.108640
https://doi.org/10.1038/s41598-020-74215-5
https://doi.org/10.1038/s41598-020-74215-5
https://doi.org/10.1109/LGRS.2023.3244064
https://doi.org/10.1109/LGRS.2023.3244064
https://doi.org/10.1016/j.compag.2023.107663
https://doi.org/10.1016/j.compag.2023.107663
https://doi.org/10.48550/arXiv.2210.13167
https://doi.org/10.48550/arXiv.2210.13167
https://arxiv.org/abs/2210.13167
https://doi.org/10.1016/j.isprsjprs.2020.06.006
https://arxiv.org/abs/1910.10536
https://doi.org/10.1109/CVPR42600.2020.01234
https://doi.org/10.1109/CVPR42600.2020.01234
https://doi.org/10.1088/1748-9326/ab68ac
https://doi.org/10.1109/TGRS.2023.3321156
https://doi.org/10.1109/TGRS.2023.3321156
https://doi.org/10.1016/j.jag.2021.102574
https://doi.org/10.1109/LGRS.2020.3016140
https://doi.org/10.1109/LGRS.2020.3016140
https://doi.org/10.1109/LGRS.2023.3303643
https://doi.org/10.1109/LGRS.2023.3303643
https://doi.org/10.1177/11786221231195824
https://doi.org/10.1016/j.gr.2022.08.004
https://doi.org/10.1007/s11069-023-06260-y
https://doi.org/10.1007/s11069-023-06260-y
https://doi.org/10.1002/essoar.10512130.1
https://doi.org/10.1007/s42979-023-01960-5
ment, vol. 332, p. 117 357, 2023, ISSN: 10958630. DOI:
10.1016/j.jenvman.2023.117357.
[160] D. Sun et al., “Assessment of landslide susceptibility
along mountain highways based on different machine
learning algorithms and mapping units by hybrid fac-
tors screening and sample optimization,” Gondwana
Research, 2023. DOI: 10.1016/j.gr.2022.07.013.
[161] A. Maxwell, M. Sharma, and K. Donaldson, “Ex-
plainable boosting machines for slope failure spatial
predictive modeling,” Remote Sensing, vol. 13, no. 24,
2021. DOI: 10.3390/rs13244991.
[162] H. Fang et al., “A new approach to spatial landslide
susceptibility prediction in karst mining areas based
on explainable artificial intelligence,” Sustainability
(Switzerland), vol. 15, no. 4, 2023, ISSN: 20711050.
DOI: 10.3390/su15043094.
[163] K. Youssef, K. Shao, S. Moon, and L.-S. Bouchard,
Landslide Susceptibility Modeling by Interpretable
Neural Network, Aug. 2022. DOI: 10 . 48550 / arXiv.
2201.06837. arXiv: 2201.06837 [physics]. (visited
on 08/27/2022).
[164] C. Chen and L. Fan, “An attribution deep learning
interpretation model for landslide susceptibility map-
ping in the Three Gorges Reservoir Area,” IEEE Trans-
actions on Geoscience and Remote Sensing, pp. 1–1,
2023, ISSN: 1558-0644. DOI: 10 .1109/TGRS.2023.
3323668. (visited on 10/17/2023).
[165] H. E. Aydin and M. C. Iban, “Predicting and analyz-
ing flood susceptibility using boosting-based ensemble
machine learning algorithms with SHapley Additive
exPlanations,” Natural Hazards, Dec.2022, ISSN:
1573-0840. DOI: 10.1007/s11069-022-05793-y.
[166] J. Liu, K. Liu, and M. Wang, “A residual neural net-
work integrated with a hydrological model for global
flood susceptibility mapping based on remote sensing
datasets,” Remote Sensing, vol. 15, no. 9, 2023, ISSN:
20724292. DOI: 10.3390/rs15092447.
[167] M. Wang et al., “An XGBoost-SHAP approach to
quantifying morphological impact on urban flood-
ing susceptibility,” Ecological Indicators, vol. 156,
p. 111 137, Dec. 2023, ISSN: 1470-160X. DOI: 10 .
1016/j.ecolind.2023.111137. (visited on 11/03/2023).
[168] Z. Zhang, P. Angelov, E. Soares, N. Longepe, and
P. P. Mathieu, “An interpretable deep semantic seg-
mentation method for earth observation,” in 2022 IEEE
11th International Conference on Intelligent Systems
(IS), Oct. 2022, pp. 1–8. DOI: 10.1109/IS57118.2022.
10019621.
[169] A. Abdollahi and B. Pradhan, “Explainable artificial
intelligence (XAI) for interpreting the contributing
factors feed into the wildfire susceptibility prediction
model,” Science of the Total Environment, vol. 879,
2023, ISSN: 00489697. DOI: 10.1016/j.scitotenv.2023.
163004.
[170] R. Cilli et al., “Explainable artificial intelligence (XAI)
detects wildfire occurrence in the Mediterranean coun-
tries of Southern Europe,” Scientific Reports, vol. 12,
no. 1, p. 16 349, Sep. 2022, ISSN: 2045-2322. DOI: 10.
1038/s41598-022-20347-9. (visited on 03/05/2023).
[171] A. H. Taylor, C. Airey-Lauvaux, B. Estes, L. Harris,
and C. N. Skinner, “Spatial patterns of nineteenth
century fire severity persist after fire exclusion and a
twenty-first century wildfire in a mixed conifer for-
est landscape, Southern Cascades, USA,” Landscape
Ecology, vol. 35, no. 12, pp. 2777–2790, Dec. 2020,
ISSN: 1572-9761. DOI: 10.1007/s10980-020-01118-1.
[172] N. Saini, C. Chattopadhyay, and D. Das, “E2AlertNet:
An explainable, efficient, and lightweight model for
emergency alert from aerial imagery,” Remote Sensing
Applications: Society and Environment, vol. 29, 2023,
ISSN: 23529385. DOI: 10.1016/j.rsase.2022.100896.
[173] T. Beker, Q. Song, and X. X. Zhu, “An analysis
of the gap between hybrid and real data for vol-
canic deformation detection,” in IGARSS 2023 - 2023
IEEE International Geoscience and Remote Sensing
Symposium, Jul. 2023, pp. 825–828. DOI: 10 . 1109 /
IGARSS52108.2023.10281964.
[174] Z. Chen et al., “Tunnel geothermal disaster susceptibil-
ity evaluation based on interpretable ensemble learn-
ing: A case study in Ya’an–Changdu section of the
Sichuan–Tibet traffic corridor,” Engineering Geology,
vol. 313, 2023, ISSN: 00137952. DOI: 10 . 1016 / j .
enggeo.2023.106985.
[175] R. Jena et al., “Explainable artificial intelligence (XAI)
model for earthquake spatial probability assessment in
arabian peninsula,” Remote Sensing, vol. 15, no. 9,
2023, ISSN: 20724292. DOI: 10.3390/rs15092248.
[176] A. Levering, D. Marcos, and D. Tuia, “On the relation
between landscape beauty and land cover: A case study
in the U.K. at Sentinel-2 resolution with interpretable
AI,” ISPRS Journal of Photogrammetry and Remote
Sensing, vol. 177, pp. 194–203, 2021. DOI: 10.1016/j.
isprsjprs.2021.04.020.
[177] A. Levering, D. Marcos, J. van Vliet, and D. Tuia,
“Predicting the liveability of Dutch cities with aerial
images and semantic intermediate concepts,” Remote
Sensing of Environment, vol. 287, p. 113 454, Mar.
2023, ISSN: 0034-4257. DOI: 10 . 1016 / j . rse . 2023 .
113454. (visited on 03/22/2023).
[178] S. Zhu et al., “Investigating impacts of ambient air
pollution on the terrestrial gross primary productivity
(GPP) from remote sensing,” IEEE Geoscience and
Remote Sensing Letters, vol. 19, pp. 1–5, 2022, ISSN:
1558-0571. DOI: 10.1109/LGRS.2022.3163775.
[179] C. B. Lambert, L. M. Resler, Y. Shao, and D. R. Butler,
“Vegetation change as related to terrain factors at two
glacier forefronts, Glacier National Park, Montana,
U.S.A.,” Journal of Mountain Science, vol. 17, no. 1,
pp. 1–15, Jan. 2020, ISSN: 1993-0321. DOI: 10.1007/
s11629-019-5603-8.
[180] H. Zhou et al., “Relative importance of climatic
variables, soil properties and plant traits to spatial
https://doi.org/10.1016/j.jenvman.2023.117357
https://doi.org/10.1016/j.gr.2022.07.013
https://doi.org/10.3390/rs13244991
https://doi.org/10.3390/su15043094
https://doi.org/10.48550/arXiv.2201.06837
https://doi.org/10.48550/arXiv.2201.06837
https://arxiv.org/abs/2201.06837
https://doi.org/10.1109/TGRS.2023.3323668
https://doi.org/10.1109/TGRS.2023.3323668
https://doi.org/10.1007/s11069-022-05793-y
https://doi.org/10.3390/rs15092447
https://doi.org/10.1016/j.ecolind.2023.111137
https://doi.org/10.1016/j.ecolind.2023.111137
https://doi.org/10.1109/IS57118.2022.10019621
https://doi.org/10.1109/IS57118.2022.10019621
https://doi.org/10.1016/j.scitotenv.2023.163004
https://doi.org/10.1016/j.scitotenv.2023.163004
https://doi.org/10.1038/s41598-022-20347-9
https://doi.org/10.1038/s41598-022-20347-9
https://doi.org/10.1007/s10980-020-01118-1
https://doi.org/10.1016/j.rsase.2022.100896
https://doi.org/10.1109/IGARSS52108.2023.10281964
https://doi.org/10.1109/IGARSS52108.2023.10281964
https://doi.org/10.1016/j.enggeo.2023.106985
https://doi.org/10.1016/j.enggeo.2023.106985
https://doi.org/10.3390/rs15092248
https://doi.org/10.1016/j.isprsjprs.2021.04.020
https://doi.org/10.1016/j.isprsjprs.2021.04.020
https://doi.org/10.1016/j.rse.2023.113454
https://doi.org/10.1016/j.rse.2023.113454
https://doi.org/10.1109/LGRS.2022.3163775
https://doi.org/10.1007/s11629-019-5603-8
https://doi.org/10.1007/s11629-019-5603-8
variability in net CO2 exchange across global forests
and grasslands,” Agricultural and Forest Meteorology,
vol. 307, p. 108 506, Sep. 2021, ISSN: 01681923.
DOI: 10 .1016/ j .agrformet .2021.108506. (visited on
01/05/2023).
[181] S. Cheng et al., “Improved Understanding of How
Catchment Properties Control Hydrological Partition-
ing Through Machine Learning,” Water Resources Re-
search, vol. 58, no. 4, e2021WR031412, 2022, ISSN:
1944-7973. DOI: 10.1029/2021WR031412. (visited on
01/05/2023).
[182] C. Karmakar, C. Dumitru, G. Schwarz, and M. Datcu,
“Feature-free explainable data mining in SAR images
using latent dirichlet allocation,” IEEE Journal of
Selected Topics in Applied Earth Observations and
Remote Sensing, vol. 14, pp. 676–689, 2021. DOI: 10.
1109/JSTARS.2020.3039012.
[183] C. Karmakar and M. Datcu, “A framework for inter-
active visual interpretation of remote sensing data,”
IEEE Geoscience and Remote Sensing Letters, vol. 19,
pp. 1–5, 2022, ISSN: 1558-0571. DOI: 10.1109/LGRS.
2022.3161959.
[184] D. Stroppiana, G. Bordogna, M. Sali, M. Boschetti, G.
Sona, and P. Brivio, “A fully automatic, interpretable
and adaptive machine learning approach to map burned
area from remote sensing,” ISPRS International Jour-
nal of Geo-Information, vol. 10, no. 8, 2021. DOI:
10.3390/ijgi10080546.
[185] H. Leon-Garza, H. Hagras, A. Peña-Rios, A. Conway,
and G. Owusu, “A big bang-big crunch type-2 fuzzy
logic system for explainable semantic segmentation of
trees in satellite images using HSV color space,” in
2020 IEEE International Conference on Fuzzy Systems
(FUZZ-IEEE), Jul. 2020, pp. 1–7. DOI: 10 . 1109 /
FUZZ48607.2020.9177611.
[186] B. Murray, M. A. Islam, A. J. Pinar, T. C. Havens,
D. T. Anderson, and G. Scott, “Explainable AI for
understanding decisions and data-driven optimization
of the choquet integral,” in 2018 IEEE International
Conference on Fuzzy Systems (FUZZ-IEEE), Jul. 2018,
pp. 1–8. DOI: 10.1109/FUZZ-IEEE.2018.8491501.
[187] M. A. Islam, D. T. Anderson, A. J. Pinar, T. C. Havens,
G. Scott, and J. M. Keller, “Enabling Explainable
Fusion in Deep Learning With Fuzzy Integral Neu-
ral Networks,” IEEE Transactions on Fuzzy Systems,
vol. 28, no. 7, pp. 1291–1300, Jul. 2020, ISSN: 1941-
0034. DOI: 10.1109/TFUZZ.2019.2917124.
[188] B. Murray, D. Anderson, T. Havens, T. Wilkin, and
A. Wilbik, “Information fusion-2-text: Explainable ag-
gregation via linguistic protoforms,” Communications
in Computer and Information Science, vol. 1239 CCIS,
pp. 114–127,2020. DOI: 10.1007/978-3-030-50153-
2 9.
[189] D. T. Anderson, G. J. Scott, M. A. Islam, B. Murray,
and R. Marcum, “Fuzzy choquet integration of deep
convolutional neural networks for remote sensing,” in
Computational Intelligence for Pattern Recognition,
Cham: Springer International Publishing, 2018, pp. 1–
28, ISBN: 978-3-319-89629-8. DOI: 10.1007/978- 3-
319-89629-8 1.
[190] Y. Li, Y. Zhang, X. Huang, and A. L. Yuille, “Deep
networks under scene-level supervision for multi-class
geospatial object detection from remote sensing im-
ages,” ISPRS Journal of Photogrammetry and Remote
Sensing, vol. 146, pp. 182–196, Dec. 2018, ISSN:
0924-2716. DOI: 10 . 1016 / j . isprsjprs . 2018 . 09 . 014.
(visited on 03/29/2023).
[191] W. Xiong, Z. Xiong, Y. Cui, L. Huang, and R. Yang,
“An Interpretable Fusion Siamese Network for Multi-
modality Remote Sensing Ship Image Retrieval,” IEEE
Transactions on Circuits and Systems for Video Tech-
nology, pp. 1–1, 2022, ISSN: 1558-2205. DOI: 10.1109/
TCSVT.2022.3224068.
[192] X. Guo et al., “Visual explanations with detailed
spatial information for remote sensing image classi-
fication via channel saliency,” International Journal
of Applied Earth Observation and Geoinformation,
vol. 118, p. 103 244, Apr. 2023, ISSN: 1569-8432. DOI:
10.1016/j.jag.2023.103244. (visited on 03/07/2023).
[193] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and
L. Fei-Fei, “Imagenet: A large-scale hierarchical image
database,” in 2009 IEEE conference on computer vi-
sion and pattern recognition, Ieee, 2009, pp. 248–255.
[194] Z. Feng, M. Zhu, L. Stanković, and H. Ji, “Self-
Matching CAM: A Novel Accurate Visual Explana-
tion of CNNs for SAR Image Interpretation,” Remote
Sensing, vol. 13, no. 9, p. 1772, Jan. 2021, ISSN:
2072-4292. DOI: 10 . 3390 / rs13091772. (visited on
08/27/2022).
[195] H. Wang et al., “Score-CAM: Score-Weighted Visual
Explanations for Convolutional Neural Networks,” in
Proceedings of the IEEE/CVF Conference on Com-
puter Vision and Pattern Recognition Workshops,
2020, pp. 24–25. (visited on 06/20/2023).
[196] S. M. Marvasti-Zadeh, D. Goodsman, N. Ray, and
N. Erbilgin, “Crown-CAM: Interpretable visual ex-
planations for tree crown detection in aerial images,”
IEEE Geoscience and Remote Sensing Letters, vol. 20,
pp. 1–5, 2023, ISSN: 1558-0571. DOI: 10.1109/LGRS.
2023.3271649.
[197] G. De Lucia, M. Lapegna, and D. Romano, “Towards
explainable AI for hyperspectral image classification
in Edge Computing environments,” Computers and
Electrical Engineering, vol. 103, 2022. DOI: 10.1016/
j.compeleceng.2022.108381.
[198] W. Song, S. Dai, D. Huang, J. Song, and L. Antonio,
“Median-pooling grad-CAM: An efficient inference
level visual explanation for CNN networks in remote
sensing image classification,” in MultiMedia Model-
ing, Cham: Springer International Publishing, 2021,
pp. 134–146, ISBN: 978-3-030-67835-7.
https://doi.org/10.1016/j.agrformet.2021.108506
https://doi.org/10.1029/2021WR031412
https://doi.org/10.1109/JSTARS.2020.3039012
https://doi.org/10.1109/JSTARS.2020.3039012
https://doi.org/10.1109/LGRS.2022.3161959
https://doi.org/10.1109/LGRS.2022.3161959
https://doi.org/10.3390/ijgi10080546
https://doi.org/10.1109/FUZZ48607.2020.9177611
https://doi.org/10.1109/FUZZ48607.2020.9177611
https://doi.org/10.1109/FUZZ-IEEE.2018.8491501
https://doi.org/10.1109/TFUZZ.2019.2917124
https://doi.org/10.1007/978-3-030-50153-2_9
https://doi.org/10.1007/978-3-030-50153-2_9
https://doi.org/10.1007/978-3-319-89629-8_1
https://doi.org/10.1007/978-3-319-89629-8_1
https://doi.org/10.1016/j.isprsjprs.2018.09.014
https://doi.org/10.1109/TCSVT.2022.3224068
https://doi.org/10.1109/TCSVT.2022.3224068
https://doi.org/10.1016/j.jag.2023.103244
https://doi.org/10.3390/rs13091772
https://doi.org/10.1109/LGRS.2023.3271649
https://doi.org/10.1109/LGRS.2023.3271649
https://doi.org/10.1016/j.compeleceng.2022.108381
https://doi.org/10.1016/j.compeleceng.2022.108381
[199] J. Ge, H. Zhang, L. Xu, C.-L. Sun, and C. Wang,
“Interpretable deep learning method combining tem-
poral backscattering coefficients and interferometric
coherence for rice area mapping,” IEEE Geoscience
and Remote Sensing Letters, vol. 20, pp. 1–5, 2023,
ISSN: 1558-0571. DOI: 10.1109/LGRS.2023.3321770.
[200] E. A. Barnes, R. J. Barnes, Z. K. Martin, and J. K.
Rader, “This Looks Like That There: Interpretable
Neural Networks for Image Tasks When Location
Matters,” Artificial Intelligence for the Earth Systems,
vol. 1, no. 3, e220001, Jul. 2022, ISSN: 2769-7525.
DOI: 10 . 1175 / AIES - D - 22 - 0001 . 1. (visited on
01/30/2023).
[201] X. Huang, Y. Sun, S. Feng, Y. Ye, and X. Li, “Better
visual interpretation for remote sensing scene classifi-
cation,” IEEE Geoscience and Remote Sensing Letters,
vol. 19, 2022. DOI: 10.1109/LGRS.2021.3132920.
[202] A. Brenning, “Spatial machine-learning model diag-
nostics: A model-agnostic distance-based approach,”
International Journal of Geographical Information
Science, vol. 37, no. 3, pp. 584–606, 2023, ISSN:
13658816. DOI: 10.1080/13658816.2022.2131789.
[203] G. Andresini, A. Appice, and D. Malerba, “SILVIA:
An eXplainable framework to map bark beetle infes-
tation in sentinel-2 images,” IEEE Journal of Selected
Topics in Applied Earth Observations and Remote
Sensing, pp. 1–17, 2023, ISSN: 2151-1535. DOI: 10.
1109/JSTARS.2023.3312521.
[204] J. Abitbol and M. Karsai, “Interpretable socioeco-
nomic status inference from aerial imagery through
urban patterns,” Nature Machine Intelligence, vol. 2,
no. 11, pp. 684–692, 2020. DOI: 10.1038/s42256-020-
00243-5.
[205] X. Li, D. Chen, W. Xu, H. Chen, J. Li, and F. Mo,
“Explainable dimensionality reduction (XDR) to un-
box AI ‘black box’ models: A study of AI perspectives
on the ethnic styles of village dwellings,” Humanities
and Social Sciences Communications, vol. 10, no. 1,
pp. 1–13, Jan. 2023, ISSN: 2662-9992. DOI: 10.1057/
s41599-023-01505-4. (visited on 02/07/2023).
[206] K. Blix, G. Camps-Valls, and R. Jenssen, “Gaussian
Process Sensitivity Analysis for Oceanic Chlorophyll
Estimation,” IEEE Journal of Selected Topics in Ap-
plied Earth Observations and Remote Sensing, vol. 10,
no. 4, pp. 1265–1277, Apr. 2017, ISSN: 2151-1535.
DOI: 10.1109/JSTARS.2016.2641583.
[207] W. Zhang et al., “Feature importance measure of
a multilayer perceptron based on the presingle-
connection layer,” Knowledge and Information Sys-
tems, Sep. 2023, ISSN: 0219-3116. DOI: 10 . 1007 /
s10115-023-01959-7.
[208] I. Kakogeorgiou and K. Karantzalos, “Evaluating ex-
plainable artificial intelligence methods for multi-label
deep learning classification tasks in remote sensing,”
International Journal of Applied Earth Observation
and Geoinformation, vol. 103, 2021. DOI: 10 .1016/
j.jag.2021.102520.
[209] T. Kim et al., “Federated onboard-ground station
computing with weakly supervised cascading pyramid
attention network for satellite image analysis,” IEEE
access : practical innovations, open solutions, vol. 10,
pp. 117 315–117 333, 2022, ISSN: 2169-3536. DOI: 10.
1109/ACCESS.2022.3219879.
[210] J.-P. Kucklick and O. Müller, “Tackling the accuracy-
interpretability trade-off: Interpretable deep learning
models for satellite image-based real estate appraisal,”
ACM Transactions on Management Information Sys-
tems, vol. 14, no. 1, 2023, ISSN: 2158656X. DOI: 10.
1145/3567430.
[211] Q. Su, X. Zhang, P. Xiao, Z. Li, and W. Wang, “Which
CAM is Better for Extracting Geographic Objects? A
Perspective From Principles and Experiments,” IEEE
Journal of Selected Topics in Applied Earth Obser-
vations and Remote Sensing, vol. 15, pp. 5623–5635,
2022, ISSN: 2151-1535. DOI: 10.1109/JSTARS.2022.
3188493.
[212] A. Abdollahi and B. Pradhan, “Urban vegetation
mapping from aerial imagery using explainable AI
(XAI),” Sensors, vol. 21, no. 14, 2021. DOI: 10.3390/
s21144738.
[213] S. Kondylatos et al., “Wildfire Danger Prediction
and Understanding with Deep Learning,” Geophysical
Research Letters, vol. n/a, no. n/a, e2022GL099368,
2022, ISSN: 1944-8007. DOI: 10.1029/2022GL099368.
(visitedsearch filters in three databases: Scopus1, Springer2, and
IEEE3. Appendix A lists the exact search queries for each
database.
The field of xAI started to gain momentum from 2014
[28]. However, we found that search results of xAI papers
in RS before 2017 are rarely relevant, supporting that the
methods have only been used in the last few years in this area.
Therefore, we searched journal and conference papers between
01.01.2017 and 31.10.2023, which is the last time the search
1https://www.scopus.com/
2https://link.springer.com/
3https://ieeexplore.ieee.org/
https://www.scopus.com/
https://link.springer.com/
https://ieeexplore.ieee.org/
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Records identified from: 
Databases (n = 1075) 
 IEEE (n = 120) 
 Scopus (n = 339) 
 Springer (n = 616) 
Records removed before 
screening: 
Duplicate records 
removed (n = 111) 
Records screened 
(n = 964) 
Records excluded 
(n = 607) 
Reports sought for retrieval 
(n = 357) 
Reports not retrieved 
(n = 3) 
Reports assessed for 
eligibility (n = 354) 
Reports excluded 
(n = 208): 
EC1 & EC2 (n = 149) 
EC3 (n = 51) 
Other (n = 7) 
 
Records identified from: 
Private libraries (n = 60) 
Reports assessed for 
eligibility (n = 60) 
Reports excluded (n = 0) 
Studies included in review 
(n = 207) 
Reports of included studies 
(n = 207) 
Identification of studies via databases and registers Identification of studies via other methods 
Id
e
n
ti
fi
c
a
ti
o
n
 
S
c
re
e
n
in
g
 
 
In
c
lu
d
e
d
 
Reports sought for retrieval 
(n = 60) 
Reports not retrieved 
(n = 0) 
Fig. 2. Flow Diagram of PRISMA. Figure adapted from [26].
was executed in the databases. Hence, our review comprises
more than the last six years. We considered papers discussing
RS in EO, namely RS sensors mounted to aerial vehicles
and satellites. We included all the available data from these
sensors, as well as high-level and compound products, such as
Digital Elevation Models (DEMs) and the ERA5 dataset [29].
By contrast, the filtering or exclusion criteria we followed can
be summarized as follows:
• EC1 - Publication unrelated to explainable AI
• EC2 - Publication unrelated to Remote Sensing
• EC3 - Review/survey/short conference paper
• EC4 - Publication published before 2017
• EC5 - Publication not written in English
The covered work discusses the interpretability of the method
or the explanation results. Hence, it is not enough to only
use a model which is interpretable by itself, like a decision
tree or linear regression (EC1). We explicitly excluded in-situ
measurements and pictures actively taken in the surrounding
environment, which are non-RS products, as well as papers
out of the scope of RS and xAI (EC2). Furthermore, review
and survey papers were excluded (EC3) since they would be
included in the related work section. Also, short conference
papers that do not exceed a minimum of 5 pages are excluded
(EC3) because they usually contain preliminary or incomplete
results. The results of the search were filtered in three steps:
(i) removing duplicates, conference abstracts, and reports, (ii)
screening through the abstract, and (iii) screening through
the full text. At least one author read each abstract and
paper; whenever there was doubt, the other authors read the
corresponding paper as well. In total, our search results in 1075
papers, merging the different sources and removing duplicates,
left us with 964 papers. In the first shallow abstract screening,
we removed 607 papers since they were unrelated to our
review topic. Three papers had to be excluded due to a lack of
access rights. After the full-text screening, we were left with
147 papers. These papers were accompanied by 60 papers we
had in our libraries. The procedure is summarized in Figure
2. Reasons for removing a specific paper can be found in the
supplementary material.
IV. EXPLANATION METHODS IN ML
This chapter provides a general overview of xAI. We first
present the taxonomy used to describe common distinctions
between explanation methods. Then, we introduce our catego-
rization of xAI methods. Subsequently, we describe in detail
a selection of the methods commonly used in the field of RS.
Finally, we give an overview of the different metrics proposed
in the literature for evaluating these methods and present the
main objectives of using xAI.
There exist several terms for explainable AI, such as inter-
pretable Machine Learning (IML) and interpretable AI. These
terms often refer to the same concept: the explanation or
interpretation of AI models [30]. Hence, we will use them
interchangeably in this paper.
In the literature, three common distinctions exist when cate-
gorizing xAI methods: ante-hoc vs. post-hoc, model-agnostic
vs. model-specific, and local vs. global [12, 18, 31, 32]. The
ante-hoc and post-hoc taxonomy refer to the stage where the
explanation is generated. A xAI method that provides inter-
pretations within or simultaneously with the training process
is called ante-hoc. In contrast, a post-hoc method explains
the model after the training phase using a separate algorithm
that could even be distinct from the one followed by the
model. Model-agnostic methods have the ability to generate
explanations for any model. The method does not access the
model’s internal state or parameters, and the explanations are
created by analyzing the changes in the model’s output when
modifying its inputs. Model-specific methods are exclusively
designed for specific architectures and typically have access to
the model’s inner workings. Local methods explain individual
instances and the model’s behavior at a particular sample. In
contrast, global methods explain the model’s behavior on the
entire dataset. In practice, local explanations can be leveraged
to achieve global explanations. Through aggregation over a
set of input instances chosen to represent the dataset, local
explanations can provide insights into the general behavior of
the inspected model. The aggregation mechanism should be
carefully defined since straightforward aggregation on some
xAI methods might lead to erroneous results. Some researchers
further investigated the question of finding meaningful ag-
gregation rules of local explanations [33, 34]. The methods
presented in this review are categorized into the three universal
categories introduced above, as shown in Figure 3.
A. xAI methods categorization
In this section, we introduce a categorization of the most
important xAI methods in the literature. We build on the cate-
gorical foundation of [10] and further adjust these categories to
capture a larger set of xAI methods. The categories are struc-
tured in a tree-like design. The tree has two internal layers,
which describe a hierarchy of primary and secondary cate-
gories and a leaf layer which indicates an individual or group
of specific methods. In the following, a high-level overview
of the categories of xAI methods is explained following the
structure of Figure 3. Our four primary categories are feature
attribution, distillation, intrinsic explanations, and contrastive
examples. Feature attribution methods highlight the input
features, significantly influencing the output. Alternatively,
distillation builds a new interpretable model from the behavior
of the complex model. Intrinsic methods focus on making the
model itself or its components inherently interpretable. Lastly,
contrastive examples concentrate on showing simulated or real
examples and allow an explanation by comparing them. Each
of these categories splits into 2 to 3 subcategories.
For the sake of completeness, it should be noted that feature
selection methods are not considered in our categorization and
are listed separately in the results. Even though they are related
to feature interpretation, they serve different purposes. Feature
selection can be defined as a strategy to reduce the dimension-
ality of input space to improve the model performance and
reduce its computational coston 09/06/2022).
[214] S. Biass, S. Jenkins, W. Aeberhard, P. Delmelle, and
T. Wilson, “Insights into the vulnerability of vegetation
to tephra fallouts from interpretable machine learning
and big Earth observation data,” Natural Hazards and
Earth System Sciences, vol. 22, no. 9, pp. 2829–2855,
2022. DOI: 10.5194/nhess-22-2829-2022.
[215] T. T. Stomberg, T. Stone, J. Leonhardt, and R. Roscher,
“Exploring Wilderness Using Explainable Machine
Learning in Satellite Imagery,” arXiv:2203.00379 [cs],
Mar. 2022. arXiv: 2203 . 00379 [cs]. (visited on
03/08/2022).
[216] S. J. Silva, C. A. Keller, and J. Hardin, “Using an
Explainable Machine Learning Approach to Charac-
terize Earth System Model Errors: Application of
SHAP Analysis to Modeling Lightning Flash Oc-
currence,” Journal of Advances in Modeling Earth
Systems, vol. 14, no. 4, e2021MS002881, 2022, ISSN:
1942-2466. DOI: 10.1029/2021MS002881. (visited on
01/05/2023).
[217] B. Hosseiny, A. Abdi, and S. Jamali, “Urban land
use and land cover classification with interpretable
machine learning – A case study using Sentinel-2 and
auxiliary data,” Remote Sensing Applications: Society
and Environment, vol. 28, 2022. DOI: 10.1016/j.rsase.
2022.100843.
https://doi.org/10.1109/LGRS.2023.3321770
https://doi.org/10.1175/AIES-D-22-0001.1
https://doi.org/10.1109/LGRS.2021.3132920
https://doi.org/10.1080/13658816.2022.2131789
https://doi.org/10.1109/JSTARS.2023.3312521
https://doi.org/10.1109/JSTARS.2023.3312521
https://doi.org/10.1038/s42256-020-00243-5
https://doi.org/10.1038/s42256-020-00243-5
https://doi.org/10.1057/s41599-023-01505-4
https://doi.org/10.1057/s41599-023-01505-4
https://doi.org/10.1109/JSTARS.2016.2641583
https://doi.org/10.1007/s10115-023-01959-7
https://doi.org/10.1007/s10115-023-01959-7
https://doi.org/10.1016/j.jag.2021.102520
https://doi.org/10.1016/j.jag.2021.102520
https://doi.org/10.1109/ACCESS.2022.3219879
https://doi.org/10.1109/ACCESS.2022.3219879
https://doi.org/10.1145/3567430
https://doi.org/10.1145/3567430
https://doi.org/10.1109/JSTARS.2022.3188493
https://doi.org/10.1109/JSTARS.2022.3188493
https://doi.org/10.3390/s21144738
https://doi.org/10.3390/s21144738
https://doi.org/10.1029/2022GL099368
https://doi.org/10.5194/nhess-22-2829-2022
https://arxiv.org/abs/2203.00379
https://doi.org/10.1029/2021MS002881
https://doi.org/10.1016/j.rsase.2022.100843
https://doi.org/10.1016/j.rsase.2022.100843
[218] J. Gawlikowski, P. Ebel, M. Schmitt, and X. X. Zhu,
“Explaining the Effects of Clouds on Remote Sensing
Scene Classification,” IEEE Journal of Selected Topics
in Applied Earth Observations and Remote Sensing,
vol. 15, pp. 9976–9986, 2022, ISSN: 2151-1535. DOI:
10.1109/JSTARS.2022.3221788.
[219] N. Rampal et al., “High-resolution downscaling with
interpretable deep learning: Rainfall extremes over
New Zealand,” Weather and Climate Extremes, vol. 38,
p. 100 525, Dec. 2022, ISSN: 2212-0947. DOI: 10 .
1016/j.wace.2022.100525. (visited on 03/05/2023).
[220] P. W. Keys, E. A. Barnes, and N. H. Carter, “A
machine-learning approach to human footprint index
estimation with applications to sustainable develop-
ment,” Environmental Research Letters, vol. 16, no. 4,
p. 044 061, Apr. 2021, ISSN: 1748-9326. DOI: 10.1088/
1748-9326/abe00a. (visited on 01/27/2023).
[221] T. Beker, H. Ansari, S. Montazeri, Q. Song, and X. X.
Zhu, “Deep learning for subtle volcanic deformation
detection with InSAR data in central volcanic zone,”
IEEE Transactions on Geoscience and Remote Sens-
ing, vol. 61, pp. 1–20, 2023, ISSN: 1558-0644. DOI:
10.1109/TGRS.2023.3318469.
[222] Z. Xiong, S. Chen, Y. Shi, and X. X. Zhu, Disentan-
gled Latent Transformer for Interpretable Monocular
Height Estimation, Feb. 2022. DOI: 10.48550/arXiv.
2201 . 06357. arXiv: 2201 . 06357 [cs]. (visited on
08/27/2022).
[223] M. Iatrou, C. Karydas, X. Tseni, and S. Mourelatos,
“Representation learning with a variational autoen-
coder for predicting nitrogen requirement in rice,” Re-
mote Sensing, vol. 14, no. 23, 2022, ISSN: 20724292.
DOI: 10.3390/rs14235978.
[224] N. Luo et al., “Explainable and spatial dependence
deep learning model for satellite-based O3 monitoring
in China,” Atmospheric Environment, p. 119 370, Sep.
2022, ISSN: 13522310. DOI: 10.1016/j.atmosenv.2022.
119370. (visited on 09/12/2022).
[225] J. J. Valdés and A. Pou, “Explainable AI applied to
the analysis of the climatic behavior of 11 years of
meteosat water vapor images,” in 2022 IEEE Sym-
posium Series on Computational Intelligence (SSCI),
Dec. 2022, pp. 846–853. DOI: 10.1109/SSCI51031.
2022.10022301.
[226] X. Yan, Z. Zang, N. Luo, Y. Jiang, and Z. Li, “New
interpretable deep learning model to monitor real-time
PM2.5 concentrations from satellite data,” Environ-
ment International, vol. 144, 2020. DOI: 10.1016/ j .
envint.2020.106060.
[227] X. Yan, Z. Zang, C. Zhao, and L. Husi, “Under-
standing global changes in fine-mode aerosols during
2008–2017 using statistical methods and deep learning
approach,” Environment International, vol. 149, 2021.
DOI: 10.1016/j.envint.2021.106392.
[228] P. Maranzano, P. Otto, and A. Fassò, “Adaptive
LASSO estimation for functional hidden dynamic geo-
statistical models,” Stochastic Environmental Research
and Risk Assessment, vol. 37, no. 9, pp. 3615–3637,
Sep. 2023, ISSN: 1436-3259. DOI: 10.1007/s00477-
023-02466-5.
[229] Z. Ebrahimi-Khusfi et al., “Determining the contri-
bution of environmental factors in controlling dust
pollution during cold and warm months of western
Iran using different data mining algorithms and game
theory,” Ecological Indicators, vol. 132, p. 108 287,
Dec. 2021, ISSN: 1470-160X. DOI: 10.1016/j.ecolind.
2021.108287. (visited on 07/05/2022).
[230] Y. Shao et al., “Estimation of daily NO2 with explain-
able machine learning model in China, 2007–2020,”
Atmospheric Environment, vol. 314, 2023, ISSN:
13522310. DOI: 10.1016/j.atmosenv.2023.120111.
[231] S. Wang, P. Wang, R. Zhang, X. Meng, H. Kan, and
H. Zhang, “Estimating particulate matter concentra-
tions and meteorological contributions in China dur-
ing 2000–2020,” Chemosphere, vol. 330, 2023, ISSN:
00456535. DOI: 10.1016/j.chemosphere.2023.138742.
[232] L. Zipfel, H. Andersen, and J. Cermak, “Machine-
learning based analysis of liquid water path adjust-
ments to aerosol perturbations in marine boundary
layer clouds using satellite observations,” Atmosphere,
vol. 13, no. 4, 2022. DOI: 10.3390/atmos13040586.
[233] L. Liang, J. Daniels, C. Bailey, L. Hu, R. Phillips,
and J. South, “Integrating low-cost sensor monitoring,
satellite mapping, and geospatial artificial intelligence
for intra-urban air pollution predictions,” Environmen-
tal Pollution, vol. 331, 2023, ISSN: 02697491. DOI:
10.1016/j.envpol.2023.121832.
[234] Y. Zhen and G. Shi, “Evaluation of MACC total
aerosol optical depth and its correction model based on
the random forest regression,” Theoretical and Applied
Climatology, vol. 152, no. 3-4, pp. 1243–1258, 2023,
ISSN: 0177798X. DOI: 10.1007/s00704-023-04455-8.
[235] J. J. Valdés and A. Pou, “A machine learning -
explainable AI approach to tropospheric dynamics
analysis using water vapor meteosat images,” in 2021
IEEE Symposium Series on Computational Intelligence
(SSCI), Dec. 2021, pp. 1–8. DOI: 10.1109/SSCI50451.
2021.9660188.
[236] S. Zhang et al., “A data-augmentation approach to de-
riving long-term surface SO2 across Northern China:
Implications for interpretable machine learning,” Sci-
ence of the Total Environment, vol. 827, 2022. DOI:
10.1016/j.scitotenv.2022.154278.
[237] C.-S. Cheng, A. Behzadan, and A. Noshadra-
van, “Uncertainty-aware convolutional neural network
for explainable artificial intelligence-assisted disaster
damage assessment,” Structural Control and Health
Monitoring, vol. 29, no. 10, 2022. DOI: 10.1002/stc.
3019.
[238] E. Hasanpour Zaryabi, L. Moradi, B. Kalantar, N.
Ueda, and A. A. Halin, “Unboxing the black box of
attention mechanisms in remote sensing big data using
https://doi.org/10.1109/JSTARS.2022.3221788
https://doi.org/10.1016/j.wace.2022.100525https://doi.org/10.1016/j.wace.2022.100525
https://doi.org/10.1088/1748-9326/abe00a
https://doi.org/10.1088/1748-9326/abe00a
https://doi.org/10.1109/TGRS.2023.3318469
https://doi.org/10.48550/arXiv.2201.06357
https://doi.org/10.48550/arXiv.2201.06357
https://arxiv.org/abs/2201.06357
https://doi.org/10.3390/rs14235978
https://doi.org/10.1016/j.atmosenv.2022.119370
https://doi.org/10.1016/j.atmosenv.2022.119370
https://doi.org/10.1109/SSCI51031.2022.10022301
https://doi.org/10.1109/SSCI51031.2022.10022301
https://doi.org/10.1016/j.envint.2020.106060
https://doi.org/10.1016/j.envint.2020.106060
https://doi.org/10.1016/j.envint.2021.106392
https://doi.org/10.1007/s00477-023-02466-5
https://doi.org/10.1007/s00477-023-02466-5
https://doi.org/10.1016/j.ecolind.2021.108287
https://doi.org/10.1016/j.ecolind.2021.108287
https://doi.org/10.1016/j.atmosenv.2023.120111
https://doi.org/10.1016/j.chemosphere.2023.138742
https://doi.org/10.3390/atmos13040586
https://doi.org/10.1016/j.envpol.2023.121832
https://doi.org/10.1007/s00704-023-04455-8
https://doi.org/10.1109/SSCI50451.2021.9660188
https://doi.org/10.1109/SSCI50451.2021.9660188
https://doi.org/10.1016/j.scitotenv.2022.154278
https://doi.org/10.1002/stc.3019
https://doi.org/10.1002/stc.3019
XAI,” Remote Sensing, vol. 14, no. 24, 2022, ISSN:
20724292. DOI: 10.3390/rs14246254.
[239] S. T. Seydi, M. Hasanlou, J. Chanussot, and P.
Ghamisi, “BDD-Net+: A building damage detection
framework based on modified coat-net,” IEEE Journal
of Selected Topics in Applied Earth Observations and
Remote Sensing, vol. 16, pp. 4232–4247, 2023, ISSN:
2151-1535. DOI: 10.1109/JSTARS.2023.3267847.
[240] S. T. Seydi, M. Hasanlou, J. Chanussot, and P.
Ghamisi, “Leveraging involution and convolution in
an explainable building damage detection framework,”
European Journal of Remote Sensing, vol. 0, no. 0,
p. 2 252 166, 2023, ISSN: null. DOI: 10 . 1080 /
22797254.2023.2252166. (visited on 09/29/2023).
[241] A. Descals, A. Verger, G. Yin, I. Filella, and J.
Peñuelas, “Local interpretation of machine learning
models in remote sensing with SHAP: The case of
global climate constraints on photosynthesis phenol-
ogy,” International Journal of Remote Sensing, vol. 44,
no. 10, pp. 3160–3173, 2023, ISSN: 01431161. DOI:
10.1080/01431161.2023.2217982.
[242] L. Li, Z. Zeng, G. Zhang, K. Duan, B. Liu, and
X. Cai, “Exploring the individualized effect of cli-
matic drivers on MODIS net primary productivity
through an explainable machine learning framework,”
Remote Sensing, vol. 14, no. 17, 2022. DOI: 10.3390/
rs14174401.
[243] S. Zhu et al., “Explainable machine learning confirms
the global terrestrial CO2 fertilization effect from
space,” IEEE Geoscience and Remote Sensing Letters,
vol. 20, pp. 1–5, 2023, ISSN: 1558-0571. DOI: 10 .
1109/LGRS.2023.3298373.
[244] S. Scepanovic, S. Joglekar, S. Law, and D. Quercia,
“Jane Jacobs in the Sky: Predicting Urban Vitality
with Open Satellite Data,” Proceedings of the ACM
on Human-Computer Interaction, vol. 5, no. CSCW1,
pp. 1–25, Apr. 2021, ISSN: 2573-0142. DOI: 10.1145/
3449257. (visited on 01/04/2024).
[245] L. Chen et al., “Measuring Impacts of Urban Environ-
mental Elements on Housing Prices Based on Multi-
source Data—A Case Study of Shanghai, China,” IS-
PRS International Journal of Geo-Information, vol. 9,
no. 2, p. 106, Feb. 2020, ISSN: 2220-9964. DOI: 10.
3390/ijgi9020106. (visited on 07/05/2022).
[246] Z. Ding et al., “Residential greenness and cardiac
conduction abnormalities: Epidemiological evidence
and an explainable machine learning modeling study,”
Chemosphere, vol. 339, 2023, ISSN: 00456535. DOI:
10.1016/j.chemosphere.2023.139671.
[247] M. Kim and G. Kim, “Modeling and predicting ur-
ban expansion in south korea using explainable ar-
tificial intelligence (XAI) model,” Applied Sciences
(Switzerland), vol. 12, no. 18, 2022. DOI: 10 .3390/
app12189169.
[248] S. Scepanovic et al., “MedSat: A Public Health Dataset
for England Featuring Medical Prescriptions and Satel-
lite Imagery,” in Thirty-Seventh Conference on Neural
Information Processing Systems Datasets and Bench-
marks Track, Nov. 2023. (visited on 01/04/2024).
[249] A. Temenos, I. Tzortzis, M. Kaselimi, I. Rallis, A.
Doulamis, and N. Doulamis, “Novel insights in spatial
epidemiology utilizing explainable AI (XAI) and re-
mote sensing,” Remote Sensing, vol. 14, no. 13, 2022.
DOI: 10.3390/rs14133074.
[250] S. Wang et al., “Unpacking the inter- and intra-
urban differences of the association between health
and exposure to heat and air quality in Australia using
global and local machine learning models,” Science
of The Total Environment, vol. 871, p. 162 005, May
2023, ISSN: 00489697. DOI: 10.1016/j.scitotenv.2023.
162005. (visited on 02/14/2023).
[251] P. Wójcik and K. Andruszek, “Predicting intra-urban
well-being from space with nonlinear machine learn-
ing,” Regional Science Policy and Practice, vol. 14,
no. 4, pp. 891–913, 2022. DOI: 10.1111/rsp3.12478.
[252] S. S. Prasad, R. C. Deo, S. Salcedo-Sanz, N. J.
Downs, D. Casillas-Pérez, and A. V. Parisi, “Enhanced
joint hybrid deep neural network explainable artificial
intelligence model for 1-hr ahead solar ultraviolet
index prediction,” Computer Methods and Programs
in Biomedicine, vol. 241, 2023, ISSN: 01692607. DOI:
10.1016/j.cmpb.2023.107737.
[253] K. Blix, A. B. Ruescas, J. E. Johnson, and G. Camps-
Valls, “Learning relevant features of optical water
types,” IEEE Geoscience and Remote Sensing Letters,
vol. 19, pp. 1–5, 2022, ISSN: 1558-0571. DOI: 10 .
1109/LGRS.2021.3072049.
[254] L. Chen et al., “Towards transparent deep learning
for surface water detection from SAR imagery,” In-
ternational Journal of Applied Earth Observation and
Geoinformation, vol. 118, 2023, ISSN: 15698432. DOI:
10.1016/j.jag.2023.103287.
[255] S. Hong et al., “Monitoring the vertical distribution of
HABs using hyperspectral imagery and deep learning
models,” Science of the Total Environment, vol. 794,
2021. DOI: 10.1016/j.scitotenv.2021.148592.
[256] Y. Liu, K. Duffy, J. G. Dy, and A. R. Ganguly,
“Explainable deep learning for insights in El Niño and
river flows,” Nature Communications, vol. 14, no. 1,
2023, ISSN: 20411723. DOI: 10 . 1038 / s41467 - 023 -
35968-5.
[257] N. Rampal, T. Shand, A. Wooler, and C. Raut-
enbach, “Interpretable deep learning applied to rip
current detection and localization,” Remote Sensing,
vol. 14, no. 23, 2022, ISSN: 20724292. DOI: 10.3390/
rs14236048.
[258] E. O. Yilmaz, H. Tonbul, and T. Kavzoglu, “Marine
mucilage mapping with explained deep learning model
using water-related spectral indices: A case study of
Dardanelles Strait, Turkey,” Stochastic Environmental
Research and Risk Assessment, 2023, ISSN: 14363240.
DOI: 10.1007/s00477-023-02560-8.
https://doi.org/10.3390/rs14246254
https://doi.org/10.1109/JSTARS.2023.3267847
https://doi.org/10.1080/22797254.2023.2252166
https://doi.org/10.1080/22797254.2023.2252166
https://doi.org/10.1080/01431161.2023.2217982
https://doi.org/10.3390/rs14174401
https://doi.org/10.3390/rs14174401
https://doi.org/10.1109/LGRS.2023.3298373
https://doi.org/10.1109/LGRS.2023.3298373
https://doi.org/10.1145/3449257
https://doi.org/10.1145/3449257
https://doi.org/10.3390/ijgi9020106
https://doi.org/10.3390/ijgi9020106
https://doi.org/10.1016/j.chemosphere.2023.139671
https://doi.org/10.3390/app12189169
https://doi.org/10.3390/app12189169
https://doi.org/10.3390/rs14133074
https://doi.org/10.1016/j.scitotenv.2023.162005
https://doi.org/10.1016/j.scitotenv.2023.162005
https://doi.org/10.1111/rsp3.12478
https://doi.org/10.1016/j.cmpb.2023.107737
https://doi.org/10.1109/LGRS.2021.3072049
https://doi.org/10.1109/LGRS.2021.3072049
https://doi.org/10.1016/j.jag.2023.103287
https://doi.org/10.1016/j.scitotenv.2021.148592
https://doi.org/10.1038/s41467-023-35968-5
https://doi.org/10.1038/s41467-023-35968-5
https://doi.org/10.3390/rs14236048
https://doi.org/10.3390/rs14236048
https://doi.org/10.1007/s00477-023-02560-8
[259] X. Jing, J. Luo, G. Zuo, and X. Yang, “Interpreting
runoff forecastingof long short-term memory network:
An investigation using the integrated gradient method
on runoff data from the Han River Basin,” Journal
of Hydrology: Regional Studies, vol. 50, p. 101 549,
Dec. 2023, ISSN: 2214-5818. DOI: 10 . 1016 / j . ejrh .
2023.101549. (visited on 10/16/2023).
[260] J. Pyo, K. Cho, K. Kim, S.-S. Baek, G. Nam, and
S. Park, “Cyanobacteria cell prediction using inter-
pretable deep learning model with observed, numer-
ical, and sensing data assemblage,” Water Research,
vol. 203, 2021. DOI: 10.1016/j.watres.2021.117483.
[261] Z. Hu et al., “Water storage changes (2003–2020) in
the Ordos Basin, China, explained by GRACE data and
interpretable deep learning,” Hydrogeology Journal,
2023, ISSN: 14312174. DOI: 10 . 1007 / s10040 - 023 -
02713-7.
[262] V. Saeidi, S. T. Seydi, B. Kalantar, N. Ueda, B. Taj-
firooz, and F. Shabani, “Water depth estimation from
Sentinel-2 imagery using advanced machine learning
methods and explainable artificial intelligence,” Ge-
omatics, Natural Hazards and Risk, vol. 14, no. 1,
2023, ISSN: 19475705. DOI: 10.1080/19475705.2023.
2225691.
[263] S.-C. Hung, H.-C. Wu, and M.-H. Tseng, “Integrating
image quality enhancement methods and deep learning
techniques for remote sensing scene classification,”
Applied Sciences (Switzerland), vol. 11, no. 24, 2021.
DOI: 10.3390/app112411659.
[264] M. Jeon, T. Kim, S. Kim, C. Lee, and C.-H. Youn, “Re-
cursive visual explanations mediation scheme based
on DropAttention model with multiple episodes pool,”
IEEE access : practical innovations, open solutions,
vol. 11, pp. 4306–4321, 2023, ISSN: 21693536. DOI:
10.1109/ACCESS.2023.3235332.
[265] J. Cheng, F. Zhang, D. Xiang, Q. Yin, Y. Zhou, and W.
Wang, “PolSAR image land cover classification based
on hierarchical capsule network,” Remote Sensing,
vol. 13, no. 16, 2021, ISSN: 20724292. DOI: 10.3390/
rs13163132.
[266] D. Guidici and M. L. Clark, “One-Dimensional Con-
volutional Neural Network Land-Cover Classification
of Multi-Seasonal Hyperspectral Imagery in the San
Francisco Bay Area, California,” Remote Sensing,
vol. 9, no. 6, p. 629, Jun. 2017, ISSN: 2072-4292. DOI:
10.3390/rs9060629. (visited on 07/05/2022).
[267] D. Ienco, Y. J. E. Gbodjo, R. Gaetano, and R. Inter-
donato, “Weakly supervised learning for land cover
mapping of satellite image time series via attention-
based CNN,” IEEE access : practical innovations,
open solutions, vol. 8, pp. 179 547–179 560, 2020,
ISSN: 2169-3536. DOI: 10 . 1109 / ACCESS . 2020 .
3024133.
[268] N. Méger, H. Courteille, A. Benoit, A. Atto, and
D. Ienco, “Explaining a Deep SpatioTemporal Land
Cover Classifier with Attention and Redescription
Mining,” ser. International Archives of the Photogram-
metry, Remote Sensing and Spatial Information Sci-
ences - ISPRS Archives, vol. 43, 2022, pp. 673–680.
DOI: 10 .5194/ isprs - archives - XLIII - B3- 2022- 673-
2022.
[269] J. Feng, D. Wang, and Z. Gu, “Bidirectional flow
decision tree for reliable remote sensing image scene
classification,” Remote Sensing, vol. 14, no. 16, 2022.
DOI: 10.3390/rs14163943.
[270] Z. Huang, X. Yao, Y. Liu, C. Dumitru, M. Datcu, and
J. Han, “Physically explainable CNN for SAR image
classification,” ISPRS Journal of Photogrammetry and
Remote Sensing, vol. 190, pp. 25–37, 2022. DOI: 10.
1016/j.isprsjprs.2022.05.008.
[271] S. Ma et al., “Multicrop Fusion Strategy Based on Pro-
totype Assignment for Remote Sensing Image Scene
Classification,” IEEE Transactions on Geoscience and
Remote Sensing, vol. 60, pp. 1–12, 2022, ISSN: 1558-
0644. DOI: 10.1109/TGRS.2022.3216831.
[272] M. Verma, N. Gupta, B. Tolani, and R. Kaushal,
“Explainable custom CNN architecture for land use
classification using satellite images,” in 2021 Sixth
International Conference on Image Information Pro-
cessing (ICIIP), vol. 6, Nov. 2021, pp. 304–309. DOI:
10.1109/ICIIP53038.2021.9702698.
[273] M. C. Iban and S. S. Bilgilioglu, “Snow avalanche
susceptibility mapping using novel tree-based ma-
chine learning algorithms (XGBoost, NGBoost, and
LightGBM) with eXplainable Artificial Intelligence
(XAI) approach,” Stochastic Environmental Research
and Risk Assessment, 2023. DOI: 10 . 1007 / s00477 -
023-02392-6.
[274] F. Huang et al., “Interpreting conv-LSTM for spatio-
temporal soil moisture prediction in china,” Agri-
culture (Switzerland), vol. 13, no. 5, 2023, ISSN:
20770472. DOI: 10.3390/agriculture13050971.
[275] R. I. Mukhamediev et al., “Soil salinity estimation
for south kazakhstan based on SAR sentinel-1 and
Landsat-8,9 OLI data with machine learning mod-
els,” Remote Sensing, vol. 15, no. 17, 2023, ISSN:
20724292. DOI: 10.3390/rs15174269.
[276] B. Pradhan et al., “A new method to evaluate gold
mineralisation-potential mapping using deep learning
and an explainable artificial intelligence (XAI) model,”
Remote Sensing, vol. 14, no. 18, 2022. DOI: 10.3390/
rs14184486.
[277] Y. Zhou, W. Wu, H. Wang, X. Zhang, C. Yang, and
H. Liu, “Identification of soil texture classes under
vegetation cover based on sentinel-2 data with SVM
and SHAP techniques,” IEEE Journal of Selected
Topics in Applied Earth Observations and Remote
Sensing, vol. 15, pp. 3758–3770, 2022, ISSN: 2151-
1535. DOI: 10.1109/JSTARS.2022.3164140.
[278] I. A. Smorkalov, “Soil respiration variability: Contri-
butions of space and time estimated using the random
forest algorithm,” Russian Journal of Ecology, vol. 53,
https://doi.org/10.1016/j.ejrh.2023.101549
https://doi.org/10.1016/j.ejrh.2023.101549
https://doi.org/10.1016/j.watres.2021.117483
https://doi.org/10.1007/s10040-023-02713-7
https://doi.org/10.1007/s10040-023-02713-7
https://doi.org/10.1080/19475705.2023.2225691
https://doi.org/10.1080/19475705.2023.2225691
https://doi.org/10.3390/app112411659
https://doi.org/10.1109/ACCESS.2023.3235332
https://doi.org/10.3390/rs13163132
https://doi.org/10.3390/rs13163132
https://doi.org/10.3390/rs9060629
https://doi.org/10.1109/ACCESS.2020.3024133
https://doi.org/10.1109/ACCESS.2020.3024133
https://doi.org/10.5194/isprs-archives-XLIII-B3-2022-673-2022
https://doi.org/10.5194/isprs-archives-XLIII-B3-2022-673-2022
https://doi.org/10.3390/rs14163943
https://doi.org/10.1016/j.isprsjprs.2022.05.008
https://doi.org/10.1016/j.isprsjprs.2022.05.008
https://doi.org/10.1109/TGRS.2022.3216831
https://doi.org/10.1109/ICIIP53038.2021.9702698
https://doi.org/10.1007/s00477-023-02392-6
https://doi.org/10.1007/s00477-023-02392-6
https://doi.org/10.3390/agriculture13050971
https://doi.org/10.3390/rs15174269
https://doi.org/10.3390/rs14184486
https://doi.org/10.3390/rs14184486
https://doi.org/10.1109/JSTARS.2022.3164140
no. 4, pp. 295–307, Aug. 2022, ISSN: 1608-3334. DOI:
10.1134/S1067413622040051.
[279] Z. Labe and E. Barnes, “Predicting slowdowns in
decadal climate warming trends with explainable neu-
ral networks,” Geophysical Research Letters, vol. 49,
no. 9, 2022. DOI: 10.1029/2022GL098173.
[280] Y. Hu, C. Wu, M. E. Meadows, and M. Feng, “Pixel
level spatial variability modeling using SHAP reveals
the relative importance of factors influencing LST,”
Environmental Monitoring and Assessment, vol. 195,
no. 3, p. 407, Feb. 2023, ISSN: 1573-2959. DOI: 10.
1007/s10661-023-10950-2. (visited on 02/21/2023).
[281] M. Kim, D. Kim, and G. Kim, “Examining the re-
lationship between land Use/Land cover (LULC) and
land surface temperature (LST) using explainable arti-
ficial intelligence (XAI) models: A case study of seoul,
south korea,” International Journal of Environmental
Research and Public Health, vol. 19, no. 23, 2022,
ISSN: 16617827. DOI: 10.3390/ijerph192315926.
[282] P. Luo, B. Yu, P. Li, P. Liang, Q. Zhang, and L.
Yang, “Understanding the relationship between 2D/3D
variables and land surface temperature in plain and
mountainous cities: Relative importance and interac-
tion effects,” Building and Environment, p. 110 959,
Oct. 2023, ISSN: 0360-1323. DOI: 10.1016/j.buildenv.
2023.110959. (visited on 10/26/2023).
[283] Y. Shen et al., “Using GeoAI to reveal the contribution
of urban park green space features to mitigate the
heat islandeffect,” ser. Proceedings of the International
Conference on Education and Research in Computer
Aided Architectural Design in Europe, vol. 2, 2023,
pp. 49–58, ISBN: 978-94-91207-35-8.
[284] Z. Li, L. Chen, F. Long, Z. Li, and H. Jina, “Automatic
bridge detection of SAR images based on interpretable
deep learning algorithm,” ser. Journal of Physics: Con-
ference Series, vol. 2562, 2023. DOI: 10.1088/1742-
6596/2562/1/012013.
[285] R. Luo et al., “Glassboxing deep learning to enhance
aircraft detection from sar imagery,” Remote Sensing,
vol. 13, no. 18, 2021, ISSN: 20724292. DOI: 10.3390/
rs13183650.
[286] S. Yin, L. Wang, M. Shafiq, L. Teng, A. A. Laghari,
and M. F. Khan, “G2Grad-CAMRL: An object de-
tection and interpretation model based on gradient-
weighted class activation mapping and reinforcement
learning in remote sensing images,” IEEE Journal of
Selected Topics in Applied Earth Observations and
Remote Sensing, pp. 1–16, 2023, ISSN: 2151-1535.
DOI: 10.1109/JSTARS.2023.3241405.
[287] P. Li, C. Feng, X. Hu, and Z. Tang, “SAR-BagNet:
An Ante-hoc Interpretable Recognition Model Based
on Deep Network for SAR Image,” Remote Sensing,
vol. 14, no. 9, p. 2150, Jan. 2022, ISSN: 2072-4292.
DOI: 10.3390/rs14092150. (visited on 03/02/2023).
[288] B. Peng and B. Peng, “Clutter-invariant regularization
for DNN-based SAR target recognition,” in 2023 6th
International Conference on Electronics Technology
(ICET), May 2023, pp. 1456–1461. DOI: 10 . 1109 /
ICET58434.2023.10211978.
[289] M. Zhu, B. Zang, L. Ding, T. Lei, Z. Feng, and J. Fan,
“LIME-Based Data Selection Method for SAR Images
Generation Using GAN,” Remote Sensing, vol. 14,
no. 1, p. 204, Jan. 2022, ISSN: 2072-4292. DOI: 10.
3390/rs14010204. (visited on 08/27/2022).
[290] S. Feng, K. Ji, F. Wang, L. Zhang, X. Ma, and
G. Kuang, “PAN: Part attention network integrating
electromagnetic characteristics for interpretable SAR
vehicle target recognition,” IEEE Transactions on Geo-
science and Remote Sensing, pp. 1–1, 2023, ISSN:
1558-0644. DOI: 10.1109/TGRS.2023.3256399.
[291] D. Guo, B. Chen, M. Zheng, and H. Liu, “SAR
automatic target recognition based on supervised deep
variational autoencoding model,” IEEE Transactions
on Aerospace and Electronic Systems, vol. 57, no. 6,
pp. 4313–4328, Dec. 2021, ISSN: 1557-9603. DOI: 10.
1109/TAES.2021.3096868.
[292] P. Li, X. Hu, C. Feng, X. Shi, Y. Guo, and W. Feng,
“SAR-AD-BagNet: An Interpretable Model for SAR
Image Recognition based on Adversarial Defense,”
IEEE Geoscience and Remote Sensing Letters, pp. 1–1,
2022, ISSN: 1558-0571. DOI: 10 .1109/LGRS.2022.
3230243.
[293] M. Zhou, Z. Zou, Z. Shi, W.-J. Zeng, and J. Gui, “Lo-
cal Attention Networks for Occluded Airplane Detec-
tion in Remote Sensing Images,” IEEE Geoscience and
Remote Sensing Letters, vol. 17, no. 3, pp. 381–385,
Mar. 2020, ISSN: 1558-0571. DOI: 10 . 1109 / LGRS .
2019.2924822.
[294] H. Kawauchi and T. Fuse, “SHAP-Based interpretable
object detection method for satellite imagery,” Re-
mote Sensing, vol. 14, no. 9, 2022. DOI: 10 . 3390 /
rs14091970.
[295] Mandeep, H. S. Pannu, and A. Malhi, “Deep learning-
based explainable target classification for synthetic
aperture radar images,” in 2020 13th International
Conference on Human System Interaction (HSI), Jun.
2020, pp. 34–39. DOI: 10 . 1109 / HSI49210 . 2020 .
9142658.
[296] A. H. Oveis, E. Giusti, S. Ghio, G. Meucci, and M.
Martorella, “LIME-Assisted automatic target recogni-
tion with SAR images: Toward incremental learning
and explainability,” IEEE Journal of Selected Topics
in Applied Earth Observations and Remote Sensing,
vol. 16, pp. 9175–9192, 2023, ISSN: 2151-1535. DOI:
10.1109/JSTARS.2023.3318675.
[297] S. Feng, K. Ji, L. Zhang, X. Ma, and G. Kuang, “SAR
Target Classification Based on Integration of ASC
Parts Model and Deep Learning Algorithm,” IEEE
Journal of Selected Topics in Applied Earth Observa-
tions and Remote Sensing, vol. 14, pp. 10 213–10 225,
2021, ISSN: 2151-1535. DOI: 10.1109/JSTARS.2021.
3116979.
https://doi.org/10.1134/S1067413622040051
https://doi.org/10.1029/2022GL098173
https://doi.org/10.1007/s10661-023-10950-2
https://doi.org/10.1007/s10661-023-10950-2
https://doi.org/10.3390/ijerph192315926
https://doi.org/10.1016/j.buildenv.2023.110959
https://doi.org/10.1016/j.buildenv.2023.110959
https://doi.org/10.1088/1742-6596/2562/1/012013
https://doi.org/10.1088/1742-6596/2562/1/012013
https://doi.org/10.3390/rs13183650
https://doi.org/10.3390/rs13183650
https://doi.org/10.1109/JSTARS.2023.3241405
https://doi.org/10.3390/rs14092150
https://doi.org/10.1109/ICET58434.2023.10211978
https://doi.org/10.1109/ICET58434.2023.10211978
https://doi.org/10.3390/rs14010204
https://doi.org/10.3390/rs14010204
https://doi.org/10.1109/TGRS.2023.3256399
https://doi.org/10.1109/TAES.2021.3096868
https://doi.org/10.1109/TAES.2021.3096868
https://doi.org/10.1109/LGRS.2022.3230243
https://doi.org/10.1109/LGRS.2022.3230243
https://doi.org/10.1109/LGRS.2019.2924822
https://doi.org/10.1109/LGRS.2019.2924822
https://doi.org/10.3390/rs14091970
https://doi.org/10.3390/rs14091970
https://doi.org/10.1109/HSI49210.2020.9142658
https://doi.org/10.1109/HSI49210.2020.9142658
https://doi.org/10.1109/JSTARS.2023.3318675
https://doi.org/10.1109/JSTARS.2021.3116979
https://doi.org/10.1109/JSTARS.2021.3116979
[298] M. Onishi and T. Ise, “Explainable identification and
mapping of trees using UAV RGB image and deep
learning,” Scientific Reports, vol. 11, no. 1, p. 903,
Jan. 2021, ISSN: 2045-2322. DOI: 10 .1038/s41598-
020-79653-9. (visited on 08/27/2022).
[299] Y. Jia, E. Frank, B. Pfahringer, A. Bifet, and N. Lim,
“Studying and exploiting the relationship between
model accuracy and explanation quality,” in Machine
Learning and Knowledge Discovery in Databases. Re-
search Track, Cham: Springer International Publishing,
2021, pp. 699–714, ISBN: 978-3-030-86520-7.
[300] T.-A. Nguyen, B. Kellenberger, and D. Tuia, “Mapping
forest in the Swiss Alps treeline ecotone with explain-
able deep learning,” Remote Sensing of Environment,
vol. 281, 2022. DOI: 10.1016/j.rse.2022.113217.
[301] P. Arrogante-Funes, A. G. Bruzón, A. Álvarez-Ripado,
F. Arrogante-Funes, F. Martı́n-González, and C. J.
Novillo, “Assessment of the regeneration of landslides
areas using unsupervised and supervised methods and
explainable machine learning models,” Landslides,
2023, ISSN: 1612510X. DOI: 10 .1007/s10346- 023-
02154-z.
[302] W. Li et al., “Widespread increasing vegetation sen-
sitivity to soil moisture,” Nature Communications,
vol. 13, no. 1, p. 3959, Jul. 2022, ISSN: 2041-1723.
DOI: 10 . 1038 / s41467 - 022 - 31667 - 9. (visited on
08/27/2022).
[303] C. Zhao, M. Jia, Z. Wang, D. Mao, and Y. Wang,
“Toward a better understanding of coastal salt marsh
mapping: A case from China using dual-temporal
images,” Remote Sensing of Environment, vol. 295,
2023, ISSN: 00344257. DOI: 10 . 1016 / j . rse . 2023 .
113664.
[304] C. Zhao, M. Jia, Z. Wang, D. Mao, and Y. Wang,
“Identifying mangroves through knowledge extracted
from trained random forest models: An interpretable
mangrove mapping approach (IMMA),” ISPRS Jour-
nal of Photogrammetry and Remote Sensing, vol. 201,
pp. 209–225, 2023, ISSN: 09242716. DOI: 10.1016/j.
isprsjprs.2023.05.025.
[305] J. Schiller, C. Jänicke, M. Reckling, and M. Ryo,
“Higher crop diversity in less diverse landscapes,” In
Review, Preprint, Oct. 2023. DOI: 10.21203/rs.3.rs-
3410387/v1. (visited on 10/16/2023).
[306] M. Müller, P.-O. Olsson, L. Eklundh, S. Jamali, and
J. Ardö, “Features predisposing forest to bark beetle
outbreaks and their dynamics during drought,” Forest
Ecology and Management, vol. 523, p. 120 480, Nov.
2022, ISSN: 0378-1127. DOI: 10.1016/j.foreco.2022.
120480. (visited on 09/12/2022).
[307] K. A. Hilburn, I. Ebert-Uphoff, and S. D. Miller,
“Development and Interpretation of a Neural-Network-
Based Synthetic Radar Reflectivity Estimator Using
GOES-R Satellite Observations,” Journal of Applied
Meteorology and Climatology, vol. 60, no. 1, pp. 3–21,
Dec. 2020, ISSN: 1558-8424,1558-8432. DOI: 10 .
1175/JAMC-D-20-0084.1. (visited on 01/11/2023).
[308] Z. K. Martin, E. A. Barnes, and E. Maloney, “Us-
ing Simple, Explainable Neural Networks to Predict
the Madden-Julian Oscillation,” Journal of Advances
in Modeling Earth Systems, vol. 14, no. 5, May
2022, ISSN: 1942-2466, 1942-2466. DOI: 10 . 1029 /
2021MS002774. (visited on 01/30/2023).
[309] K. J. Mayer and E. A. Barnes, “Subseasonal Forecasts
of Opportunity Identified by an Explainable Neu-
ral Network,” Geophysical Research Letters, vol. 48,
no. 10, May 2021, ISSN: 0094-8276, 1944-8007. DOI:
10.1029/2020GL092092. (visited on 01/27/2023).
[310] D. Yang, S. Liu, Y. Hu, X. Liu, J. Xie, and L.
Zhao, “Predictor selection for CNN-based statistical
downscaling of monthly precipitation,” Advances in
Atmospheric Sciences, Feb. 2023, ISSN: 1861-9533.
DOI: 10.1007/s00376-022-2119-x.
[311] S. Upadhyaya, P.-E. Kirstetter, R. Kuligowski, and M.
Searls, “Classifying precipitation from GEO satellite
observations: Diagnostic model,” Quarterly Journal of
the Royal Meteorological Society, vol. 147, no. 739,
pp. 3318–3334, 2021. DOI: 10.1002/qj.4130.
[312] J. Mardian, C. Champagne, B. Bonsal, and A. Berg,
“Understanding the Drivers of Drought Onset and
Intensification in the Canadian Prairies: Insights from
Explainable Artificial Intelligence (XAI),” Journal of
Hydrometeorology, vol. -1, no. aop, Aug. 2023, ISSN:
1525-7541, 1525-755X. DOI: 10 .1175/JHM- D- 23-
0036.1. (visited on 08/29/2023).
[313] N. Antoniadou, H. J. D. Sørup, J. W. Pedersen,
I. B. Gregersen, T. Schmith, and K. Arnbjerg-Nielsen,
“Comparison of data-driven methods for linking ex-
treme precipitation events to local and large-scale me-
teorological variables,” Stochastic Environmental Re-
search and Risk Assessment, vol. 37, no. 11, pp. 4337–
4357, Nov. 2023, ISSN: 1436-3259. DOI: 10 . 1007 /
s00477-023-02511-3.
[314] E. Maddy and S. Boukabara, “MIIDAPS-AI: An ex-
plainable machine-learning algorithm for infrared and
microwave remote sensing and data assimilation pre-
processing - application to LEO and GEO sensors,”
IEEE Journal of Selected Topics in Applied Earth
Observations and Remote Sensing, vol. 14, pp. 8566–
8576, 2021. DOI: 10.1109/JSTARS.2021.3104389.
[315] L. Bergamasco, S. Saha, F. Bovolo, and L. Bruz-
zone, “AN EXPLAINABLE CONVOLUTIONAL
AUTOENCODER MODEL for UNSUPERVISED
CHANGE DETECTION,” ser. International Archives
of the Photogrammetry, Remote Sensing and Spatial
Information Sciences - ISPRS Archives, vol. 43, 2020,
pp. 1513–1519. DOI: 10.5194/isprs- archives- XLIII-
B2-2020-1513-2020.
[316] H. Kim, W. Crow, X. Li, W. Wagner, S. Hahn, and V.
Lakshmi, “True global error maps for SMAP, SMOS,
and ASCAT soil moisture data based on machine
https://doi.org/10.1038/s41598-020-79653-9
https://doi.org/10.1038/s41598-020-79653-9
https://doi.org/10.1016/j.rse.2022.113217
https://doi.org/10.1007/s10346-023-02154-z
https://doi.org/10.1007/s10346-023-02154-z
https://doi.org/10.1038/s41467-022-31667-9
https://doi.org/10.1016/j.rse.2023.113664
https://doi.org/10.1016/j.rse.2023.113664
https://doi.org/10.1016/j.isprsjprs.2023.05.025
https://doi.org/10.1016/j.isprsjprs.2023.05.025
https://doi.org/10.21203/rs.3.rs-3410387/v1
https://doi.org/10.21203/rs.3.rs-3410387/v1
https://doi.org/10.1016/j.foreco.2022.120480
https://doi.org/10.1016/j.foreco.2022.120480
https://doi.org/10.1175/JAMC-D-20-0084.1
https://doi.org/10.1175/JAMC-D-20-0084.1
https://doi.org/10.1029/2021MS002774
https://doi.org/10.1029/2021MS002774
https://doi.org/10.1029/2020GL092092
https://doi.org/10.1007/s00376-022-2119-x
https://doi.org/10.1002/qj.4130
https://doi.org/10.1175/JHM-D-23-0036.1
https://doi.org/10.1175/JHM-D-23-0036.1
https://doi.org/10.1007/s00477-023-02511-3
https://doi.org/10.1007/s00477-023-02511-3
https://doi.org/10.1109/JSTARS.2021.3104389
https://doi.org/10.5194/isprs-archives-XLIII-B2-2020-1513-2020
https://doi.org/10.5194/isprs-archives-XLIII-B2-2020-1513-2020
learning and triple collocation analysis,” Remote Sens-
ing of Environment, vol. 298, 2023, ISSN: 00344257.
DOI: 10.1016/j.rse.2023.113776.
[317] D.-S. Lee, D.-Y. Lee, and Y.-S. Park, “Interpretable
machine learning approach to analyze the effects of
landscape and meteorological factors on mosquito
occurrences in Seoul, South Korea,” Environmental
Science and Pollution Research, Jul. 2022, ISSN: 1614-
7499. DOI: 10.1007/s11356-022-22099-5.
[318] F. Wagner et al., “Using explainable machine learning
to understand how urban form shapes sustainable mo-
bility,” Transportation Research Part D: Transport and
Environment, vol. 111, p. 103 442, Oct. 2022, ISSN:
1361-9209. DOI: 10.1016/j.trd.2022.103442. (visited
on 03/07/2023).
[319] P. Taconet et al., “Data-driven and interpretable
machine-learning modeling to explore the fine-scale
environmental determinants of malaria vectors biting
rates in rural Burkina Faso,” Parasites and Vectors,
vol. 14, no. 1, 2021. DOI: 10.1186/s13071-021-04851-
x.
[320] C. Molnar et al., “Pitfalls to avoid when interpret-
ing machine learning models,” in XXAI: Extending
Explainable AI beyond Deep Models and Classifiers,
ICML 2020 Workshop, Jul. 2020.
[321] S. Krishna et al., The Disagreement Problem in Ex-
plainable Machine Learning: A Practitioner’s Per-
spective, Feb. 2022. DOI: 10.48550/arXiv.2202.01602.
arXiv: arXiv:2202.01602. (visited on 10/10/2022).
[322] C. Rudin, “Stop explaining black box machine learn-
ing models for high stakes decisions and use inter-
pretable models instead,” Nature Machine Intelligence,
vol. 1, no. 5, pp. 206–215, May 2019, ISSN: 2522-
5839. DOI: 10.1038/s42256-019-0048-x. (visited on
02/01/2023).
[323] L. Sixt, M. Granz, and T. Landgraf, “When Explana-
tions Lie: Why Many Modified BP Attributions Fail,”
37th International Conference on Machine Learning,
ICML 2020, vol. PartF16814, pp. 8993–9004, Dec.
2020. DOI: 10.48550/arxiv.1912.09818. arXiv: 1912.
09818. (visited on 04/26/2022).
[324] A. Mamalakis, E. A. Barnes, and I. Ebert-Uphoff,
“Investigating the Fidelity of Explainable Artificial In-
telligence Methods for Applications of Convolutional
Neural Networks in Geoscience,” Artificial Intelligence
for the Earth Systems, vol. 1, no. 4, e220012, Oct.
2022, ISSN: 2769-7525. DOI: 10.1175/AIES- D- 22-
0012.1. (visited on 01/27/2023).
[325] P. Bommer, M. Kretschmer, A. Hedström, D. Bareeva,
and M. M.-C. Höhne, Finding the right XAI method
– A Guide for the Evaluation and Ranking of Ex-
plainable AI Methods in Climate Science, Mar. 2023.
DOI: 10.48550/arXiv.2303.00652. arXiv: 2303.00652
[cs]. (visited on 03/04/2023).
[326] J. Haug, S. Zürn, P. El-Jiz, and G. Kasneci, On
Baselines for Local Feature Attributions, Jan. 2021.
DOI: 10.48550/arXiv.2101.00905. arXiv: 2101.00905
[cs]. (visited on 10/26/2023).
[327] A. Mamalakis, E. A. Barnes, and I. Ebert-Uphoff,
“Carefully Choose the Baseline: Lessons Learned from
Applying XAI Attribution Methods for Regression
Tasks in Geoscience,” Artificial Intelligence for the
Earth Systems, vol. 2, no. 1, Jan. 2023, ISSN: 2769-
7525. DOI: 10.1175/AIES-D-22-0058.1. (visited on
03/28/2023).
[328] A. Bertrand, R. Belloum, J. R. Eagan, and W.
Maxwell, “How Cognitive Biases Affect XAI-assisted
Decision-making: A Systematic Review,” in Proceed-
ings of the 2022 AAAI/ACM Conference on AI, Ethics,
and Society, ser. AIES ’22, New York, NY, USA: As-
sociation for Computing Machinery, Jul. 2022, pp. 78–
91, ISBN: 978-1-4503-9247-1. DOI: 10.1145/3514094.
3534164.
[329] R. S. Nickerson, “Confirmation bias: A ubiquitous
phenomenon in many guises,” Review of General
Psychology, vol. 2, no. 2, pp. 175–220, 1998. DOI:
10.1037/1089-2680.2.2.175. eprint: https://doi.org/10.
1037/1089-2680.2.2.175.
[330] M. Langer et al., “What do we want from Explain-
able Artificial Intelligence (XAI)? – A stakeholder
perspective on XAI and a conceptual model guiding
interdisciplinary XAI research,” Artificial Intelligence,
vol. 296, p. 103 473, Jul. 2021, ISSN: 0004-3702. DOI:
10.1016/j.artint.2021.103473. (visited on 05/24/2023).
[331] M. K. Belaid, E.Hüllermeier, M. Rabus, and R.
Krestel, Do We Need Another Explainable AI Method?
Toward Unifying Post-hoc XAI Evaluation Methods
into an Interactive and Multi-dimensional Benchmark,
Oct. 2022. DOI: 10.48550/arXiv.2207.14160. arXiv:
2207.14160 [cs]. (visited on 07/31/2023).
[332] C. Agarwal et al., OpenXAI: Towards a Transparent
Evaluation of Model Explanations, Jan. 2023. DOI: 10.
48550/arXiv.2206.11104. arXiv: 2206.11104 [cs].
(visited on 07/31/2023).
[333] W. Xu, M. J. Dainoff, L. Ge, and Z. Gao, “Transi-
tioning to Human Interaction with AI Systems: New
Challenges and Opportunities for HCI Professionals to
Enable Human-Centered AI,” International Journal of
Human–Computer Interaction, vol. 39, no. 3, pp. 494–
518, Feb. 2023, ISSN: 1044-7318. DOI: 10 . 1080 /
10447318.2022.2041900. (visited on 12/29/2023).
[334] L. Arras, A. Osman, and W. Samek, “CLEVR-XAI:
A benchmark dataset for the ground truth evaluation
of neural network explanations,” Information Fusion,
vol. 81, pp. 14–40, May 2022, ISSN: 1566-2535. DOI:
10.1016/j.inffus.2021.11.008.
[335] A. Mamalakis, I. Ebert-Uphoff, and E. A. Barnes,
“Neural network attribution methods for problems in
geoscience: A novel synthetic benchmark dataset,”
Environmental Data Science, vol. 1, e8, 2022, ISSN:
2634-4602. DOI: 10 . 1017 / eds . 2022 . 7. (visited on
01/27/2023).
https://doi.org/10.1016/j.rse.2023.113776
https://doi.org/10.1007/s11356-022-22099-5
https://doi.org/10.1016/j.trd.2022.103442
https://doi.org/10.1186/s13071-021-04851-x
https://doi.org/10.1186/s13071-021-04851-x
https://doi.org/10.48550/arXiv.2202.01602
https://arxiv.org/abs/arXiv:2202.01602
https://doi.org/10.1038/s42256-019-0048-x
https://doi.org/10.48550/arxiv.1912.09818
https://arxiv.org/abs/1912.09818
https://arxiv.org/abs/1912.09818
https://doi.org/10.1175/AIES-D-22-0012.1
https://doi.org/10.1175/AIES-D-22-0012.1
https://doi.org/10.48550/arXiv.2303.00652
https://arxiv.org/abs/2303.00652
https://arxiv.org/abs/2303.00652
https://doi.org/10.48550/arXiv.2101.00905
https://arxiv.org/abs/2101.00905
https://arxiv.org/abs/2101.00905
https://doi.org/10.1175/AIES-D-22-0058.1
https://doi.org/10.1145/3514094.3534164
https://doi.org/10.1145/3514094.3534164
https://doi.org/10.1037/1089-2680.2.2.175
https://doi.org/10.1037/1089-2680.2.2.175
https://doi.org/10.1037/1089-2680.2.2.175
https://doi.org/10.1016/j.artint.2021.103473
https://doi.org/10.48550/arXiv.2207.14160
https://arxiv.org/abs/2207.14160
https://doi.org/10.48550/arXiv.2206.11104
https://doi.org/10.48550/arXiv.2206.11104
https://arxiv.org/abs/2206.11104
https://doi.org/10.1080/10447318.2022.2041900
https://doi.org/10.1080/10447318.2022.2041900
https://doi.org/10.1016/j.inffus.2021.11.008
https://doi.org/10.1017/eds.2022.7
[336] Z. Labe and E. Barnes, “Detecting climate signals
using explainable AI with single-forcing large ensem-
bles,” Journal of Advances in Modeling Earth Systems,
vol. 13, no. 6, 2021. DOI: 10.1029/2021MS002464.
[337] E. E. Berkson, J. D. VanCor, S. Esposito, G. Chern,
and M. Pritt, “Synthetic Data Generation to Miti-
gate the Low/No-Shot Problem in Machine Learning,”
in 2019 IEEE Applied Imagery Pattern Recognition
Workshop (AIPR), Oct. 2019, pp. 1–7. DOI: 10.1109/
AIPR47015.2019.9174596.
[338] J. Shermeyer, T. Hossler, A. V. Etten, D. Hogan,
R. Lewis, and D. Kim, “RarePlanes: Synthetic Data
Takes Flight,” in 2021 IEEE Winter Conference on
Applications of Computer Vision (WACV), Jan. 2021,
pp. 207–217. DOI: 10.1109/WACV48630.2021.00025.
[339] S. Han et al., “Efficient generation of image chips
for training deep learning algorithms,” in Automatic
Target Recognition XXVII, vol. 10202, SPIE, May
2017, pp. 15–23. DOI: 10.1117/12.2261702. (visited
on 03/27/2023).
[340] T. Hoeser and C. Kuenzer, “SyntEO: Synthetic dataset
generation for earth observation and deep learning –
Demonstrated for offshore wind farm detection,” IS-
PRS Journal of Photogrammetry and Remote Sensing,
vol. 189, pp. 163–184, 2022. DOI: 10.1016/j.isprsjprs.
2022.04.029.
[341] J. Zhao, M. Datcu, Z. Zhang, H. Xiong, and W.
Yu, “Contrastive-regulated CNN in the complex do-
main: A method to learn physical scattering signatures
from flexible PolSAR images,” IEEE Transactions
on Geoscience and Remote Sensing, vol. 57, no. 12,
pp. 10 116–10 135, Dec. 2019, ISSN: 1558-0644. DOI:
10.1109/TGRS.2019.2931620.
[342] C. Marx, Y. Wang, Y. Park, and S. Ermon, “But
Are You Sure? An Uncertainty-Aware Perspective on
Explainable AI,”
[343] D. Slack, A. Hilgard, S. Singh, and H. Lakkaraju,
“Reliable Post hoc Explanations: Modeling Uncer-
tainty in Explainability,” in Advances in Neural In-
formation Processing Systems, vol. 34, Curran Asso-
ciates, Inc., 2021, pp. 9391–9404. [Online]. Available:
https : / / proceedings . neurips . cc / paper / 2021 / hash /
4e246a381baf2ce038b3b0f82c7d6fb4 - Abstract . html
(visited on 02/01/2023).
[344] Z. Huang, Y. Liu, X. Yao, J. Ren, and J. Han,
“Uncertainty Exploration: Towards Explainable SAR
Target Detection,” IEEE Transactions on Geoscience
and Remote Sensing, pp. 1–1, 2023, ISSN: 1558-0644.
DOI: 10.1109/TGRS.2023.3247898.
[345] N. Blomerus, J. Cilliers, W. Nel, E. Blasch, and P.
de Villiers, “Feedback-assisted automatic target and
clutter discrimination using a bayesian convolutional
neural network for improved explainability in SAR
applications,” Remote Sensing, vol. 14, no. 23, 2022,
ISSN: 20724292. DOI: 10.3390/rs14236096.
[346] J. Pearl, Causality. Cambridge university press, 2009.
[347] B. Scholkopf, D. Janzing, J. Peters, E. Sgouritsa, K.
Zhang, and J. M. Mooij, “On causal and anticausal
learning,” in International Conference on Machine
Learning, 2012. [Online]. Available: https : / / api .
semanticscholar.org/CorpusID:17675972.
[348] C. J. Reed et al., “Scale-MAE: A Scale-Aware Masked
Autoencoder for Multiscale Geospatial Representa-
tion Learning,” in Proceedings of the IEEE/CVF In-
ternational Conference on Computer Vision, 2023,
pp. 4088–4099. (visited on 10/24/2023).
[349] G. Carlsson, “Topology and data,” Bulletin of the
American Mathematical Society, vol. 46, no. 2,
pp. 255–308, Jan. 2009, ISSN: 0273-0979. DOI: 10 .
1090 / S0273 - 0979 - 09 - 01249 - X. (visited on
12/20/2023).
[350] G. Carlsson and R. B. Gabrielsson, “Topological Ap-
proaches to Deep Learning,” in Topological Data Anal-
ysis, vol. 15, Cham: Springer International Publishing,
2020, pp. 119–146. DOI: 10.1007/978-3-030-43408-
3 5. (visited on 12/20/2023).
[351] F. Hensel, M. Moor, and B. Rieck, “A Survey of
Topological Machine Learning Methods,” Frontiers in
Artificial Intelligence, vol. 4, 2021, ISSN: 2624-8212.
(visited on 12/20/2023).
[352] L. Duponchel, “When remote sensing meets topolog-
ical data analysis,” Journal of Spectral Imaging, a1,
Feb. 2018, ISSN: 2040-4565. DOI: 10.1255/jsi.2018.a1.
(visited on 06/28/2023).
[353] J. Ramirez, T. Armitage, T. Bihl, and R. Kramer,
“Topological learning for semi-supervised anomaly de-
tection in hyperspectral imagery,” ser. Proceedings of
the IEEE National Aerospace Electronics Conference,
NAECON, vol. 2019-July, 2019, pp. 560–564. DOI:
10.1109/NAECON46414.2019.9058127.
[354] A. A. Ismail, M. Gunady, H. C. Bravo, and S. Feizi,
“Benchmarking deep learning interpretability in time
series predictions,” Advances in Neural Information
Processing Systems, vol. 2020-Decem, no. NeurIPS,
pp. 1–12, 2020. arXiv: 2010.13924.
[355] A. Theissler, F. Spinnato, U. Schlegel, and R. Guidotti,
“Explainable AI for Time Series Classification: A
Review, Taxonomy and Research Directions,” IEEE
Access, vol. 10, pp. 100 700–100 724, 2022, ISSN:
2169-3536. DOI: 10.1109/ACCESS.2022.3207765.
[356] R. Achtibat et al., “From” where” to” what”:
Towards human-understandable explanations through
concept relevance propagation,” arXiv preprint
arXiv:2206.03208, 2022.
[357] A. Holzinger, R. Goebel, R. Fong, T. Moon, K.-R.
Müller, and W. Samek, “Xxai-beyond explainable
artificial intelligence,” in International Workshop on
Extending Explainable AI Beyond Deep Models and
Classifiers, Springer, 2020, pp. 3–10.[358] C.-K. Yeh, B. Kim, S. Arik, C.-L. Li, T. Pfister,
and P. Ravikumar, “On completeness-aware concept-
based explanations in deep neural networks,” Advances
https://doi.org/10.1029/2021MS002464
https://doi.org/10.1109/AIPR47015.2019.9174596
https://doi.org/10.1109/AIPR47015.2019.9174596
https://doi.org/10.1109/WACV48630.2021.00025
https://doi.org/10.1117/12.2261702
https://doi.org/10.1016/j.isprsjprs.2022.04.029
https://doi.org/10.1016/j.isprsjprs.2022.04.029
https://doi.org/10.1109/TGRS.2019.2931620
https://proceedings.neurips.cc/paper/2021/hash/4e246a381baf2ce038b3b0f82c7d6fb4-Abstract.html
https://proceedings.neurips.cc/paper/2021/hash/4e246a381baf2ce038b3b0f82c7d6fb4-Abstract.html
https://doi.org/10.1109/TGRS.2023.3247898
https://doi.org/10.3390/rs14236096
https://api.semanticscholar.org/CorpusID:17675972
https://api.semanticscholar.org/CorpusID:17675972
https://doi.org/10.1090/S0273-0979-09-01249-X
https://doi.org/10.1090/S0273-0979-09-01249-X
https://doi.org/10.1007/978-3-030-43408-3_5
https://doi.org/10.1007/978-3-030-43408-3_5
https://doi.org/10.1255/jsi.2018.a1
https://doi.org/10.1109/NAECON46414.2019.9058127
https://arxiv.org/abs/2010.13924
https://doi.org/10.1109/ACCESS.2022.3207765
in neural information processing systems, vol. 33,
pp. 20 554–20 565, 2020.
[359] Y. Ji, Y. Wang, and J. Kato, “Spatial-temporal concept
based explanation of 3d convnets,” in Proceedings of
the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, 2023, pp. 15 444–15 453.
[360] V. V. Ramaswamy, S. S. Kim, R. Fong, and O.
Russakovsky, “Overlooked factors in concept-based
explanations: Dataset choice, concept learnability, and
human capability,” in Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recogni-
tion, 2023, pp. 10 932–10 941.
[361] Z. Gao et al., Earthformer: Exploring Space-Time
Transformers for Earth System Forecasting, Feb. 2023.
DOI: 10.48550/arXiv.2207.05833. arXiv: arXiv:2207.
05833. (visited on 03/24/2023).
[362] M. Tarasiou, E. Chavez, and S. Zafeiriou, ViTs for
SITS: Vision Transformers for Satellite Image Time
Series, Jan. 2023. DOI: 10.48550/arXiv.2301.04944.
arXiv: 2301.04944 [cs]. (visited on 01/16/2023).
[363] A. Arnab, M. Dehghani, G. Heigold, C. Sun, M. Lučić,
and C. Schmid, “ViViT: A Video Vision Transformer,”
in Proceedings of the IEEE/CVF International Con-
ference on Computer Vision, 2021, pp. 6836–6846.
(visited on 05/19/2023).
[364] H. Fan et al., “Multiscale Vision Transformers,” in
Proceedings of the IEEE/CVF International Confer-
ence on Computer Vision, 2021, pp. 6824–6835. (vis-
ited on 05/19/2023).
[365] Z. Liu et al., “Video Swin Transformer,” in Pro-
ceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, 2022, pp. 3202–3211.
(visited on 05/19/2023).
[366] D. Neimark, O. Bar, M. Zohar, and D. Asselmann,
“Video Transformer Network,” in Proceedings of the
IEEE/CVF International Conference on Computer Vi-
sion, 2021, pp. 3163–3172. (visited on 05/19/2023).
[367] P. Veličković, “Everything is connected: Graph neural
networks,” Current Opinion in Structural Biology,
vol. 79, p. 102 538, 2023.
[368] J. Zhou et al., “Graph neural networks: A review of
methods and applications,” AI open, vol. 1, pp. 57–81,
2020.
[369] T. N. Kipf and M. Welling, “Semi-supervised clas-
sification with graph convolutional networks,” arXiv
preprint arXiv:1609.02907, 2016.
[370] D. Hong, L. Gao, J. Yao, B. Zhang, A. Plaza, and
J. Chanussot, “Graph convolutional networks for hy-
perspectral image classification,” IEEE Transactions
on Geoscience and Remote Sensing, vol. 59, no. 7,
pp. 5966–5978, 2020.
[371] S. Wan, S. Pan, P. Zhong, X. Chang, J. Yang, and
C. Gong, “Dual interactive graph convolutional net-
works for hyperspectral image classification,” English,
IEEE Transactions on Geoscience and Remote Sens-
ing, vol. 60, May 2021, Publisher Copyright: IEEE
Copyright: Copyright 2021 Elsevier B.V., All rights
reserved., ISSN: 0196-2892. DOI: 10.1109/TGRS.2021.
3075223.
[372] L. Yu, J. Peng, N. Chen, W. Sun, and Q. Du, “Two-
branch deeper graph convolutional network for hyper-
spectral image classification,” IEEE Transactions on
Geoscience and Remote Sensing, vol. 61, pp. 1–14,
2023.
[373] L. Mou, X. Lu, X. Li, and X. X. Zhu, “Nonlocal
graph convolutional networks for hyperspectral image
classification,” IEEE Transactions on Geoscience and
Remote Sensing, vol. 58, no. 12, pp. 8246–8257, 2020.
[374] W. Yu, S. Wan, G. Li, J. Yang, and C. Gong, “Hy-
perspectral image classification with contrastive graph
convolutional network,” IEEE Transactions on Geo-
science and Remote Sensing, vol. 61, pp. 1–15, 2023.
[375] N. Liu, Q. Feng, and X. Hu, “Interpretability in
graph neural networks,” in Graph Neural Networks:
Foundations, Frontiers, and Applications, L. Wu, P.
Cui, J. Pei, and L. Zhao, Eds., Singapore: Springer
Singapore, 2022, pp. 121–147.
[376] H. Yuan, H. Yu, S. Gui, and S. Ji, “Explainability in
graph neural networks: A taxonomic survey,” IEEE
transactions on pattern analysis and machine intel-
ligence, vol. 45, no. 5, pp. 5782–5799, 2022.
[377] Y. Wang, C. M. Albrecht, N. A. A. Braham, L. Mou,
and X. X. Zhu, “Self-supervised learning in remote
sensing: A review,” arXiv preprint arXiv:2206.13188,
2022.
[378] F. Pahde, M. Dreyer, W. Samek, and S. Lapuschkin,
“Reveal to Revise: An Explainable AI Life Cycle
for Iterative Bias Correction of Deep Models,” in
Medical Image Computing and Computer Assisted
Intervention – MICCAI 2023, ser. Lecture Notes in
Computer Science, Cham: Springer Nature Switzer-
land, 2023, pp. 596–606, ISBN: 978-3-031-43895-0.
DOI: 10.1007/978-3-031-43895-0 56.
[379] S. Mohseni, N. Zarei, and E. D. Ragan, “A Multi-
disciplinary Survey and Framework for Design and
Evaluation of Explainable AI Systems,” ACM Transac-
tions on Interactive Intelligent Systems, vol. 11, no. 3-
4, 24:1–24:45, Sep. 2021, ISSN: 2160-6455. DOI: 10.
1145/3387166. (visited on 10/24/2023).
ACKNOWLEDGMENTS
The work of A. Höhl was funded by the project ML4Earth
by the German Federal Ministry for Economic Affairs and
Climate Action under grant number 50EE2201C. The work of
I. Obadic is funded by the Munich Center for Machine Learn-
ing. M.Á. Fernández-Torres acknowledges the support from
the European Research Council (ERC) under the ERC Synergy
Grant USMILE (grant agreement 855187) and the European
Union’s Horizon 2020 research and innovation program within
the project ‘XAIDA: Extreme Events - Artificial Intelligence
for Detection and Attribution,’ under grant agreement No
101003469. The work of M.Á. Fernández-Torres and X. Zhu
https://doi.org/10.48550/arXiv.2207.05833
https://arxiv.org/abs/arXiv:2207.05833
https://arxiv.org/abs/arXiv:2207.05833
https://doi.org/10.48550/arXiv.2301.04944
https://arxiv.org/abs/2301.04944
https://doi.org/10.1109/TGRS.2021.3075223
https://doi.org/10.1109/TGRS.2021.3075223
https://doi.org/10.1007/978-3-031-43895-0_56
https://doi.org/10.1145/3387166
https://doi.org/10.1145/3387166
is supported by the German Federal Ministry of Education
and Research (BMBF) in the framework of the international
future AI lab ”AI4EO – Artificial Intelligence for Earth
Observation: Reasoning, Uncertainties, Ethics and Beyond”
(grant number: 01DD20001). H. Najjar acknowledges support
through a scholarship from the University of Kaiserslautern-
Landau. The authors are responsible for the content of this
publication.
COMPETING INTERESTS
The authors declare that they have no known competing
financial interests or personal relationships that could have
appeared to influence the work reported in this paper.
	Introduction
	Related Work
	Research method
	Research Objective
	Search procedure
	Explanation methods in ML
	xAI methods categorization
	Feature attribution
	Model distillation
	Intrinsic
	Contrastive Examples
	Common Explanation methods in RS
	Class Activation Mapping (CAM) and Grad-CAM
	Occlusion senstivity
	lime
	shap
	Attention mechanism
	Evaluation of xAI Methods
	Functional evaluation metrics
	User studiesxAI Objectives
	XAI in RS
	RQ1: Usage/Applications of xAI in Remote Sensing
	Application of existing xAI methods to RS
	Adapted xAI approaches
	RQ2: Interpretation and evaluation of xAI explanations
	Understanding and validating explanations
	Evaluation of xAI methods for RS
	RQ3: xAI Objectives and Findings in RS
	Discussion
	RQ4: Recommended practices in xAI for EO
	The usage of xAI
	Evaluation of xAI
	xAI benchmark datasets
	RQ5: Challenges, limitations, and future directions
	Combination of xAI with related fields
	Challenges from RS properties
	Towards interpretable Deep Neural Networks
	Restricted availability of labeled data
	Lack of standardized and objective evaluation of xAI
	Conclusion
	Appendix
	Search queries
	IEEE
	Scopus
	Springer
	Search query, ML in EO
	Models, xAI categories and methods
	Glossary[35]. While feature selection can
constitute the first step in a ML pipeline, feature interpretation
is usually the last step and typically involves more advanced
techniques than only looking at the predictive performance.
Therefore, feature selection and xAI can be complementary.
On the one hand, feature selection reduces the input space that
needs to be interpreted. On the other hand, xAI can provide
more qualitative insights for selection, such as uncovering a
bias introduced by a feature.
1) Feature attribution: Feature attribution methods rely on
the trained ML model to estimate the importance of the input
features. Depending on whether the explanations are generated
by inspecting the model internals or by analyzing the changes
in the model’s output after modifying the input features, the
methods in this category are further split into backpropagation
and perturbation methods. The output of these methods is
commonly a saliency plot which determines the contribution
of the input features to the model prediction. In the case of
imagery inputs, the output of these methods can be visualized
as a heatmap, also called a saliency map, which highlights
regions relevant to the model prediction.
a) Backpropagation: These methods leverage the inher-
ent structure of DNNs and estimate the relevant features by
propagating the output values at the top or intermediate layers
of the network to the input features. In the majority, these
methods compute gradients to this end. The Deconvolution
method, also known as Deconvolutional Neural Network (NN)
[39], is designed to reverse the convolutional operations from
Convolutional Neural Networks (CNNs). Reconstructing the
input space from the feature maps of the CNN allows vi-
sualizing which information was learned and how the in-
put is transformed across different network layers. Similarly,
Layer-wise Relevance Propagation (LRP) [42] calculates rele-
vance scores for individual input features through layerwise
backpropagating the neuron’s activations from the output,
utilizing specialized propagation rules. The scores indicate
the significance of the connection between input and output.
Various adaptations have been proposed which apply different
propagation rules based on the design of the networks [71–73].
In contrast, the Gradient or Saliency method [37] uses the
partial derivative with respect to the input to create the
attribution maps. Rather than computing the gradient once,
Integrated Gradients [38] calculates the integral of gradients
with respect to the input features along an interpolation or
path defined between a baseline input and the instance to
be explained. CAM [40] visually explains CNNs through
attribution heatmaps by introducing a global pooling layer
right before the top fully connected one. Using the weights of
the latter layer for a particular class, the heatmap is generated
by computing the weighted average of the activation maps
in the last convolutional layer before being upsampled to
match the size of the input tensor for explainability purposes.
One extension of CAM is Gradient-weighted Class Activation
Mapping (Grad-CAM) [41], which replaces these weights by
the gradient of the output with respect to the last convolutional
xAI
Feature attribution
Backpropagation
Activation Maximization [36]
Gradient [37]
Integrated Gradients [38]
Deconvolution [39]
Class Activation Mapping (CAM) [40]
and variants (e.g. Grad-CAM [41])
Layer-wise Relevance Propagation (LRP) [42]
Perturbation
Occlusion Sensitivity [43]
Partial Dependence Plot (PDP) [44]
Accumulated Local Effects (ALE) [45]
Distillation
Local Approximation
Local Interpretable Model-agnostic Explanation
(LIME) [46]
SHapley Additive exPlanations (SHAP) [47]
Model Translation Rule [48, 49], Tree [50–52], Graph [53, 54]
Intrinsic
Interpretable by Design
Decision Rule and Decision Tree [55]
Linear Regression [55],
Generalized Linear Model (GLM) [56],
Generalized Additive Model (GAM) [57]
Latent Dirichlet Allocation (LDA) [58]
Embedding Space
Attention Mechanism [59, 60]
Activation Assessment
Concept Discovery (e.g. TCAV [61], ACE [62])
Joint Training
Explanation Association [63, 64]
Prototype Learning [65–67]
Model Association (e.g. text explanations [68])
Contrastive Examples
Counterfactuals Wachter et al. [69]
Example-based Mikolov et al. [70]
post-hoc ante-hoc local global model-agnostic model-specific
Fig. 3. Categorization of xAI methods based on Ras et al. [10]
layer, thus removing the original requirement of a final global
pooling layer.
b) Perturbation: The perturbation methods assess feature
importance by measuring the sensitivity of model predictions
to changes in the input features. These methods are distin-
guished by how the features are perturbed. Among others,
perturbations include blurring, averaging, shuffling, or adding
noise. For example, the Occlusion method [43] tries to remove
a feature by occluding the features with a neutral value. Per-
mutation Feature Importance (PFI) [74] permutes the features
along their dimension, destroying the original relationship
between input and output values. The Partial Dependence Plot
(PDP) [44] method is designed to show the average influence
of a single input feature on the decision while marginalizing
the remaining features, which are fixed. Therefore, it assumes
feature independence. A similar approach called Accumulated
Local Effects (ALE) [45] can handle correlated features by
averaging over the conditional distribution.
2) Model distillation: Model distillation methods approxi-
mate the predictive behavior of a complex model by training
a simpler surrogate model that is usually interpretable-by-
design. By replicating the predictions of the complex model,
the surrogate model offers hypotheses about the relevant
features and the correlations learned by the complex model
without providing further insights into its internal decision
mechanism. Distillation approaches are categorized into a)
local approximation methods, which train the surrogate model
in a small neighborhood around an individual local example,
and b) model translation methods, which replicate the behavior
of the complex model over the entire dataset.
a) Local approximation: Approaches in this category fo-
cus on explaining individual predictions of the complex model
by inspecting a small neighborhood around the instances to be
explained. In contrast to the backpropagation and perturbation
methods, which operate on the raw input features, the local
approximation approaches transform the input features into
a simplified representation space, such as superpixels for
imagery inputs. A prominent approach in this category is
Local Interpretable Model-agnostic Explanation (LIME) [46].
It creates a new dataset in the neighborhood of the target
instance by perturbing its simplified representation. Next,
an interpretable surrogate model is trained to approximate
the predictions of the complex model on this newly created
dataset. Hence, the explanation for the complex model is
distilled to the interpretation of the surrogate model. A similar
strategy is employed by the SHAP framework [47]. Concretely,
Lundberg and Lee [47] introduce Kernel SHAP, which utilizes
the LIME’s framework under specific constraints to obtain the
feature importance by approximating their Shapley values, a
method grounded in game theory for estimating the player’s
contribution in cooperative games [75]. While Kernel SHAP
is a model-agnostic approach, Shapley values can also be
approximated with model-specific approaches such as Deep
SHAP [47] for neural networks and Tree SHAP [76] which
enables fast approximation of these values for tree-based
models.
b) Model translation: All the model’s decisions on the
entire dataset are approximated into a simple surrogate model.
Typically, the interpretable-by-design methods summarized in
the next section are used as surrogate models, such as rule-
based [77, 78], tree-based [50, 79] or graph-based [53, 80].
3) Intrinsic:ML models, which provide an explanation by
themselves based on their structure, components, parameters,
or outputs. Alternatively, a human-interpretable explanation
can be obtained by visualizing them.
a) Interpretable-by-Design: These methods are inter-
pretable by humans because of their simplicity in design,
architecture, and decision process. Decision Rules are hier-
archical IF-THEN statements, assessing conditions and deter-
mining a decision. Fuzzy rules [81] are designed to address
uncertainty and imprecision, which are frequently encountered
in nature and classical precise rules struggle to represent,
e.g., by partial membership of classes (fuzzy sets) [82]. Since
the proximity of these rules to natural language, they are
interpretable by humans. The Decision Tree [83] greedily
learns decision rules. Their internal structure is a binary tree,
where each internal node is a condition and each leaf is a
decision.
Generalized Additive Models (GAMs) [57] are statistical
modeling techniques that approximate data using an additive
function: g(E(y)) = f1(X1)+f2(X2)+ ...+fi(Xi)+β0+ ϵ,
where fi are smooth, non-linear functions transforming the
features Xi, β0 the intercept coefficient, and ϵ represents the
error term. On the other hand, Generalized Linear Models
(GLMs) [56] consider a linear relationship defined by means of
a specific distribution. Each fi becomes a coefficient βi and the
function g now computes a weighted sum of features, which
allows representing the mean of various exponential-family
distributions. A simple example of this model type is Linear
Regression (LR), which assumes a Gaussian data distribution
where the function g is the identity. In this context, the model
explanation can be obtained by examining the coefficients.
A well-known approach for generative probabilistic topic
modeling is Latent Dirichlet Allocation (LDA) [58]. A dataset
is assumed to be organized in corpora or collections, and
each collection contains discrete units, such as documents
comprised of words. A distribution of these units character-
izes both the collections and the topics. By analyzing the
proportions of the units in the collections, LDA estimates
the underlying topics. The decisions of the Latent Dirichlet
Allocation (LDA) model and the identified topics can be
interpreted by examining the predicted proportions for each
collection.
b) Embedding space: These approaches process the ac-
tivations in the latent space of DNNs to interpret its workings.
The Attention Mechanism [59, 60] creates high-level feature
representations by using the attention weights to model the
dependencies between the different elements in the input.
Hence, visualizing the attention weights is a common pro-
cedure to assess the relevant features for the model decisions.
Activation Assessment analyzes the activations in the latent
space of a NN based on projection techniques. Commonly
used are dimensionality reduction approaches [84] (e.g., t-
distributed Stochastic Neighbor Embedding (t-SNE) [85] and
Uniform Manifold Approximation and Projection (UMAP)
[86]) or neuron receptive fields in CNNs [87]. Concept-based
Explanations summarize the activations in the latent space
in terms of interpretable, high-level concepts. The concepts
represent visual patterns in computer vision, and they are
either user-defined in Testing with Concept Activation Vectors
(TCAV) [61] or learned in an unsupervised manner [62]. These
approaches provide global explanations by quantifying the
concept relevance per class.
c) Joint training: This category provides ante-hoc ex-
planations by introducing an additional learning task to the
model. This task is jointly optimized with the original learning
objective and is used as an explanation. The methods in this
family typically differ based on the explanation representation
and how the additional task is integrated with the original
model.
Explanation association methods impose the inference of a
black-box model to rely upon human-interpretable concepts. A
prominent approach in this category is the concept bottleneck
models that represent the concepts as neurons in the latent
space of the model. They introduce an additional task, first
predicting the interpretable concepts in an intermediate layer
of a DL model. Then, the model predictions are derived based
on these concepts. During training, a regularization term is
added to the loss function that enforces alignment of the latent
space according to the interpretable concepts [63]. To estimate
the concept importance directly, Marcos et al. predict the final
output by linearly combining the concept activation maps [64].
Prototype learning approaches aim to identify a set of
representative examples (prototypes) from the dataset and
enable an interpretable decision mechanism by decomposing
the model predictions based on the instance’s similarity with
the learned prototypes [65]. Thus, visualizing the prototypes
enables global model interpretability, while the similarity with
an input instance offers local model explanations. One popular
approach for prototype learning on image classification tasks
is the ProtoPNet architecture introduced by Chen et al. [66].
The prototypes represent image parts and are encoded as
convolutional filters in a prototype layer of the proposed
network. Their weights are optimized with the supervised
learning loss of the network and additional constraints that
ensure both the clustering of the prototypes according to
their class and the separability from the other classes. One
extension of this approach is the Neural Prototype Trees [67],
which organizes the prototypes as nodes in a binary decision
tree. Each node computes the similarity of the corresponding
prototype with an instance. These similarities are used to route
the instance towards the leaves of the tree containing the class
predictions.
In contrast to the previous approaches, which encode the
explanation within the model that performs inference for
the original learning task, the model association methods
introduce an external model that generates explanations. These
approaches are often utilized to provide textual explanations
for CV tasks. An example of such an approach is presented
by Kim et al., who derive text explanations for self-driving
cars based on jointly training a vehicle controller and a
textual explanation generator [68]. The vehicle controller is a
CNN model that recognizes the car’s movements with spatial
attention maps. Next, the explanation generator, which is a
Long Short-Term Memory neural network (LSTM) model,
processes the context vectors and the spatial attention maps
from the controller to produce the text explanation.
4) Contrastive Examples: Methods within this category
provide alternative examples to an input instance and allow
obtaining an explanation by comparing them. Usually, exam-
ples that are close to each other in the input space yet lead to
a different outcome than the original input instance are shown.
a) Counterfactuals: This explanation type aims to dis-
cover the smallest change required for an instance to achieve
a predefined prediction. Essentially, they answer the question,
”Why does it yield output X rather than output Y?” which is
very close to human reasoning [88]. They have a close prox-
imity to adversarial examples, although their objectives differ
significantly. Adversarial examples usually want to achieve a
confident prediction with a minimal perturbed instance, whose
change should remain imperceptible for humans. Conversely,
counterfactuals aim to provide a diverse set of examples and
should allow representing the decision boundary of the model.
Wachter et al. [69] introduce an optimization problem whose
primary objective is to find a counterfactual, denoted as x′,
which is as close as possible to the original input x. As such,
the distance function d(·, ·) between the counterfactual and
the original input should be minimized. The corresponding
loss function is as follows:
L(λ, xi, x
′, y′) = λ(fw(x′)− y′)2 + d(xi, x
′), (1)
where y′ represents the desired output, fw is the model with
fixed weights, and λ serves as a regularization parameter to
balance the contribution of the proximity of the original output
to the desired output y′ with respect to the similarity between
counterfactual and input.
b) Example-based explanations: Unlike counterfactuals,
which can generate artificial instances, example-based expla-
nations usually present existing ”historical” training instances
and showcase similar instances to the input under consid-
eration [84]. The user can connect, correlate, and reason
based on the analogies. The explanatory approach aligns with
case-based interpretable-by-design model explanations, e.g., k-
Nearest-Neighbors (kNNs). For example, Mikolov et al. [70]
train a skip-gram model and evaluate the model using the
nearest neighbors determined by the distances in the embedded
space. Furthermore, they illustrate that the acquired word
representations have a linear relationship, allowing for the
computation of analogies through vector addition.
B. Common Explanation methods in RS
In this section, we provide an in-depth explanation of several
popular xAI methods in the field of RS.
1) Class Activation Mapping (CAM) and Grad-CAM: Zhou
et al. introduce CAM [40] to obtain visual heatmaps for
class discriminative localization in CNNs. This technique aims
to identify important regions within an image that influence
the model’s decision toward each of the C classes. The
approach requires the presence of a global average pooling
layer between the last convolutional layer and the top fully
connected layer of the CNN architecture. To calculate the
class activation maps, the weights wc,k at the fully connected
layer associated with a particular class c are used to estimate
the importance of each feature or activation map Ak at the
input of the global average pooling layer. Finally, the saliency
map highlighting the discriminative regions for the class c is
computed as the weighted sum of these K activation maps
with the following equation:
SCAM
c =
K∑
k=1
wc,kAk (2)
Selvaraju et al. generalize CAM to a broader range of CNN
architectures by introducing Grad-CAM [41]. In contrast to
CAM, which makes use of the weights corresponding to class
c, Grad-CAM utilizes the average gradient of the logit yc
for each of the K feature maps in the last convolutional
layer to assess their importance for class c. Additionally, a
ReLU function is applied, propagating only the positive values.
Therefore Eq. 2 becomes:
SGrad−CAM
c = ReLU
( K∑
k=1
( 1
N
∑
i
∑
j
∂yc
∂Ai,j
)
Ak
)
, (3)
where (i, j) stand for the spatial coordinates of each of
the N locations in every activation map Ak. Because the
convolutional layers of the networks usually reduce the input
size, the activation maps need to be upsampled to the original
input size. However, this upsampling process can lead to im-
precisions and other backpropagation methods produce more
fine-grained heatmaps, which directly attribute to each pixel,
such as Layer-wise Relevance Propagation (LRP) or Integrated
Gradients (IG). Nevertheless, recent method revisions, such
as LayerCAM [89], have aimed to produce more detailed
heatmaps.
2) Occlusion senstivity: Zeiler and Fergus propose a
perturbation-based method [43] to visualize the importance of
different image regions. Their approach slides a patch over the
image and observes the sensitivity of the model’s prediction.
Different values for the patch, its size, and sampling techniques
can be considered. The assigned importance is directly propor-
tional to the drop in performance of the model after occluding
the patch. Consequently, this method is model-agnostic and
can be applied to any kind of architecture.
3) Local Interpretable Model-agnostic Explanation
(LIME): Ribeiro et al. proposed LIME [46], a model-
agnostic method that approximates the behavior of a complex
model locally, in the neighborhood of a target instance.
Concretely, to explain the prediction of a complex model f
for a target instance x, LIME performs the following main
steps: 1) a dataset is created around the neighborhood of x by
randomly performing perturbations on it (e.g., adding noise,
hiding or blurring parts of the input, etc.), 2) an interpretable
by-design surrogate model g is trained on this dataset and
3) the internals of g are inspected to provide an explanation.
It is important to note that LIME generates explanations
on a simplified representation space that is interpretable
to humans. For example, when the input is an image, the
simplified representation can correspond to a binary vector,
indicating the presence of superpixels decomposing the
image. Therefore, in the first step, LIME creates a dataset by
perturbing the simplified representation of x. This dataset is
labeled according to the predictions of the complex model
f on the perturbed instances (the perturbed instances are
reverse-transformed into the original input representation
before feeding it into the complex model). In the second
step, the surrogate model g is trained on this dataset by
weighting the perturbed samples based on their similarity
with x. Finally, the internals of g are inspected to explain
the prediction of f for x. For instance, in case g is a linear
regression model, its coefficients can be used to assess
the feature importance. In contrast, if g is a decision tree,
inspecting its rules can serve to explain the predictions of the
complex model f .
4) SHapley Additive exPlanations (SHAP): The Shapley
values [75] constitute an approach from cooperative game
theory used to estimate the importance of the input features
for the prediction of a ML model f on a local instance, as
well as their average marginal contribution across all possible
coalitions of features. Concretely, given an instance x, the
Shapley value ϕi for feature i is computed as follows:
ϕi(f, x) =
∑
S⊆F\{i}
|S|! (F − |S| − 1)!
F !
[fx (S ∪ {i})− fx (S)] ,
(4)
where F is the set of all input features, S is a coalition of
features, and fx (S) is the marginalized prediction over the
features not included in S while the features in the coalition
S take the values of the instance x. The Shapley values
are considered to fairly assign the contribution of the input
features to the model prediction as they satisfy the following
properties: 1) the sum of the feature contributions adds up to
the difference between the model prediction for the instance x
and the average model prediction on the dataset (efficiency),
2) the contributions of two features are the same if they
equally contribute to all coalitions (symmetry), 3) a zero
contribution is assigned to the features that do not change
the model prediction (dummy), and 4) the Shapley value of a
feature for an ensemble of ML models can be computed by
aggregating the individual Shapley values across the models
in the ensemble (additivity) [19].
The analytical solution for the Shapley values can be a
computationally expensive operation for models trained on
more than a few features, as Eq. 4 requires iterating over
all possible feature coalitions and computing the marginal
contribution in each coalition. Therefore, in practice, the
Shapley values are estimated with approximation techniques.
Lundberg and Lee introduce the SHAP framework [47] as a
unified approach for model interpretability based on the family
of additive feature attribution methods. This family represents
the explanation through the coefficients of a linear model and
has a similar set of desired explanation properties to the ones
of the Shapley values. Although other xAI methods like LIME,
LRP, and Deep Learning Important FeaTures (DeepLIFT) [90]
can be represented in the form of additive feature attribution
methods, the only additive method that satisfies the desired
explanation properties is the one having the Shapley values
as its linear coefficients (referred as SHAP values). This
formulation enables the approximation of the Shapley values
with a model-agnosticapproach based on LIME. Concretely,
the authors introduce Kernel SHAP, an approach that estimates
the Shapley values by constraining the LIME method to rely
on a linear model as a surrogate model and to use a specific
similarity function for the weighting of the perturbed instances.
To leverage the internal structure of the ML models for a fast
approximation of the Shapley values, the authors also propose
model-specific approaches, namely Deep SHAP for DNNs and
Tree SHAP [76] for tree-based models.
5) Attention mechanism: Attention mechanisms represent
an integral component of neural networks that mimics cogni-
tive attention [91]. Although initially proposed for natural lan-
guage processing tasks like machine translation and sentiment
analysis [59, 60], they have also been leveraged recently in
CV applications [92], as well as for graph-structured data [93].
These modules induce attention weights on the input features,
determining how a neural network combines them to produce
a high-level feature representation. Thus, visualizing the atten-
tion weights is a well-known approach to understanding the
relevant features for the model predictions and assessing the
interaction of the input features in the context of the learning
task [94, 95].
An attention mechanism is usually defined for a query q
and matrices of key and value pairs K = [k1, ...,kL] and
V = [v1, ...,vL], respectively. It outputs a high-level represen-
tation c depending on the values and based on the alignment
of the keys K with the query q. The alignment function
is specified according to the attention weights and can be
computed using various functions proposed in the literature
[96]. One of the most widely used mechanisms is the scaled
dot-product attention introduced by Vaswani et al. in [60],
where attention weights α are computed as follows:
α = softmax(
qKT
√
dk
), (5)
being dk the embedding dimension of the keys. Then, the
high-level value representation v is computed as a linear
combination of the attention weights and the value vectors
c = αV.
C. Evaluation of xAI Methods
Evaluating explanation quality and its trustworthiness is an
essential methodological challenge in xAI, which has received
considerable attention in recent years. The existing evaluation
strategies can be categorized into (i) functional approaches
based on quantitative metrics and (ii) user studies [97].
1) Functional evaluation metrics: These metrics assess the
explanation quality by quantitatively describing to which ex-
tent an explanation satisfies a certain set of desired properties.
A commonly evaluated explanation property is faithfulness
(also called correctness) [98], which asserts how close an
explanation method approximates the actual model workings.
Various functional metrics are proposed to evaluate explana-
tion faithfulness. For example, metrics based on randomization
tests are introduced in [99] to evaluate the explanation sensitiv-
ity to randomization in model weights and label permutation.
The results reveal that most of the evaluated backpropagation
methods do not pass these tests. Another common approach to
evaluate explanation faithfulness is based on the perturbation
of the input features. For instance, [100] measures the changes
in the model output after perturbation of the supposedly
important features, as estimated by the explanation method.
Next, metrics based on perturbation are also used to evaluate
other explanation properties like robustness (also referred to
as explanation sensitivity) which inspect the impact of small
perturbations of the input features on the resulting explanation
[101, 102]. Explanations with low sensitivity are preferred as
this indicates that the explanation is robust to minor variations
in the input. However, it is worth noting that the perturbation-
based metrics might result in examples with different distribu-
tion than the instances used for model training, which ques-
tions whether the drop in model performance can be attributed
to the distribution shift or to the perturbation of the important
features. To address this issue, Hooker et al. [103] propose
model retraining on a modified dataset where a fraction of
the most important features identified by the xAI method is
perturbed. Further, an improved evaluation strategy that avoids
the need for model retraining based on information theory is
presented by Rong et al. [104]. Additionally, other evaluation
approaches also include properties like localization, which
determines whether the explanation identifies the ground-truth
region of interest, or complexity, with the aim of measuring the
sparsity of an explanation [101]. For a detailed categorization
of the different explanation quality properties and a summary
of the approaches used in the literature to functionally evaluate
them, the reader is referred to [98].
2) User studies: Conversely, experiments in which humans
evaluate the quality of explanations can be conducted. User
studies in [97] are further categorized into an application-
grounded evaluation and human-grounded evaluation, depend-
ing on the evaluation task, the type of participants, and the
considered explanation quality criteria.
On the one hand, Application-grounded evaluation studies
typically involve domain experts who evaluate the explanation
in the context of the learning task. For instance, the authors
in [105] perform a case study where physicians assess the
predictions of a CNN model that classifies electrocardiogram
heartbeats. This study measures physicians’ agreement with
the CNN model when they are presented with (i) explanations
computed with the LIME method and (ii) nearest-neighbour
examples obtained from the latent space of a trained CNN
model. The results indicate higher agreement for the nearest-
neighbor example explanations, which simultaneously enable
the physicians to relate model outcomes to clinically relevant
concepts in contrast to the LIME feature importance visual-
ization.
On the other hand, human-grounded evaluation consists of
user studies where the participants are typically non-domain
experts that evaluate more general notions of explanation
quality. For example, Alqaraawi et al. [106] evaluate whether
explanations computed with the LRP method help lay users
recruited via an online crowdsourcing platform to understand
the decisions of CNN model for image classification. The
study’s conclusions identify a significant statistical result,
which shows that users predict the model decisions with higher
accuracy when presented with LRP explanations. A detailed
survey on the user studies for xAI methods evaluation is
presented in [107].
D. xAI Objectives
In this study, we group the objectives of utilizing xAI in RS ac-
cording to the following four reasons for using xAI, as defined
in [30]: (1) explain to justify, (2) explain to control, (3) explain
to discover, and (4) explain to improve. Explain to justify is
motivated by the need to explain individual outcomes, which
ensures that the ML systems comply with legislations, such
as enabling users the ”right to explanations”. Furthermore,
the explanations can enable a detailed understanding of the
workings of the ML model. Hence, the explain to control
objective is relevant for assessing model trustworthiness and
can help to identify potential errors, biases, and flaws of the
ML model. These insights can be used to discover scientific
knowledge and new insights about the underlying process
that is modeled with the ML system or to further improve
the existing model. Weber et al. classify the improvement
techniques based on the xAI insights into (1) augmenting
the input data, (2) augmenting the intermediate features, (3)
augmenting the loss function, (4) augmenting the gradient,
and (5) augmenting the ML model [108]. For the sake of
completeness, we consider adapting existing xAI methods as
another improvement strategy.
V. XAI IN RS
Here we summarize the research on xAI in RS and an-
swer the research questions RQ1, RQ2, and RQ3. First,we
bring attention to the common practices and highlight new
approaches (RQ1). Next, we summarize the methodologies
to understand and evaluate the model explanations (RQ2).
Further, we outline the common research questions across the
different RS tasks that practitioners aim to answer with xAI
(RQ3).
A. RQ1: Usage/Applications of xAI in Remote Sensing
Table I contains the full list of publications included in our
study. The papers are arranged by considering groups for the
different EO tasks (see Appendix C) and following the xAI
categorization shown in Figure 3. xAI methods, objectives,
and evaluation types are also listed in the table.
1) Application of existing xAI methods to RS: Figure 4
shows the number of papers using a xAI method grouped
by the categories introduced, while Figure 11 (see Appendix
B) illustrates all the combinations of model and methods
in the literature. Local approximation methods are the most
frequently used in over 94 publications. Through their popu-
larity, they are used on the most diverse set of EO tasks and
models. In most cases, they interpret tree-based models (i.e.,
Random Forest (RF) and tree ensembles), followed by CNNs
and Multilayer Perceptrons (MLPs). Also, with over 65%,
most of these publications rely solely on local approximation
methods without evaluating other methods. Backpropagation
methods follow with a small gap of 72 papers, leveraging
mostly CAM variants. While a lot of CNN architectures are
interpreted, most time series models, like the LSTM, are also
located here. In contrast, most papers using transformers are
among the 35 publications that leverage embedding space
interpretation techniques, which was to be expected since
attention is already the centerpiece of the architecture. Many
publications in this category also assess the models’ feature
space. Followed by a large gap to the SHAP and LIME meth-
ods, 56 publications use various perturbation methods. The
number of publications is fairly well proportioned between
the methods, but PFI is the most widely used, followed by
occlusion and PDP. 15 publications leverage a diverse set of
interpretable-by-design models. Although Generalized Linear
Models (GLMs), particular LR, are the most common, newer
models like the Explainable Boosting Machines (EBM) are
gaining recognition. Overall, only a few papers employ joint
training, like prototypes or explanation associations. Even less
popular are model translations, counterfactuals, and example-
based explanations.
The Mean Decrease in Impurity (MDI) or Gini importance
is often used in feature selection for global importance mea-
surements and can be easily obtained for tree-based methods
[109–125]. This shows that xAI and feature selection have flu-
ent boundaries, as the Gini index is used for feature selection,
i.e., when deciding about the depth in the tree with the purity
of a split, but also allows for the interpretation of the decision
process of the model. Nevertheless, we will refer to them as
feature selection methods.
a) Approaches in common EO tasks: To provide a
deeper insight, we will now discuss in detail the three most
representative EO tasks according to the number of papers,
CA
M
-v
ar
ia
nt
D
ec
on
vo
lu
tio
n
D
ee
pL
ift
G
ra
di
en
t-v
ar
ia
nt IG
LR
P
O
th
er
A
LE
O
cc
lu
sio
n
PD
P
PFI
O
th
er
LIM
E
SH
A
P
R
ul
e
Ext
ra
ct
io
n
EBM
G
A
M
G
LM G
P
LD
A
R
ul
es
O
th
er
A
ct
iv
at
io
n
A
ss
es
sm
en
t
Att
en
tio
n
Exp
la
na
tio
n
A
ss
oc
ia
tio
n
M
od
el
A
ss
oc
ia
tio
n
Pro
to
ty
pe
G
A
N
W
IK
M
D
I
O
th
er
0
10
20
30
40
50
60
70
80
N
u
m
b
er
of
P
u
b
li
ca
ti
on
s
xAI Category
Backpropagation
Perturbation
Local Approximation
Model Translation
Interpretable by Design
Embedding Space
Joint Training
Counterfactuals
Example-based
Feature Selection
Fig. 4. Number of publications per different xAI methods, grouped according to our categorization in Figure 3.
which are landcover mapping, agriculture monitoring, and
natural hazard monitoring. The graphs in Figure 5 show these
tasks and the frequency of the combination in the usage of
ML models and xAI methods. Figure 11 in Appendix B
provides a detailed overview of the usage patterns for all the
EO tasks considered in this study.
Landcover monitoring
Due to the well-established datasets, the most prominent EO
task is often employed for evaluating or developing new xAI
methods. Further, a low amount of expert knowledge is needed
to elucidate the model outputs for this task compared to others,
such as atmosphere monitoring or ecosystem interactions. The
majority of the 36 landcover mapping papers present CNN
architectures, leading to the favored use of backpropagation
methods, especially from the CAM family. For example, Sen
and Keles leverage Grad-CAM to identify the reasons for
misclassified images [126], while Vasu and Savakis assess
transfer learning and the interpretation resilience with CAM
[127]. The numerous publications proposing novel or evaluat-
ing methods can be found in the corresponding Sections V-A2
and V-B2, respectively. Albeit with lower frequency than the
CAM approaches, the workings of the CNN models are also
interpreted based on the local approximation approaches. For
example, SHAP is utilized to reveal the salient pixels and the
global importance of the spectral bands in [128], LIME is used
to improve model performance on the misclassified examples
in [129], and both approaches are utilized in [130] to validate
the model predictions.
The second outstanding ML model for landcover monitoring
is RF and exclusively model-agnostic xAI methods are used
for its interpretation. Jayasinghe and Kumar explain the cor-
relations regarding land use for tea cultivation through LIME
[112]. The most frequently used approach, SHAP, is leveraged
in [131], while PFI is explored in [110] and [132] to rank
reflectance data from various satellites and DEM information.
The DEM information seems to be valuable since both [131]
and [132] identify slope as the most important feature, while
elevation is ranked the highest in [110].
Unique to this task, the less common counterfactual and
example-based methods are applied to CNNs. Dantas et al.
utilize a Generative Adversarial Network (GAN) to generate
counterfactual time series for the Normalized Difference
Vegetation Index (NDVI) of different landcover classes. The
generator introduces perturbations to instances and attempts to
change the prediction of the pre-trained classifier. Meanwhile,
a discriminator ensures the quality of the newly generated
instance [133]. Moreover, Ishikawa et al. show examples of
the dataset belonging to the same class by computing the
similarity in the latent space of the last layer. These examples
can be used to identify Out-Of-Distribution (OOD) instances
or cases where the classifier lacks generalization [134].
Agricultural monitoring
SVM (1)
Ensemble (9)
GB (1)
LDA (1)
RF (12)
CapsuleNet (1)
CNN (46) GAN (1)
Gradient-variant (3)
LIME (5)
Other (1)
Backpropagation (25)
ALE (1)
Rules (1)
Occlusion (1)
LDA (2)
Attention (5)
Interpretable by Design (3)
Counterfactuals (1)
Example-based (1)
IG (1)
CAM-variant (19)
Joint Training (1)
Feature Selection (3)
MDI (3)
PDP (1)
Prototype (1)
PFI (2)
WIK (1)
Perturbation (5)
SHAP (13)
Activation Assessment (9)
Local Approximation (18)
DeepLift (1)
Embedding Space (14)
Model xAI Category xAI Method
(a) Landcover mapping.
GB (5)
GAN (4)
FNN (1)
LDA (2)
DT (1)
RF (12)
Transformer (9)
GLM (2)
CNN (4)
kNN (1)
SVM (1)
GP (3)
LSTM (11)
EBM (1)
Embedding Space (9)
Gradient-variant (6)
Other (5)
Model Translation (1)
Backpropagation (12)
ALE (1)
Occlusion (3)
GP (1)
Attention (5)
Interpretable by Design (3)
Rule Extraction (1)
Explanation Association (2)
IG (2)
CAM-variant (2)
GLM (1)
Joint Training (3)
Feature Selection (5) MDI (4)
PDP (1)
Prototype (1)
PFI (2)
EBM (1)
Perturbation (11)
SHAP (13)
Activation Assessment(4)
Local Approximation (13)
LRP (2)
Model xAI Category xAI Method
(b) Agricultural monitoring.
GB (6)
GAM (1)
FNN (6)
Ensemble (12)
Rules (1)
RF (10)
GLM (2)
NB (1)
CNN (10)
kNN (1)
SVM (2)
LSTM (3)
EBM (1)
LIME (3)
Other (1)
Backpropagation (3)
Occlusion (2)
Rules (1)
Attention (1)
Interpretable by Design (3)
IG (1)
CAM-variant (1)
Joint Training (1)
Feature Selection (1)
MDI (1)
PDP (8)
Prototype (1)
PFI (12)
EBM (1)
Perturbation (22)
SHAP (20)
Activation Assessment (2)
Local Approximation (23)
DeepLift (1)
Embedding Space (3)
Model xAI Category xAI Method
(c) Natural hazard monitoring.
Fig. 5. The number of times models, xAI categories, and xAI methods are
used for the most common EO tasks.
The primary tasks in this group are crop yield prediction and
crop type classification. Two additional tasks are tackled: irri-
gation scheme classification [135] and lodging detection [136].
The methodologies employed revolve around model-agnostic
and backpropagation approaches, namely perturbation, local
approximation techniques, and CAM methods. These methods
are frequently integrated with gradient boosting and RF tree-
based models. Gradient boosting models are consitently inter-
preted with SHAP [136–140]. For example, Han et al. combine
gradient boosting with SHAP to identify maize lodging from
Unmanned Aerial Vehicles (UAV) images [136]. The key
features identified are plant height computed from a DEM and
Digital Surface Model (DSM) and textural features from the
Gray-Level Co-Occurrence Matrix.
Many different xAI strategies have been considered for crop
yield prediction. Huber et al. assess the feature importance
of weather and spectral bands across three different crop
phenology times on administrative boundary levels by sum-
marizing the SHAP values [138], while Broms et al. examine
the relevance of soil features and DEM [137]. Both studies
emphasize the importance of specific time steps and that
the spectral bands provide valuable information. Jones et al.
investigate soil features, DEM, and spectral reflectance with
a finer spatial resolution at subfield yields. Interestingly, no
spatially consistent limiting factors are identified for an entire
field, indicating the potential of countermeasures on a subfield
level [139].
Other studies utilize RF and perturbation methods, specif-
ically PFI [141] and ALE or PDP [142]. The feature impor-
tance of soil, weather, and spectral data on yield potential is
analyzed in [141], while PDP and ALE are used to assess the
interactions between management, weather, and soil data in
[142]. Their findings suggest that residue management or rate
application decisions can significantly influence crop yield.
Time series data is frequently employed for crop map-
ping, leading to the utilization of Recurrent Neural Networks
(RNNs) [121, 143–145] and transformers [121, 146–148]
architectures. Given LSTM models for crop yield prediction,
IG and SHAP are used to attribute soil moisture, weather,
and reflectance data in [144] and [145]. Both studies reach
the same conclusion: high temperatures during the growth
season have a negative impact on crop yield. LSTM with
attention and transformers allow to differentiate between corn
and soybeans in [121]. Both models are interpreted not only
through their corresponding attention mechanisms but also
considering gradients and activation projection with t-SNE.
Overall, both models agree in their attribution, emphasizing
the middle of the year as an important period when corn
starts to silk and soybeans begin to bloom. Wolanin et al.
apply CNNs to time series for predicting crop yield from
vegetation indices and weather data. Attributions and scenario
analysis for different weather conditions are provided with a
method derived from CAM and modified for regression, named
Regression Activation Mapping (RAM) [149].
Crop classification with prototypes through a CNN
encoder is proposed by Rosa et al. [150]. The LLP-Co
method uses a priori proportions of the classes to match the
instance proportions assigned to the prototypes. A distinctive
approach based on a variational adversarial network for crop
yield and irrigation scheme classification is introduced by
Arun and Karnieli [135]. To learn more meaningful latent
representations, the discriminator of the GAN architecture
also needs to classify the latent representations from the
encoder into the correct classes. Additionally, LRP is used
to attribute the inputs. Newman and Furbank use a model
translation method to identify RF subtrees, common rules
within the RF model, and their associated error rates [117].
Other works in agriculture monitoring encompass Gaussian
Processs (GPs) [151, 152] and EBM [153]. Notably, the
methods for agricultural monitoring are evaluated solely with
anecdotal evidence. No study employs quantitative metrics,
and only [145] carry out a small user study.
Natural hazard monitoring
The third main task consists of 26 papers on landslide suscepti-
bility, fire and flood monitoring, as well as geological hazard
monitoring. The typical features used as input for landslide
susceptibility assessment are DEM variables, landcover and
vegetation (e.g., NDVI) information, and weather variables
such as temperature or precipitation. Often, they are supple-
mented by human factors like the distance to roads or hydro-
logical properties (e.g., drainage density, soil type). Usually,
these models are interpreted with SHAP: RFs [154, 155],
Support Vector Machines (SVMs) [155], CNNs [156], Feed
Forward Neural Networks (FNNs) [156, 157], and Gradient
Boosting (GB) [158, 159]. Additionally, LIME, PFI or PDP
are applied in the same fashion in [156, 160]. Other approaches
include interpretable models [161–163].
In summary, rainfall [156, 162, 164], slope [158, 161],
elevation, aspect [164], curvature [154], distance to road [158,
160], and NDVI [154, 160] are among the most important
features for this task. Moreover, higher NDVI suggests a
decrease in landslide probability [160]. In contrast, mines can
increase the probability [162].
The flood mapping approaches rely on similar features as
landslide susceptibility (being rainfall a permanent component)
and leverage mostly tree-based models or CNNs with SHAP
[165–167], PDP [167] or prototypes [168]. The elevation and
slope have the biggest influence on floods worldwide and in
Turkey, according to [165, 166], while rainfall, road density,
and building density have more influence on urban floods
following [167]. Zhang et al. detect floods by performing
adaptive k-means clustering of the image pixels in the latent
space of a U-Net encoder [168]. This approach associates the
prototypes with the cluster centers and enables the interpreta-
tion of the model decisions in terms of linguistic IF ... THEN
rules.
Model-agnostic methods (SHAP, PFI, PDP) are mostly
applied to explain fire susceptibility prediction models (RFs
and FNNs, see [169, 170]), which typically rely on landcover,
weather, and DEM data, as the former tasks. [170] shows that a
climate fire index (created other work with DNN), NDVI, and
slope are critical indicators for Italy, while [169] finds that
humidity, wind speed, and rainfall most are most important
factors for Australia. Further, [171] reveals that fires are more
severe in areas with higher elevation and in those where the
dominant vegetation types are shrubs and open forests.
Various works explore geological hazards. For example,
[172] uses CNN to classify various disaster events (e.g., build-
ing damage, fire) from aerial imagery and reveals the salient
regions with a weighted combination of LIME and SHAP
attributions. Further, [173] utilizes Grad-CAM to identify the
mistakes of the CNN model for predicting volcano defor-
mation patterns and uses t-SNE to evaluate the differences
between the latent space representations of real and simulated
data. Regarding other geological hazards, [174] explores the
tunnel geothermal disaster susceptibility based on land surface

Mais conteúdos dessa disciplina