Explainable AI for Safe and Trustworthy Autonomous Driving: A Systematic Review

Anton Kuznietsov, Balint Gyevnar, Cheng Wang, Steven Peters, Stefano V. Albrecht

The research by A. Kuznietsov was carried out within the project “AUTOtech.agil” (FKZ 01IS22088S). We acknowledge the financial support for the project by the Federal Ministry of Education and Research of Germany (BMBF). B. Gyevnar was supported in part by the UKRI Centre for Doctoral Training in Natural Language Processing (grant EP/S022481/1). A. Kuznietsov and B. Gyevnar contributed equally to this work. Corresponding author: Cheng Wang.

B. Gyevnar and S.V. Albrecht are with the School of Informatics, University of Edinburgh, EH8 9AB Edinburgh, U.K. (e-mail: {balint.gyevnar, s.albrecht}@ed.ac.uk). A. Kuznietsov and S. Peters are with the Institute of Automotive Engineering, Technical University (TU) of Darmstadt, 64287 Darmstadt, Germany (e-mail: anton.kuznietsov@tu-darmstadt.de; steven.peters@tu-darmstadt.de). Cheng Wang is with the School of Engineering and Physical Sciences, Heriot-Watt University, EH14 4AS Edinburgh, U.K. (e-mail: Cheng.Wang@hw.ac.uk).

Abstract

Artificial Intelligence (AI) shows promising applications for the perception and planning tasks in autonomous driving (AD) due to its superior performance compared to conventional methods. However, highly complex AI systems exacerbate the existing challenge of safety assurance of AD. One way to mitigate this challenge is to utilize explainable AI (XAI) techniques. To this end, we present the first comprehensive systematic literature review of explainable methods for safe and trustworthy AD. We begin by analyzing the requirements for AI in the context of AD, focusing on three key aspects: data, model, and agency. We find that XAI is fundamental to meeting these requirements. Based on this, we explain the sources of explanations in AI and describe a taxonomy of XAI. We then identify five key contributions of XAI for safe and trustworthy AI in AD, which are interpretable design, interpretable surrogate models, interpretable monitoring, auxiliary explanations, and interpretable validation. Finally, we propose a conceptual modular framework called SafeX to integrate the reviewed methods, enabling explanation delivery to users while simultaneously ensuring the safety of AI models.

Index Terms:

Autonomous driving, autonomous vehicle, explainable AI, trustworthy AI, AI safety

I Introduction

Artificial intelligence (AI) has gained considerable attention across technical fields over the last decades. In particular, deep learning (DL) based on deep neural networks (DNNs) achieves human-comparable or potentially even better performance on some tasks owing to its ability to learn from high-dimensional data [1, 2], so it has naturally emerged as a vital component of autonomous driving (AD).

Nevertheless, deep learning suffers from a lack of transparency: it exhibits black-box behaviour that obscures its internal workings. This opacity makes it harder to identify issues and to determine which applications of AI are admissible in the real world. Yet in safety-relevant domains such as AD, it is crucial to develop safe and trustworthy AI. Although several mitigation processes exist to handle safety concerns in AI, such as well-justified data acquisition [3], whether these measures ensure sufficient safety remains an open question, highlighting the need for further approaches.

Moreover, no standards currently explicitly address the use of data-driven AI in AD. The existing safety standard ISO 26262 - Road Vehicles - Functional safety [4] was not explicitly developed for data-driven AI systems and their unique characteristics [5]. The standard ISO 21448 - Safety of the Intended Functionality (SOTIF) [6] aims at ensuring the absence of unreasonable risk due to hazards from functional insufficiencies of the system and requires quantitative acceptance criteria or validation targets for each hazard. The concept can be applied to AI-based functions, but these acceptance criteria are not explicitly defined [7]. Moreover, specific guidance for designing AI-based functionality is missing.

As a result, these standards face challenges in addressing safety requirements for data-driven deep learning systems [8]. Although there is ongoing work on ISO/AWI PAS 8800 - Road Vehicles - Safety and Artificial Intelligence [9], its scope and guidance remain unclear because it is still under development. More generally, there is a relatively high level of mistrust in society regarding AD. The American Automobile Association’s survey on autonomous vehicles (AVs) indicates that 68% of drivers in the United States are wary of AVs [10], and AI has been identified as one of the key factors contributing to the non-acceptance of AVs in society [11].

A promising approach to address these problems is explainable AI (XAI). XAI aims to provide human-understandable insights into the behaviour of AI, and the development of XAI methods could benefit different kinds of stakeholders [12]. First, XAI may become an essential tool for AI developers to identify and debug malfunctions [13]. Second, XAI could help users calibrate their trust in automated systems in line with the actual capabilities of AVs [14], thereby preventing misuse. Lastly, assurance companies and regulatory bodies may also benefit, as the increased transparency due to XAI could enable traceability that allows for a more accurate assessment of due diligence and liability in case of accidents [15]. Muhammad et al. [16] go as far as to suggest that XAI could in future become necessary for regulatory compliance, including fairness, accountability, and transparency in DL for AD. Given the growing body of literature on XAI specifically for AD, it is necessary to systematically review which XAI techniques exist and how they are applied to enhance the safety and trustworthiness of AD.

I-A Previous Reviews on XAI for AD

Several reviews of XAI for AD already exist, and we give a brief overview of each in this subsection. These works provide a good overview of the challenges and stakeholders of the field but have some crucial shortcomings:

1. Lack of a systematic literature review methodology, leading to potential bias and incomplete coverage;
2. No focus on the specific benefits and drawbacks of XAI on the safety and trustworthiness of AD;
3. No review of frameworks for integrating XAI with AD.

The work of Omeiza et al. [17] was the first notable survey in the field. They provide a holistic look at XAI for AD, covering the different needs for explanations, regulations, standards, and stakeholders, and an overview of some explainability methods applied in AD. They review the challenges involved in designing useful XAI systems for AD and the associated literature; however, this review is neither reproducible nor complete, especially for the perception and planning tasks.

In addition, Atakishiyev et al. [18] covered topics very similar to those of Omeiza et al., with slightly broader coverage of recent XAI technologies for AV perception and planning. They propose an end-to-end (E2E) framework for integrating XAI with existing AD technologies; however, they do not elaborate on how different XAI techniques can be integrated into the framework. Their literature review is also not described in sufficient detail to be repeatable.

Finally, the literature review of Zablocki et al. [19] identified potential stakeholders, why they might need explanations, the type of explanations useful to them, and when explanations need to be delivered. Based on that, they examine the different methods in the literature. However, they do not focus on the impact of XAI on meeting the requirements for safe and trustworthy AI. Furthermore, the survey is incomplete, since it focuses only on vision-based methods for E2E systems. Accordingly, it does not consider XAI methods for perception and planning that can be applied to modular AD pipelines.

I-B Main Contributions

In light of the existing works and the increasing importance of XAI for AD, we make the following contributions:

1. We discuss detailed requirements for AI in AD and highlight the importance of XAI in fulfilling them;
2. Using a structured, systematic, and repeatable review methodology, we survey XAI methods applied to AD with a focus on environmental perception, planning and prediction, and control;
3. Based on our review, we identify five paradigms of XAI techniques applied to safe and trustworthy AD: interpretable design, interpretable surrogate models, interpretable monitoring, auxiliary explanations, and interpretable validation. Moreover, we discuss each paradigm using concrete examples;
4. We analyse the limitations of existing modular XAI frameworks for AD and then propose a conceptual framework called SafeX that is designed to be readily used with the summarized XAI techniques.

I-C Scope and Structure

Our study gives a comprehensive view of current state-of-the-art XAI approaches for AD, encompassing both modular and E2E pipelines and focusing on perception, planning and prediction, and control. We also present a conceptual modular framework for incorporating XAI into the design of AVs. Our survey neither identifies stakeholders nor gives background on mathematical foundations such as DNNs or reinforcement learning, as the existing surveys discussed in Section I-A provide good coverage of these topics.

The structure of our survey is illustrated in Figure 1. In Section II, we provide foundations, where we define trustworthy AI and identify requirements corresponding to the application of AI in AD. Moreover, we describe the various sources of explanations for AI systems and introduce a taxonomy of XAI concepts as well as terminology for AD components. Section III describes our research questions and the methodology for the survey, ensuring reproducibility. Section IV surveys the literature divided into interpretable design, interpretable monitoring, interpretable surrogate models, auxiliary explanations, and interpretable validation. In Section V, we review existing XAI frameworks in AD and propose our framework SafeX. In Section VI and Section VII, we discuss our findings and identify future directions in light of the results of our survey.

Figure 1: The structure of our survey. We present foundations on why we need trustworthy AI, an XAI taxonomy, and AD terminology. We then describe our survey methodology in detail so that our review is reproducible. In the analysis, we categorize existing XAI for AD approaches into five branches based on their different applications for AD: interpretable design, interpretable monitoring, interpretable surrogate models, auxiliary explanations, and interpretable validation. Based on our analysis, we present a new conceptual framework called SafeX. Finally, we discuss challenges and future directions.

II Foundations

We begin in this section with an exploration of the need for trustworthy AI. We then examine the requirements for applying trustworthy AI in AD. Our analysis highlights the critical role of XAI in fulfilling these requirements and identifies the sources of explanations in AI systems. We conclude by reviewing a taxonomy of XAI along with a detailed terminology of AD components.

II-A Trustworthy AI

Historically, AI was based on symbolic representations, in which information was encoded using well-defined mathematical symbols, as in propositional logic or program induction. The first successful AI applications were expert systems [20], which relied on such symbolic representations and lent themselves to varying degrees of inherent interpretability, usually in the form of causal chains of reasoning.

In contrast, current neural AI methods rely on sub-symbolic representations. Under this paradigm, input data is mathematically transformed into output via millions of parameters learned from large swathes of training data. This approach allows the modelling of highly complex multi-dimensional relationships, which results in high performance. Still, sub-symbolic systems are not interpretable due to their sheer size and high level of abstraction. Therefore, they are often likened to black boxes that lack transparency.

While this efficiency versus transparency trade-off is sometimes acceptable (arguably, even non-existent [21]), highly complex safety-relevant systems such as AD cannot fully rely on black box systems, as they are not currently certifiable for safety. This is in addition to the countless ethical, social, and legal reasons why neural methods may also be suspect [22].

Symptomatic of these issues is the lack of trust by users of AI systems. To alleviate the many problems that stem from a lack of transparency, methods that automatically explain predictions and decisions to users have become popular [23], forming the field of XAI. However, achieving trustworthy AI is a much more complex issue than could be solved by merely imbuing AI systems with explainability. Instead, trustworthy AI must consider a complex set of socio-technical requirements, among others, human agency, technical robustness and safety, privacy and data governance, diversity, non-discrimination, fairness, and societal and environmental well-being [24]. Our focus on XAI is not to suggest that we can achieve trustworthy AI just via explainable methods but as a necessary element among the many approaches that support human-centric AI which strives to ensure that human values are central to how AI systems are developed, deployed, used, and monitored by ensuring respect for basic human rights.

In the following subsection, we explore in detail these requirements for trustworthy AI specifically for AD. Subsequently, we discuss from which sources and to what extent current AI methods are amenable to explanation and overview a taxonomy of XAI to organise our discussion.

II-B Requirements For Safe and Trustworthy AI in AD

Owing to the superior performance in high-dimensional tasks like image processing and object detection [25], black box sub-symbolic methods are now the predominant approach to solving challenges in AD. Unlike many other robotic domains, incorrect behaviour by AVs can cause serious injury or death to humans, meaning safety is a top priority for all stakeholders. Designing safe and trustworthy AI is, thus, becoming urgent for AVs, necessitating the definition of safety and trustworthiness requirements.

Unfortunately, no requirements have been published specifically for AI in AD. Instead, we take more general requirements for safe and trustworthy AI as a starting point. In Table I, we discuss whether these map to AI systems in AD and whether new requirements for AI in AD should be defined.

Fortunately, several guidelines have been developed for safe and trustworthy AI which are well suited to address the transparent and safety-critical operation of AVs. One well-known example is the Ethics Guidelines for Trustworthy AI released by the European Commission [26], which define seven key requirements for trustworthy AI: (i) human agency and oversight; (ii) technical robustness and safety; (iii) privacy and data governance; (iv) transparency; (v) diversity, non-discrimination and fairness; (vi) societal and environmental well-being; (vii) accountability.

A second framework, the AI Risk Management Framework, was developed by the U.S. National Institute of Standards and Technology (NIST) [27]. It also defines seven key characteristics, according to which trustworthy AI should be (i) valid and reliable; (ii) safe; (iii) secure and resilient; (iv) accountable and transparent; (v) explainable and interpretable; (vi) privacy-enhanced; (vii) fair with harmful bias management. According to this framework, validity and reliability are the basis for the other characteristics, while accountability and transparency are overarching concepts related to all characteristics.

The requirements defined in these two proposals can be grouped by their source into data-, model-, and agency-related requirements, which we synthesise in Table I. First, diverse data and data governance are essential to avoid biased decisions and to protect privacy. Second, an AI model itself ought to be, among others, robust, safe, and accountable. Third, the deployed AI models must be overseen by humans, for which human agency is required. Similar requirements have been proposed by individual researchers. For instance, Alzubaidi et al. [28] defined similar requirements for trustworthy AI but considered accuracy and reproducibility as separate requirements, whereas the EU guidelines subsume them under robustness. To avoid conflating conflicting definitions, we take the requirements from these two governmental proposals as our starting point, noting that other conceptions of trustworthy AI may be mapped onto these frameworks.

TABLE I: Summary of the defined requirements in the ethics guidelines (EU) and the AI risk management framework (USA) and discussion of their applicability to AD. The requirements are classified into three sources: data, model and agency.

Sources	Ethics Guidelines (EU)	AI Risk Management Framework (USA)	Transferable to AD?
Data	Privacy and data governance	Privacy-enhanced	Y
Data	Diversity, non-discrimination, fairness	Fair with harmful bias management	Y
Model	Technical robustness and safety	Safe	Y
Model	Transparency	Accountable and transparent	Y
Model	Accountability	Valid and reliable	Y
Model	Societal and environmental well-being	Secure and resilient	Y
Model	—	Explainable and interpretable	Y
Agency	Human agency and oversight	—	Y/N

Requirements From Data: Proper data governance is necessary for AD since privacy- and quality-sensitive data from drivers and external environments need to be processed. For instance, ML-based perception typically uses vision systems to perceive and understand the surroundings, and the captured images also contain highly personal data such as pedestrian faces and license plates. Moreover, technical data collected by AVs cannot automatically be classified as non-personal [29]. Under EU jurisdiction, for instance, the General Data Protection Regulation (GDPR) [30] may provide a legal basis for processing personal data when using AI-based functionalities, though it is unclear to what extent the unilateral and fully automated processing of personal data in AVs is covered by the GDPR. Furthermore, to avoid, among others, unfair bias, we should rely on diverse data to train AI models. In particular, the non-discrimination of pedestrians is an important requirement for ML-based systems in AD. Despite this, Li et al. [31] showed that pedestrian detectors are biased towards missing pedestrians who are children or have darker skin tones. In general, the elicitation of safety-related requirements should be seen as a process that includes multiple stakeholder perspectives to increase diversity [7].
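To make the notion of detection bias concrete, the following minimal Python sketch compares the miss rate of a pedestrian detector across groups of annotated pedestrians. The records, group labels, and the helper `miss_rate_by_group` are hypothetical and serve only as an illustration; this is not the evaluation protocol of Li et al. [31].

```python
from collections import defaultdict

# Toy, hypothetical evaluation records: (pedestrian group, was it detected?).
# In practice these would come from a labelled test set with group annotations.
records = [
    ("adult", True), ("adult", True), ("adult", True), ("adult", False),
    ("child", True), ("child", False), ("child", False), ("child", True),
]

def miss_rate_by_group(records):
    """Return the fraction of ground-truth pedestrians missed, per group."""
    totals = defaultdict(int)
    misses = defaultdict(int)
    for group, detected in records:
        totals[group] += 1
        if not detected:
            misses[group] += 1
    return {g: misses[g] / totals[g] for g in totals}

rates = miss_rate_by_group(records)
gap = max(rates.values()) - min(rates.values())
print(rates)                       # {'adult': 0.25, 'child': 0.5}
print(f"miss-rate gap: {gap:.2f}")  # miss-rate gap: 0.25
```

A large gap between groups, measured on a representative labelled test set, is one simple signal of the kind of unfair bias discussed above.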

Requirements From AI Models: AD is a safety-critical application in which a lack of robustness could lead to traffic accidents. The environment in which an AV operates is complex and uncertain, and it changes over time and space. DL-based models need to be robust not only to variations in the physical driving conditions (e.g., differing weather conditions and changes to the car’s behaviour due to component wear) but also to variations in the behaviour of other drivers, including the possibility that adversarial road users may try to exploit AV systems [32]. In addition, adversarial perturbations can fool DNNs [33], leading to implausible results. Therefore, AI needs to be robust against, among others, noise, distribution shift, and adversarial attacks [16], and it must make safe decisions even in uncertain environments. Moreover, AVs need to be sufficiently transparent for the involved stakeholders such that the decisions of the AV can be understood. For instance, developers need transparency to debug models and thus improve system robustness, while regulators need transparency to audit and certify systems. Furthermore, deployed AI models should be user-centric and designed such that all people can benefit from their services regardless of their situation. Finally, establishing accountability for AD systems is important for determining liability in case of accidents [17].
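The following toy sketch illustrates how a small, bounded input perturbation can flip a classifier’s decision. As an assumption for brevity, a logistic-regression model with hand-picked weights stands in for a DNN, and the perturbation follows the fast gradient sign method (FGSM), one well-known attack of this kind; all names and numbers are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of the positive class for a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """Fast gradient sign perturbation of input x with true label y.

    For the log-loss, d(loss)/dx_i = (p - y) * w_i, so every feature is moved
    by eps in the direction that increases the loss (an L-infinity budget)."""
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * math.copysign(1.0, g) if g != 0 else xi
            for xi, g in zip(x, grad)]

# Hypothetical weights and input; class 1 could stand for "pedestrian present".
w, b = [2.0, -1.0, 0.5], -0.5
x = [0.6, 0.2, 0.4]
x_adv = fgsm(w, b, x, y=1, eps=0.3)

print(round(predict(w, b, x), 3))      # 0.668 -> classified as 1
print(round(predict(w, b, x_adv), 3))  # 0.413 -> classified as 0
```

No feature changes by more than 0.3, yet the decision flips; for a DNN the gradient would be obtained by backpropagation rather than in closed form.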

Requirements For Agency: For level 3 AVs [34], human drivers are allowed to engage in non-driving-related activities while the AV performs the dynamic driving task (DDT), as long as it remains within the predefined operational design domain (ODD). Nevertheless, a driver should be prepared to take over control at any moment if the system fails or when the ODD is exceeded. In contrast, for more advanced level 4 and 5 systems, human drivers no longer need to stay in the loop, which diminishes their oversight, especially when the AI systems are inscrutable. Without additional measures, human agency will suffer due to the use of black box systems. In particular, users of AVs may have very little insight into the decision-making processes of AVs and little hope of contesting decisions that may directly impact their bodily integrity. However, obtaining recourse in these situations may not just be a matter of enabling intervention on the AD systems, but rather of providing explanations that calibrate users’ trust according to the system’s capabilities. Therefore, depending on the automation level of the system, the requirements for agency may or may not be addressed by existing systems.

In addition to the above three categories of requirements, safety assurance is imperative for AD [35, 36]. Safety is fundamentally important, underpinning and complementing the above three high-level requirements of trustworthy AI for AD. While requirements for trustworthy AI often include safety, safety assurance imposes strict constraints on the behaviour of AI systems as opposed to the more high-level criteria of other aspects of trustworthy AI. Therefore, safety could be viewed as a distinct set of requirements.

However, providing a comprehensive account of safety requirements would warrant its own publication, so instead we focus on one way of meeting certain safety requirements that also aligns with many of the recommendations for trustworthy AI, namely explainable AI. First, XAI contributes to transparency by delivering intelligible explanations of AI models’ decisions. To show compliance with data protection regulations, one may call on XAI to provide evidence that AD systems do not process personal data and that they can function without personally identifying features in the data. Second, XAI enables accountability through inquiry and traceability, which is essential to show non-discrimination, determine failure cases, and establish a holistic account of the systems’ workings for legal proceedings or regulatory conformity. Third, XAI is beneficial for the inspection, debugging, and auditing of AI models, which can contribute to improved robustness and better-calibrated trust in AVs [37]. We therefore conclude that XAI is an essential tool for meeting the requirements of safe and trustworthy AI for AD.

II-C Sources of Explanation in AI

There are many ways in which an AI system may be amenable to revealing its decision-making process through explanation. Crucially, how information is represented and then used in the AI system directly influences the ways we can generate explanations for them. Understanding these different sources of explanation is essential to effectively discussing various methods of XAI because it helps not only to build a consistent vocabulary for XAI but also to understand the applicability of various XAI systems to AD and how they address the various requirements for safe and trustworthy AD.

We discuss here five sources of explanation from the XAI literature: interpretability, explainability, justifiability, traceability, and transparency. These terms denote related and not necessarily mutually exclusive properties of AI systems; however, they are often (incorrectly) used interchangeably. Our discussion here is informed by [24, 38, 39].

Interpretability: we call an AI system interpretable if it is sufficiently low in complexity such that a reasonably experienced user can understand the output of the system and the causal process that produced that output from the input [40]. Therefore, interpretability is an inherent quality of a system. Interpretable systems are often argued to be better suited for safety-relevant applications due to the observable chain of causality that led to a decision [21].
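As a minimal illustration of interpretability, the hypothetical two-level decision tree below returns not only an action but also the sequence of tests that produced it, so the causal chain from input to output can be read directly. The 2-second time-to-collision threshold and the function name `brake_decision` are illustrative assumptions, not values taken from any standard or reviewed method.

```python
def brake_decision(gap_m, closing_speed_mps, ttc_threshold_s=2.0):
    """Hypothetical two-level decision tree.

    Returns (action, path), where path lists every test that was evaluated,
    i.e. the full causal chain from input to output."""
    path = []
    if closing_speed_mps <= 0.0:
        path.append(f"closing_speed={closing_speed_mps} m/s <= 0: not approaching")
        return "keep_speed", path
    path.append(f"closing_speed={closing_speed_mps} m/s > 0: approaching")
    ttc = gap_m / closing_speed_mps  # time to collision in seconds
    if ttc < ttc_threshold_s:
        path.append(f"time_to_collision={ttc:.1f} s < {ttc_threshold_s} s")
        return "brake", path
    path.append(f"time_to_collision={ttc:.1f} s >= {ttc_threshold_s} s")
    return "keep_speed", path

action, path = brake_decision(gap_m=15.0, closing_speed_mps=10.0)
print(action)  # brake
for step in path:
    print(" ", step)
```

Because the whole model is two comparisons, a reasonably experienced user can verify every step; a DNN with millions of parameters offers no such directly readable path.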

Explainability: we call an AI system explainable if its output is accompanied by an additional output that takes the syntactic form of an explanation. The explanation should intelligibly communicate the reasoning process by which the output was derived [41]. Explainability is not necessarily an inherent quality of the AI system, and the explanation may not accurately reflect the causal chain that produced the output.

Justifiability: an AI system’s decision is justifiable if one can explain why an output was good without necessarily explaining how the output was computed [39]. This property depends on a definition of goodness that will inherently depend on the application domain and the ethical framework the designers of the system see fit for use.

Traceability: an AI system is traceable if an external auditor can follow the causal chain of the full decision-making process from input to output. Any system that relies on a black box is not traceable since causality is obscured by design. A system might also only rely on white box systems but still be untraceable due to the sheer size of the models.

Transparency is a broad term that is often used (incorrectly) to mean any of the above definitions. As discussed in Section II-A, transparency is not solely the property of the AI system achievable via XAI but the result of a range of measures that enable the understanding and informed use of the system through a combination of, among others, documentation, XAI, standardisation, and risk assessments [24].

II-D Taxonomy of XAI

We now provide a taxonomy of XAI visualised in Figure 2 which is used later to describe the reviewed methods in Section IV. Our six taxonomic categories are based on Speith [42].

Figure 2: A taxonomy of XAI covering the most important concepts occurring in the literature of XAI for AD with terminology borrowed from Speith [42].

Representation of information within the decision-making model has a significant effect on the working of the XAI system. We can differentiate between symbolic (e.g., rule-based system, decision tree, etc.) and sub-symbolic systems (e.g., deep learning), as well as mixed systems that utilise both. Note that this category is usually determined by the design of the decision-making system, not the XAI system. As discussed in Section II-A, it is more difficult to explain sub-symbolic systems and it may be more difficult to build trustworthy and safe systems that utilise such a representation.

Stage relates to when during the decision-making process an explanation is generated and from what representations. Post-hoc XAI systems are run after a decision has been made and are widely applicable to any AI method regardless of representation. These methods are usually assumed to have access only to the decision-making system and its input and output. In contrast, ante-hoc XAI systems are constrained to AI methods with symbolic or mixed representations as these XAI systems generate explanations directly from the information represented in the decision-making process which would not be possible with a black box system. These systems have more constrained applicability but are generally more trustworthy and verifiable.
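The difference between the two stages can be sketched in Python. Under the assumption of a toy linear scorer with hypothetical weights, the ante-hoc explanation is read directly from the model’s own parameters, while the post-hoc occlusion method only queries the model as a black box, replacing one feature at a time with a baseline value. Occlusion is one simple post-hoc technique chosen here for illustration.

```python
from typing import Callable, Dict

Features = Dict[str, float]

# Hypothetical weights of a transparent linear risk scorer.
WEIGHTS: Features = {"gap_m": -0.1, "closing_speed_mps": 0.5, "rain": 1.0}

def linear_score(x: Features) -> float:
    return sum(WEIGHTS[k] * v for k, v in x.items())

def ante_hoc_explanation(x: Features) -> Features:
    """Read the explanation straight from the model: contribution = weight * value."""
    return {k: WEIGHTS[k] * v for k, v in x.items()}

def occlusion_importance(model: Callable[[Features], float],
                         x: Features, baseline: Features) -> Features:
    """Post-hoc: only query access to `model` is assumed. Each feature is set
    to its baseline value and the resulting drop in output is recorded."""
    reference = model(x)
    importance = {}
    for k in x:
        perturbed = dict(x)
        perturbed[k] = baseline[k]
        importance[k] = reference - model(perturbed)
    return importance

x = {"gap_m": 20.0, "closing_speed_mps": 4.0, "rain": 1.0}
zeros = {k: 0.0 for k in x}
print(ante_hoc_explanation(x))
print(occlusion_importance(linear_score, x, zeros))
```

For a linear model and a zero baseline the two explanations coincide; for a genuine black box only the post-hoc route remains available, and its result then depends on the choice of baseline.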

Mode determines the syntactic and semantic form of the explanation. While there is a large range of explanatory modes, three are particularly popular in XAI. Surrogate systems condense the overall workings of a more complex method into an interpretable model; however, it is difficult to quantify to what extent these models faithfully represent their parent models. In addition, representative and counterfactual examples explain some aspect of the decision-making process in terms of an input example, but they rely on the assumption that the user can understand and interpret the example correctly. Importance-based explanations indicate which features of the input representation are most influential for the model when it makes a prediction. These methods shed some light on black box decision making, but importance must not be conflated with actual causality, as importance scores can often be altered without affecting the output prediction [43, 44]. Finally, for interpretable ante-hoc methods, the system itself inherently serves as the mode of explanation, though understanding a model as the explanation itself requires significant cognitive processing and is unlikely to contribute to trustworthiness for most stakeholders.
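Two of these modes, surrogate models and counterfactual examples, can be sketched as follows. The “black box” is a hypothetical stand-in (a stopping-distance rule assuming 6 m/s² deceleration and a 5 m margin), the surrogate is a simple 2-second headway rule, and fidelity is measured as the agreement rate on randomly sampled inputs; all of these choices are illustrative assumptions.

```python
import random

def black_box(gap_m, speed_mps):
    """Stand-in for an opaque model: brake (1) if the gap is shorter than a
    stopping distance assuming 6 m/s^2 deceleration plus a 5 m margin."""
    return 1 if speed_mps ** 2 / (2 * 6.0) + 5.0 > gap_m else 0

def surrogate(gap_m, speed_mps):
    """Interpretable global surrogate: brake if headway is below 2 seconds."""
    return 1 if gap_m < 2.0 * speed_mps else 0

def fidelity(model, approx, samples):
    """Fraction of sampled inputs on which the surrogate agrees with the model."""
    agree = sum(model(g, v) == approx(g, v) for g, v in samples)
    return agree / len(samples)

def counterfactual_gap(model, gap_m, speed_mps, step=0.5, max_gap=200.0):
    """Smallest larger gap (searched in fixed steps) that flips the decision,
    i.e. "had the gap been g, the vehicle would have decided differently"."""
    original = model(gap_m, speed_mps)
    g = gap_m
    while g <= max_gap:
        if model(g, speed_mps) != original:
            return g
        g += step
    return None

rng = random.Random(0)  # fixed seed so the estimate is reproducible
samples = [(rng.uniform(0.0, 100.0), rng.uniform(0.0, 35.0)) for _ in range(1000)]
print(f"surrogate fidelity: {fidelity(black_box, surrogate, samples):.2f}")
print(counterfactual_gap(black_box, gap_m=30.0, speed_mps=20.0))  # 38.5
```

The fidelity score quantifies how faithfully the surrogate reproduces its parent model only for this particular sampling distribution, which illustrates why faithfulness is hard to guarantee in general.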

Scope determines whether the explanation applies to a given input instance only (local), to a group of instances (cohort), or to the entire model as a whole (global). The scope of the explanation is tightly connected to its mode. Example- and importance-based explanations are more suited for local and cohort explanations while surrogate models represent the entire decision-making process and are, thus, global explainers.
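The relationship between local, cohort, and global scope can be sketched by aggregating per-instance attributions. The weights, features, and the “dry”/“wet” cohorts below are hypothetical, and mean absolute attribution is only one of several possible aggregation choices.

```python
# Hypothetical linear model: attribution of feature i for instance x is w_i * x_i.
FEATURES = ["speed", "gap", "rain"]
WEIGHTS = [0.5, -0.2, 1.0]

def local_attribution(x):
    """Local scope: explains a single input instance."""
    return [w * xi for w, xi in zip(WEIGHTS, x)]

def mean_abs_attribution(instances):
    """Average magnitude of local attributions over a set of instances.
    Applied to a subgroup it gives a cohort explanation; applied to the
    whole dataset it gives a (simple) global explanation."""
    attributions = [local_attribution(x) for x in instances]
    n = len(attributions)
    return [sum(abs(a[i]) for a in attributions) / n for i in range(len(FEATURES))]

# Toy dataset of (features, weather condition).
data = [
    ([10.0, 20.0, 0.0], "dry"),
    ([12.0, 15.0, 0.0], "dry"),
    ([8.0, 30.0, 1.0], "wet"),
    ([9.0, 25.0, 1.0], "wet"),
]

local = local_attribution(data[0][0])
cohort = mean_abs_attribution([x for x, cond in data if cond == "wet"])
global_ = mean_abs_attribution([x for x, _ in data])
for name, values in [("local", local), ("cohort (wet)", cohort), ("global", global_)]:
    print(name, dict(zip(FEATURES, (round(v, 3) for v in values))))
```

The same local explainer yields all three scopes; what changes is only the set of instances over which it is aggregated.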

Medium is the channel through which the explanations are intelligibly delivered to the stakeholders. How explanations are delivered has a profound influence on efficacy and intelligibility. It is a crucial design consideration and should complement a correct understanding of stakeholder requirements. Unfortunately, it is a prevailing trend in XAI to offer an explanation “as is” (e.g., feature importance plots, decision tree visualisations, etc.) without further regard to how it should be communicated.

II-E Terminology of AD Components

Although different divisions of AD components exist depending on the level of detail [46], the core competencies of an AV can generally be categorized into three components: perception, planning and prediction, and control [45, 47], as illustrated in Figure 3. Perception is the capability to gather data from the surroundings and derive meaningful insights or knowledge from that environmental information. Specifically, environmental perception refers to developing a contextual understanding of the environment, which encompasses the identification of obstacles, detection of road signs and markings, and classification of data based on their semantic significance. Localisation refers to the ability of the AV to determine its position within the environment.

Planning and prediction involve the strategic process of making informed decisions, based on the predicted future trajectories of obstacles, to achieve the vehicle’s higher-order goals. This typically includes navigating the vehicle from a starting point to a desired destination while avoiding obstacles and optimizing performance with respect to pre-designed constraints. According to [48], planning can be further divided into mission planning, behaviour planning, and motion planning. Mission planning selects a route through the road network from the vehicle’s current position to the predefined destination. Behaviour planning is responsible for determining the appropriate driving behaviour at any point in time along the selected route, given the perceived behaviour of other traffic participants, road conditions, and so on. Lastly, motion planning aims to find a collision-free, comfortable, and dynamically feasible path or trajectory once the behaviour layer signals a driving behaviour in the current driving context.
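Mission planning over a road network is commonly posed as a shortest-path problem. The sketch below uses Dijkstra’s algorithm on a hypothetical four-node road graph with edge lengths in metres; it is one standard choice among many and does not cover behaviour or motion planning.

```python
import heapq

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm over a road graph {node: [(neighbour, length_m), ...]}.
    Returns (route, length_m), or (None, inf) if the goal is unreachable."""
    dist = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            break
        for nxt, length in graph.get(node, []):
            nd = d + length
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(queue, (nd, nxt))
    if goal not in dist:
        return None, float("inf")
    # Walk the predecessor links back from the goal to rebuild the route.
    route = [goal]
    while route[-1] != start:
        route.append(prev[route[-1]])
    return route[::-1], dist[goal]

# Hypothetical directed road network with segment lengths in metres.
road_network = {
    "A": [("B", 400.0), ("C", 250.0)],
    "B": [("D", 300.0)],
    "C": [("B", 100.0), ("D", 600.0)],
    "D": [],
}
route, length = shortest_route(road_network, "A", "D")
print(route, length)  # ['A', 'C', 'B', 'D'] 650.0
```

Edge weights need not be distances; travel time or a combined cost would serve equally well as long as they are non-negative.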

Finally, control denotes the AV’s proficiency in executing the actions planned by its higher-level processing modules. In path tracking, the vehicle is required to converge to and follow a path generated by motion planning without any associated temporal law [49]. In contrast, trajectory tracking refers to following feasible “state-space” trajectories, which specify the time evolution of the position, orientation, and linear and angular velocities [50].
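The distinction between path tracking and trajectory tracking becomes concrete in the reference each controller consumes. In the minimal sketch below, whose type and function names are hypothetical, a path is a purely geometric list of waypoints, so the tracker selects its reference by proximity, whereas each trajectory point carries a time stamp together with position, orientation, and linear and angular velocities, so the reference at any time t is fully determined.

```python
import bisect
from dataclasses import dataclass

# Hypothetical types: a path is geometry only, with no temporal law.
Path = list[tuple[float, float]]

@dataclass
class TrajectoryPoint:
    """A trajectory state, which additionally carries a time stamp."""
    t: float      # time [s]
    x: float      # position [m]
    y: float      # position [m]
    yaw: float    # orientation [rad]
    v: float      # linear velocity [m/s]
    omega: float  # angular velocity [rad/s]

def nearest_waypoint(path: Path, x: float, y: float) -> int:
    """Path tracking: the reference is selected by proximity, not by time."""
    return min(range(len(path)),
               key=lambda i: (path[i][0] - x) ** 2 + (path[i][1] - y) ** 2)

def reference_at(traj: list[TrajectoryPoint], t: float) -> TrajectoryPoint:
    """Trajectory tracking: the reference state is fixed by time t.

    Linearly interpolates between time stamps (yaw without angle wrapping,
    for brevity) and clamps outside the trajectory's time span.
    """
    if t <= traj[0].t:
        return traj[0]
    if t >= traj[-1].t:
        return traj[-1]
    k = bisect.bisect_right([p.t for p in traj], t)  # traj[k-1].t <= t < traj[k].t
    a, b = traj[k - 1], traj[k]
    w = (t - a.t) / (b.t - a.t)

    def lerp(p: float, q: float) -> float:
        return p + w * (q - p)

    return TrajectoryPoint(t, lerp(a.x, b.x), lerp(a.y, b.y), lerp(a.yaw, b.yaw),
                           lerp(a.v, b.v), lerp(a.omega, b.omega))

traj = [TrajectoryPoint(0.0, 0.0, 0.0, 0.0, 10.0, 0.0),
        TrajectoryPoint(1.0, 10.0, 0.0, 0.0, 10.0, 0.0)]
print(reference_at(traj, 0.5).x)  # 5.0
```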

The above modular system description enables each component to be developed separately. In addition to modular approaches, there are E2E systems that replace the AD architecture with a single neural network [19], although the control part is often kept separate, so that the E2E network comprises only the perception and planning components. The motivation for E2E architectures lies in their simple design, which avoids handling the interconnections between different modules and instead jointly optimises the features of the individual modules [51]. However, E2E networks are much less interpretable than modular pipelines, which makes ensuring their safety more challenging; in modular approaches, it is easier to trace the source of errors [52].

III Review Methodology

Given the requirements for and challenges of implementing XAI in AD, the field has been growing in popularity. To comprehensively explore the published methods, we performed a systematic literature review following the recommendations of Kitchenham and Charters [53] and the review methodology section of Stepin et al. [54]. A structured review allows us to explore the field systematically by querying online indexing databases with increasingly fine-grained queries, while our description of this process makes the search repeatable, so that others can verify the validity of our work and obtain an updated view of the field in the future.

To give an overview of the review process, first, we defined two primary research questions based on which we developed a query hierarchy. We used the resulting queries to search three indexing databases – Scopus, Web of Science, and IEEE Xplore – and applied a three-step process to arrive at a final set of 84 publications. We describe the full process below.

III-A Research Questions

RQ1

What are the current methods of XAI that address requirements of safety and/or trustworthiness, and what are their key contributions to meeting these requirements?
RQ2

What concrete general frameworks are proposed for integrating XAI with autonomous driving?

III-B Search Process

We chose the Scopus, Web of Science (WoS), and IEEE Xplore online indexing databases for our review, as these platforms provide extensive coverage of both technical and non-technical venues and support the construction and refinement of detailed queries. To obtain a list of candidate papers, we constructed a search hierarchy as shown in Figure 4. Each level of depth in this tree corresponds to increasingly refined search terms, such that the final list of candidate papers is a set of highly relevant publications of manageable size. The queries are shown below in WoS notation; equivalent queries were constructed for Scopus and IEEE Xplore. The queries were applied to the title, author keywords, and abstract fields of each indexed publication, and the search was carried out from 22 to 26 September 2023.

• $q_{1}$: expla* OR interp* OR XAI;
• $q_{2}$: $q_{1}$ AND (auto* AND (driv* OR vehicle* OR car*) OR self driving);
• $q_{3}$: $q_{2}$ AND safe*;
• $q_{4}$: $q_{2}$ AND trust*;
• $q_{5}$: ($q_{3}$ OR $q_{4}$) AND (pipeline OR architecture OR framework);
• $q_{6}$: ($q_{3}$ OR $q_{4}$) AND …
  – $q_{6a}$: sense OR perception OR computer vision OR object detection OR semantic segmentation;
  – $q_{6b}$: prediction OR plan*;
  – $q_{6c}$: control*.

Query $q_{1}$ selects all papers related to explanation or interpretation, as well as any paper that mentions XAI. At this point, we did not restrict the search to a particular subject area (e.g., autonomous driving), so as to build a large foundation of papers to select from. We then narrowed the search to autonomous driving and related keywords using $q_{2}$, and further filtered papers by whether they contain keywords relating to safety ($q_{3}$) or trust ($q_{4}$). To answer RQ1, we sorted this set of papers by whether they relate to a particular subsystem of the AD stack, as shown in Figure 3 ($q_{6a}$ to $q_{6c}$). To answer RQ2, we filtered the same set of papers by keywords that relate to frameworks or architectures ($q_{5}$).
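To make the hierarchy concrete, the sketch below re-expresses the queries $q_{1}$ to $q_{6}$ as Python predicates over the combined title, keywords, and abstract of a publication. It only approximates database behaviour under stated assumptions: a trailing `*` is treated as prefix truncation over lowercase word tokens, multi-word terms such as `self driving` are matched as exact phrases (unquoted multi-word terms may be handled differently by each database), and AND binds more tightly than OR, as in WoS. The function names and the sample records are hypothetical.

```python
import re

def tokens(text: str) -> list[str]:
    """Lowercase word tokens (a simplification of database tokenisation)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has(words: list[str], term: str) -> bool:
    """Match one term: 'expla*' is prefix truncation; multi-word terms are phrases."""
    parts = term.lower().split()
    if len(parts) > 1:  # phrase: consecutive tokens must match exactly
        n = len(parts)
        return any(words[i:i + n] == parts for i in range(len(words) - n + 1))
    if term.endswith("*"):
        return any(w.startswith(term[:-1].lower()) for w in words)
    return term.lower() in words

def any_of(words: list[str], *terms: str) -> bool:
    return any(has(words, t) for t in terms)

def classify(record: str) -> dict[str, bool]:
    w = tokens(record)
    q1 = any_of(w, "expla*", "interp*", "XAI")
    q2 = q1 and ((has(w, "auto*") and any_of(w, "driv*", "vehicle*", "car*"))
                 or has(w, "self driving"))
    q3, q4 = q2 and has(w, "safe*"), q2 and has(w, "trust*")
    base = q3 or q4
    return {
        "q5": base and any_of(w, "pipeline", "architecture", "framework"),
        "q6a": base and any_of(w, "sense", "perception", "computer vision",
                               "object detection", "semantic segmentation"),
        "q6b": base and any_of(w, "prediction", "plan*"),
        "q6c": base and has(w, "control*"),
    }

record = "An explainable planning framework for safe autonomous vehicles"
print(classify(record))  # {'q5': True, 'q6a': False, 'q6b': True, 'q6c': False}
```

Note that prefix truncation such as `car*` also matches unrelated words (e.g., "careful"), which is one reason why the manual screening stages described below are needed.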

TABLE II: The number of papers collected for queries corresponding to RQ1 ($q_{6}$) and RQ2 ($q_{5}$); $q_{1-4}$ are not shown, as these queries returned several thousand papers each.

	WoS	IEEE	Scopus	Duplicates	Total (w/o dups.)
RQ1	130	135	169	178	256
RQ2	7	11	7	8	17

The search and selection process was conducted as shown in Figure 5 and is explained below. The total numbers of papers retrieved for each research question are given in Table II. After querying, we removed all duplicate papers. The remaining set was then filtered using our inclusion and exclusion criteria (detailed in Section III-C). Finally, we filtered the remaining papers based on their full text, re-applying the same inclusion and exclusion criteria to determine which papers to include in our final list.
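The figures in Table II are consistent with simple set arithmetic: the total without duplicates equals the sum of the per-database counts minus the number of duplicates (for RQ1, 130 + 135 + 169 - 178 = 256; for RQ2, 7 + 11 + 7 - 8 = 17). The minimal sketch below merges database exports and counts duplicates; it assumes, hypothetically, that records are matched on a normalised title, whereas a real pipeline would more robustly match on DOIs where available.

```python
import re

def normalise(title: str) -> str:
    """Hypothetical matching key: lowercase alphanumeric characters only."""
    return re.sub(r"[^a-z0-9]", "", title.lower())

def merge_exports(*exports: list[str]) -> tuple[list[str], int]:
    """Merge title lists from several databases; return (unique titles, duplicates removed)."""
    seen, unique, duplicates = set(), [], 0
    for export in exports:
        for title in export:
            key = normalise(title)
            if key in seen:
                duplicates += 1
            else:
                seen.add(key)
                unique.append(title)
    return unique, duplicates

# Hypothetical exports from three databases.
wos = ["Safe XAI for AVs", "Paper B"]
ieee = ["safe xai for AVs.", "Paper C"]
scopus = ["Paper B", "Paper C", "Paper D"]
unique, dups = merge_exports(wos, ieee, scopus)
print(len(unique), dups)  # 4 3
```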

III-C Inclusion and Exclusion Criteria

We now describe the inclusion and exclusion criteria that were used for both research questions at each stage of the search process to arrive at the final list. At each stage, we first applied a list of inclusion criteria to determine which papers to keep; a paper had to fulfil all of these criteria to pass the stage. We included papers where:

• The paper was, in part or fully, motivated by a need for safer or more trustworthy technologies; AND
• The paper proposed a concrete system, algorithm, framework, or novel artefact related to artificial intelligence.

After the inclusion process, we applied a list of exclusion criteria which specified more detailed requirements on the papers. We filtered out papers if they met at least one of the exclusion criteria. We excluded papers where:

• The paper showed no attempt to address any of the sources of explanations (as described in Section II-C); OR
• The main domain of application or evaluation was not autonomous driving; OR
• The paper did not address perception, planning, prediction, or control for autonomous driving.
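The screening logic combines the two lists asymmetrically: a paper is kept only if it satisfies every inclusion criterion (logical AND) and none of the exclusion criteria (logical OR). The sketch below encodes this rule; the boolean fields are hypothetical stand-ins for the reviewers' manual judgements at a given stage.

```python
from dataclasses import dataclass

@dataclass
class Screening:
    """Hypothetical reviewer judgements for one paper at one stage."""
    safety_or_trust_motivation: bool
    concrete_ai_artefact: bool
    addresses_explanation_source: bool
    ad_domain: bool
    ad_component: bool  # perception, planning, prediction, or control

def keep(s: Screening) -> bool:
    # Inclusion: every criterion must hold.
    included = all([s.safety_or_trust_motivation, s.concrete_ai_artefact])
    # Exclusion: a single failed requirement is enough to drop the paper.
    excluded = any([not s.addresses_explanation_source,
                    not s.ad_domain,
                    not s.ad_component])
    return included and not excluded

def screen(papers: dict[str, Screening]) -> list[str]:
    return [pid for pid, s in papers.items() if keep(s)]

papers = {
    "P1": Screening(True, True, True, True, True),
    "P2": Screening(True, True, True, False, True),  # not an AD paper
    "P3": Screening(False, True, True, True, True),  # no safety/trust motivation
}
print(screen(papers))  # ['P1']
```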

IV XAI For Safe and Trustworthy AD

We sorted the reviewed papers into five main categories: interpretable design, interpretable surrogate models, interpretable monitoring, auxiliary explanations, and interpretable validation.¹ An overview of the advantages and disadvantages of the methods in each category is given in Table III, and visual illustrations of these categories are shown in Figure 6. In this section, we present an overview of the methods in each category and analyse their relevance to achieving safer and more trustworthy AD. For each category except interpretable validation, which has only five works, we summarise the methods according to the XAI taxonomy presented in Section II-D (Tables IV, V, VI and VII).

¹ In contrast to our definitions of the sources of explanation in Section II-C, Du et al. [55] grouped interpretable machine learning into intrinsic and post-hoc interpretability. Our collected publications include both types of methods.

TABLE III: Definition, advantages (+), and disadvantages (–) of each of the five categories of reviewed methods.

Method	Summary
Interpretable by design (Section IV-A)	Inherently interpretable method whose design reveals the explicit causal relationship between input(s) and output(s). Often these methods rely on decision trees, Bayesian networks, interpretable latent spaces, or rule-based algorithms.
	+ May be used to verify formal claims of safety about the algorithm;
	+ Uses meaningful state abstractions which offer intelligible explanations;
	– Requires extensive domain knowledge to engineer meaningful representations or high-level driving maneuvers;
	– Analysing interpretable systems requires significant AI expertise which is not available to all stakeholders.
Interpretable surrogate model (Section IV-B)	An interpretable-by-design algorithm approximates the behaviour of a black box model such that its primary goal is to provide intelligible explanations of the black box model to some stakeholder(s).
	+ Perception errors can be analysed thoroughly, contributing to safety analyses;
	+ Many readily available XAI tools can be applied (e.g., SHAP, LIME, Grad-CAM);
	+ May be combined with natural language generation to create intelligible explanations;
	– Feature attribution methods can be inconsistent, hard to interpret, and incorrect;
	– Cannot give formal guarantees on safety.
Interpretable monitoring (Section IV-C)	An interpretable-by-design model is used to verify the decision-making algorithm’s output such that it ensures safer AI deployment for AVs.
	+ Greatly improves the (perceived) safety of the AV;
	+ Can be deployed with existing decision-making/perception systems;
	– May introduce prohibitive computational overhead;
	– Not suited for standalone deployment;
	– May fail to generalise or correctly identify unsafe actions if the interpretable model is too simple, incorrect, or biased.
Auxiliary explanations (Section IV-D)	The execution of the decision-making algorithm creates auxiliary information about how the algorithm works. Methods using attention mechanisms are very common here.
	+ Applicable to most decision-making algorithms;
	+ Low generation overhead, as the explanation is a by-product of the decision-making process;
	– Usually does not reveal enough information about the decision-making process to improve safety or trustworthiness;
	– Requires careful manual analysis to interpret;
	– Attention-based auxiliary explanations may look plausible despite being incorrect;
	– Heat maps can be fragile and unreliable.
Interpretable safety validation (Section IV-E)	Provides an interpretable way to generate adversarial behaviours of surrounding agents for the validation of AVs.
	+ Greatly improves the safety of AVs;
	+ Extracts unique failure and challenge scenarios that can be used for further AD assessment;
	– Very challenging to generate unique scenarios;
	– Applicable only before deployment due to excessive runtime requirements.

IV-A Interpretable By Design

Definition IV.1 (Interpretable By Design).

We call an algorithm interpretable by design if it is inherently interpretable such that its design reveals the explicit causal relationship between its input(s) and output(s) [40].

TABLE IV: Summary of interpretable-by-design methods using the XAI taxonomy of Section II-D. All methods in this category are ante-hoc (Stage). Evaluation methods are based on fixed simulated scenarios, randomised simulations, or datasets (names listed). Missing entry (—) means not applicable. No E2E methods were found for this category. * Conceptual framework

Component	Paper	Task	Representation	Mode	Scope	Medium	Evaluation method	User study
Perception	[56]	traffic sign detection	symbolic	inherent	global	—	self-curated dataset	no
	[57]	pedestrian detection	subsymbolic	inherent	local	—	CityPersons [58]	no
	[59]	semantic segmentation	subsymbolic	inherent	global	—	CityScapes [58]	no
	[60]	semantic segmentation	subsymbolic	inherent	global	—	SYNTHIA [61]	no
	[62]	context understanding	mixed	inherent	local	—	nuScenes [63]	no
	[64]	eye fixation	subsymbolic	inherent	global	—	DR(eye)VE [65]	no
	[66]	sun-glare recognition	mixed	inherent	local	—	self-curated dataset	no
Planning & Prediction	[67, 68]	motion planning, trajectory prediction	symbolic	inherent	global	—	scenarios	no
	[69]	trajectory prediction	mixed	inherent	global	—	NGSIM [70]	no
	[71, 72]	goal prediction	symbolic	inherent	global, local	—	inD [73], rounD [74], openDD [75]	no
	[76]	trajectory prediction	mixed	importance	local	visual	INTERACTION [77]	no
	[78]	motion planning	symbolic	surrogate	global	textual	scenarios	no
	[79]	motion planning	symbolic	inherent	global	textual	self-curated dataset	yes
	[80]*	motion planning	mixed	importance	local	textual	—	yes
	[81]	lane-change prediction	mixed	importance	local	—	HighD [82]	no
	[83]	motion planning	symbolic	importance	global	textual	CARLA [84]	yes
	[85]	pedestrian prediction	symbolic	importance	global	—	scenarios	no
	[86]	lane-change prediction	mixed	inherent	local	—	HighD [82]	no
	[87]	lane-change prediction	mixed	surrogate	global	—	simulation	no
Control	[88]	safe control	mixed	inherent	local	—	simulation	no

IV-A1 Interpretable By Design – Perception

Chaghazardi et al. [56] introduced an inductive logic programming approach for traffic sign classification in which high-level features such as colour and shape are first extracted and a hypothesis is then learned from them. This design increases transparency and reliability, and the approach was shown to be more robust against adversarial attacks than other state-of-the-art algorithms. In [57], Feifel et al. proposed a structured, interpretable latent space in a DNN for pedestrian detection that learns to extract specific prototypes. The learned prototypes can be projected onto a 2D plane via principal component analysis [89] or t-SNE [90], where they form clusters. Because the DNN is interpretable by design, an ante-hoc analysis is possible, which supports the safety argumentation. Plebe et al. [60] developed a temporal autoencoder for lane and car detection in semantic segmentation, whose organised latent space learns the semantic concepts of lane and car segments. Similarly, Losch et al. [59] proposed semantic bottleneck models for semantic segmentation that align every channel with a human-interpretable visual concept. Introducing semantic concepts into the latent space additionally increases the transparency of the DNN’s predictions. Oltramari et al. [62] developed a hybrid AI framework for perceptual scene understanding that informs the latent space of DNNs with knowledge graphs extracted by clustering the labelled training data. Martinez et al. [64] built an interpretable latent space into the DNN by using capsule networks [91] to predict eye fixations in AD scenarios and contextual conditions. These capsules make it possible to express interpretable relationships between features and contextual conditions at the frame and pixel levels. In [66], Yoneda et al. trained a CNN to identify the presence of sun glare in the AD environment. They subsequently computed heat maps with Gradient-weighted Class Activation Mapping (Grad-CAM) [92] to identify the regions of sun glare in the image, which increases the transparency of the decision-making process.
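Grad-CAM [92] weights each feature map $A^k$ of a convolutional layer by the spatially averaged gradient of the class score $y^c$, $\alpha_k^c = \frac{1}{Z}\sum_{i,j} \partial y^c / \partial A^k_{ij}$ with $Z$ the number of spatial positions, and keeps the positive part of the weighted sum, $L^c = \mathrm{ReLU}(\sum_k \alpha_k^c A^k)$. The sketch below computes this map from activations and gradients given as nested lists. It assumes both have already been extracted from the network (e.g., via framework hooks, omitted here), does not upsample the map to image resolution, and is not the code used in [66].

```python
def grad_cam(activations, gradients):
    """Grad-CAM heat map for one convolutional layer.

    activations, gradients: K feature maps, each an H x W nested list;
    gradients[k][i][j] holds d y^c / d A^k_ij for the target class c.
    Returns an H x W map equal to ReLU(sum_k alpha_k * A^k).
    """
    K, H, W = len(activations), len(activations[0]), len(activations[0][0])
    Z = H * W
    # Global-average-pool the gradients: one importance weight per feature map.
    alphas = [sum(sum(row) for row in gradients[k]) / Z for k in range(K)]
    heat = [[0.0] * W for _ in range(H)]
    for k in range(K):
        for i in range(H):
            for j in range(W):
                heat[i][j] += alphas[k] * activations[k][i][j]
    # Keep only regions with a positive influence on the class score.
    return [[max(0.0, v) for v in row] for row in heat]

# Toy example: map 0 supports the class (positive gradients), map 1 opposes it.
A = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
G = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(A, G))  # [[1.0, 0.0], [0.0, 0.0]]
```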

IV-A2 Interpretable By Design – Planning & Prediction

Methods in this category create explanations mainly for three purposes: goal/trajectory prediction, lane-change intention prediction, and motion planning. One of the most common design choices is the use of high-level, interpretable driving maneuvers [67, 68, 83, 85, 71, 72]. These maneuvers provide a convenient abstraction over lower-level state variables (e.g., acceleration and steering) and render the decision-making process tractable and interpretable.

However, interpretable-by-design approaches rely on extensive domain knowledge, which may not scale to more complex decision-making. Moreover, some methods rely on Bayesian networks (BNs) to define probabilistic models of the decision-making process [83, 85, 78], which requires a good understanding of, and assumptions about, the causal processes behind driving decisions. The benefit of BNs is that they provide a principled mathematical framework to reason about causality; however, the reliance on expert knowledge has the potential to introduce human modelling errors and biases.

Another approach for motion planning is to use Monte Carlo Tree Search (MCTS) over high-level driving maneuvers to build a shallow, interpretable search tree via simulation [67, 68, 78]. MCTS has the benefit of covering only the relevant parts of the search space while avoiding unsafe actions; however, it relies on trajectory predictions for other traffic participants and is computationally expensive.
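At each node of the search tree, MCTS commonly chooses which high-level maneuver to simulate next with the UCB1 rule (the UCT variant of MCTS), trading off a maneuver's average simulated return against how rarely it has been tried. The sketch below shows only this selection step, as a generic illustration rather than the planner of [67, 68, 78]; the maneuver names, the visit counts and summed returns, and the exploration constant $c = \sqrt{2}$ are assumptions, and the parent's visit count is approximated by the sum of its children's visits.

```python
import math

def uct_select(stats, c=math.sqrt(2)):
    """Pick the maneuver maximising Q/N + c * sqrt(ln(N_parent) / N).

    stats: dict maneuver -> (visit count N, summed simulated return Q).
    Unvisited maneuvers are tried first.
    """
    for maneuver, (n, _) in stats.items():
        if n == 0:
            return maneuver
    parent_visits = sum(n for n, _ in stats.values())

    def score(item):
        n, q = item[1]
        return q / n + c * math.sqrt(math.log(parent_visits) / n)

    return max(stats.items(), key=score)[0]

stats = {"keep_lane": (10, 6.0), "change_left": (2, 1.0)}
print(uct_select(stats))  # change_left: fewer visits earn a larger exploration bonus
```

Because every node corresponds to a named maneuver, the visit counts and value estimates along the chosen branch can be read directly as a justification of the selected plan, which is what makes the shallow tree interpretable.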

Interpretable goal and trajectory prediction have also received attention. These methods rely either on rational (Bayesian) inverse planning [69, 67], decision trees [71, 72], or discrete choice models [76]. There is significant work on explaining lane change (LC) predictions using a variety of methods which include time-series motifs [81], rule extraction [87], and interpretable variational auto-encoders [86]. One work stands out for predicting pedestrian intentions [85] using a dynamic BN derived from annotated real-world image data.
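Rational inverse planning infers a goal from observed motion via Bayes' rule, $p(g \mid s_{1:t}) \propto p(s_{1:t} \mid g)\, p(g)$: goals for which the observed motion looks near-optimal become more probable. A common choice of likelihood, assumed here as a generic illustration rather than the exact formulation of [69, 67], is Boltzmann-rational, $p(s_{1:t} \mid g) \propto \exp(-\beta\,[C_{\text{obs}}(g) - C^*(g)])$, where $C_{\text{obs}}(g)$ is the cost of the observed prefix plus the optimal completion to $g$, $C^*(g)$ is the optimal cost from the initial state to $g$, and $\beta$ is a rationality parameter. The goal names and costs below are hypothetical.

```python
import math

def goal_posterior(prior, cost_via_observed, cost_optimal, beta=1.0):
    """Posterior over goals under a Boltzmann-rational likelihood.

    prior: dict goal -> p(g)
    cost_via_observed[g]: cost of the observed prefix plus optimal completion to g
    cost_optimal[g]: optimal cost from the initial state to g
    beta: rationality (higher = agent assumed closer to optimal)
    """
    unnorm = {g: prior[g] * math.exp(-beta * (cost_via_observed[g] - cost_optimal[g]))
              for g in prior}
    total = sum(unnorm.values())
    return {g: w / total for g, w in unnorm.items()}

# A vehicle drifting left: its motion is optimal for exiting left,
# but costs a detour of 2 units if it intends to go straight.
post = goal_posterior(prior={"exit_left": 0.5, "straight": 0.5},
                      cost_via_observed={"exit_left": 10.0, "straight": 12.0},
                      cost_optimal={"exit_left": 10.0, "straight": 10.0})
print(round(post["exit_left"], 3))  # 0.881
```

The explanation follows directly from the model: a goal is judged likely because the observed motion wastes little cost when pursuing it.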

Only three papers ran a user study [80, 83, 79], and only one of those investigated the effects of explanations on users in depth [79]. This latter work is also one of the few methods in this category that elicited stakeholders’ requirements in detail along the axes of intelligibility, accountability, and trust. However, a major limitation of the proposed method is its use of a highly specialised dataset with annotations for high-level semantic and structured explanations, which may not be readily available.

IV-A3 Interpretable By Design – Control

Zheng et al. [88] proposed an ante-hoc explainable controller in which the output of a neural network-based controller with control barrier function filters is projected onto a safe set in an interpretable manner via quadratic programs and a gauge map. Their method provides good traceability of the control process, and the authors discuss in depth the effects of interpretable control on safety validation.
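The core idea of projecting a learned controller's output onto a safe set can be illustrated with a single control barrier function (CBF) constraint. For control-affine dynamics, the CBF condition $\dot h \ge -\gamma h$ is linear in the input, $a^\top u \ge b$ with $a = L_g h$ and $b = -L_f h - \gamma h$, so the quadratic program $\min_u \lVert u - u_{\text{nom}} \rVert^2$ subject to that constraint has a closed-form solution: keep $u_{\text{nom}}$ if it already satisfies the constraint, otherwise move it minimally onto the constraint boundary. This is a simplified sketch under that single-constraint assumption; it does not reproduce the gauge map of [88], and the example numbers are hypothetical.

```python
def safety_filter(u_nom, a, b):
    """Solve min ||u - u_nom||^2 subject to a . u >= b (one half-space constraint).

    u_nom, a: sequences of equal length; a must be non-zero.
    Returns the closest control input that satisfies the constraint.
    """
    dot = sum(ai * ui for ai, ui in zip(a, u_nom))
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0:
        raise ValueError("constraint normal a must be non-zero")
    violation = max(0.0, b - dot)  # zero when u_nom is already safe
    # Minimal correction along the constraint normal a.
    return [ui + violation / norm_sq * ai for ui, ai in zip(u_nom, a)]

# Hypothetical 1D example: the barrier limits acceleration to u <= 1.5 m/s^2,
# written as -u >= -1.5, while the learned controller requests 3.0 m/s^2.
print(safety_filter([3.0], [-1.0], -1.5))  # [1.5]
```

The filter's behaviour is interpretable in the sense that every modification of the learned command can be attributed to a specific, explicitly stated constraint.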

IV-A4 Interpretable By Design – Summary

In perception, the majority of algorithms that are interpretable by design rely on constructing an interpretable latent space within the DNN. Compelling the algorithm to learn semantic concepts makes its automatic feature extraction more interpretable. Another possibility is to modularise the perception algorithm into multiple algorithms that each learn to identify different semantic concepts, which likewise forces the algorithm to learn an interpretable feature extraction. However, it is challenging to define the concepts, and a dataset labelled with the semantic concepts of each object is often required. Lastly, the interpretability of a classifier can also be used to localise relevant image regions, such as sun glare [66].

In planning and prediction, it is common to hand-craft high-level interpretable features or maneuvers to abstract the low-level state space. These abstractions may then be used in various algorithms for decision-making, such as Monte Carlo Tree Search (MCTS), Bayesian networks (BN), or decision trees. However, creating these abstractions requires significant domain knowledge and careful engineering. For the control task, constraint-satisfaction algorithms can be applied to map the output of a DNN onto a safe-by-construction control domain in an interpretable way.

These methods are intrinsically explainable and have the potential to improve both the trustworthiness and the safety of AD. Interpretable algorithms provide a clear causal link between the input and the output of the algorithm, which may enable safety validation, while meaningful high-level abstractions may be easily understood by people, which can contribute to accurate trust calibration. Unfortunately, very few methods investigate the algorithms’ efficacy with actual stakeholders, leaving many of the motivating claims of these works unaddressed. This is especially clear from the lack of methods that generate concrete, intelligible explanations (i.e., explainable systems as defined in Section II-C). Instead, papers usually offer scientific analyses of the interpretable components of their systems, but these analyses do not scale to varied stakeholders and do not consider the requirements for achieving more trustworthy AD (cf. Section II-B). The missing focus on trustworthiness is also evident from the lack of user studies that evaluate the benefit of explanations for stakeholders.

IV-B Interpretable Surrogate Models

Definition IV.2 (Interpretable Surrogate Models).

We call a system an interpretable surrogate model if an interpretable-by-design model approximates the behaviour of a black box algorithm such that this provides intelligible explanations of the black box algorithm [41].

TABLE V: Summary of interpretable surrogate methods using in part the taxonomy of Section II-D. All methods in this category are post-hoc (Stage) and subsymbolic (Representation). The Surrogate field refers to the specific surrogate method used to approximate the underlying black box. Missing entry (—) means not applicable.

Component	Paper	Task	Surrogate	Mode	Scope	Medium	Evaluation method	User study
Perception	[93]	object detection	SHAP, RF	importance	local	—	nuScenes [63]	no
Planning & Prediction	[94]	vehicle following	SHAP, RF	importance	global, local	—	simulation	no
	[95]	lane-change prediction	max-entropy SHAP	importance	local	—	HighD [82]	no
	[96]	lane-change prediction	mean impact value	importance	local	—	HighD [82]	no
	[17]	motion planning	decision tree	surrogate	global	textual	scenarios	yes
	[97]	route planning	decision tree	surrogate	local	visual	simulation	yes
	[98]	motion planning	cognitive model	surrogate	global	textual	HEADD [99]	yes
Control	[100]	position control	clustering	surrogate	cohort	—	simulation	no
E2E	[101]	action selection	generative model	importance	local	—	BDD100K [102], BDD-OIA [103]	yes
	[104]	action selection	DNN	importance	local	—	simulation	no

IV-B1 Interpretable Surrogate Models – Perception

Ponn et al. [93] introduced a model-agnostic surrogate model for camera-based object detectors. A random forest is trained to predict a detection score from meta-information about the environment in the training data. Shapley values are then calculated to measure the impact of the individual meta-information features, which helps interpret the results and allows the behaviour of the object detector under influencing environmental factors to be estimated.
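The Shapley value of a feature $i$ is its average marginal contribution to the surrogate's prediction over all subsets of the other features, $\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!}\,[v(S \cup \{i\}) - v(S)]$. For a handful of meta-information features, it can be computed exactly by enumerating coalitions, as in the sketch below. A feature absent from a coalition takes a baseline value, which is one of several possible conventions. The toy surrogate standing in for the trained random forest of [93], its feature names, and its coefficients are all hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    model: callable on a dict feature -> value
    x, baseline: dicts with the same keys; absent features take baseline values.
    """
    features = list(x)
    n = len(features)

    def value(coalition):
        point = {f: (x[f] if f in coalition else baseline[f]) for f in features}
        return model(point)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

def surrogate(m):
    """Hypothetical surrogate: detection score drops with rain and at night."""
    return 0.9 - 0.2 * m["rain"] - 0.3 * m["night"] - 0.1 * m["rain"] * m["night"]

x, baseline = {"rain": 1, "night": 1}, {"rain": 0, "night": 0}
phi = shapley_values(surrogate, x, baseline)
print({f: round(v, 2) for f, v in phi.items()})  # {'rain': -0.25, 'night': -0.35}
```

The interaction term is split equally between the two features, and the values sum to the total change in the predicted score (0.3 - 0.9 = -0.6). Exact enumeration grows exponentially with the number of features, which is why libraries such as SHAP rely on approximations or model-specific algorithms.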

IV-B2 Interpretable Surrogate Models – Planning & Prediction

Cui et al. [94] combined SHAP and random forests (RF) to increase the transparency of decision-making driven by a deep reinforcement learning (DRL) algorithm. In their framework, SHAP determines the important features associated with the decisions made by the DRL algorithm, and an RF model is trained on these features to explain the decisions of the original DRL model. Li et al. [95] also relied on SHAP to understand the importance of features for LC predictions. They proposed a modified version of SHAP called Maximum Entropy SHAP (ME-SHAP), which they used to explain an XGBoost-based LC decision model. Their evaluation shows that the ME-SHAP feature contributions may be rationalised in terms of intuitive driving actions, but the qualitative benefits of ME-SHAP for human understanding are not substantiated.

Three works stand out for avoiding SHAP while also carrying out substantial user studies. First, Mishra et al. [97] used a decision tree to explain an RL agent’s actions based on states and the corresponding actions determined by the optimal policy. They created a visual explanation interface and evaluated it with both students and experts using a wide range of qualitative and quantitative analyses. They showed that their method is effective, applicable with experts, and more effective than textual explanations.

Second, Omeiza et al. [17] created natural language explanations by deriving decision trees from scene graphs. Their algorithm is based on pre-defined meaningful features, but their method description is limited, which makes it difficult to assess how well it would work in unseen scenarios. An extensive user study was used to measure the effects of explanations on the perceived accountability of the AD system and on users’ understanding of how the AD system works.

Finally, Gyevnar et al. [98] proposed a method called CEMA which is based on a cognitive model of how people select causes for explanations. They also generated natural language explanations, but unlike the previous methods which used a decision tree, their method relied on simulations based on a probabilistic planner. They generated counterfactual worlds that were used to analyse the causal relationships affecting the motion planning of the AV. They evaluated their explanations with more than 200 online participants against a baseline of human-written explanations called HEADD [99].

IV-B3 Interpretable Surrogate Models – Control

Surrogate models of control should be strongly focused on safety, given their safety-critical, hardware-level application. We identified one work [100] in which the authors analysed the behaviour of a DNN-based controller that stabilizes the dynamic position of an AV under disturbing environmental conditions. A cross-comparable clustering method for time-series data was introduced to interpret the response signals of the neural network, which increased the understanding of the internal model and its transparency. However, the specifications of the underlying neural network are not given, which makes their claims hard to reproduce and generalise.
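Because the details of the cross-comparable clustering method of [100] are not given here, the sketch below substitutes plain k-means (Lloyd's algorithm) on fixed-length controller response signals, only to illustrate the general idea of grouping a controller's responses into interpretable cohorts. The signals, the number of clusters, and the naive deterministic initialisation with the first k signals are assumptions for illustration.

```python
def kmeans(signals, k, iters=100):
    """Lloyd's k-means on equal-length time series with Euclidean distance.

    Centroids are initialised with the first k signals (deterministic but naive).
    Returns (labels, centroids).
    """
    centroids = [list(s) for s in signals[:k]]

    def nearest(s):
        return min(range(k),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(s, centroids[c])))

    labels = None
    for _ in range(iters):
        new_labels = [nearest(s) for s in signals]  # assignment step
        if new_labels == labels:                    # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):                          # update step
            members = [s for s, lab in zip(signals, labels) if lab == c]
            if members:                             # keep old centroid if a cluster empties
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centroids

# Hypothetical position-error responses: two calm episodes and two disturbed ones.
signals = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [5.0, 5.0, 5.0], [5.1, 4.9, 5.0]]
labels, _ = kmeans(signals, k=2)
print(labels)  # [0, 0, 1, 1]
```

Each resulting centroid is itself a representative response signal, so a cohort can be inspected and described (e.g., "large sustained deviation") rather than only counted.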

IV-B4 Interpretable Surrogate Models – End-to-end

Zemni et al. [101] proposed an object-centric framework that generates counterfactual explanations for E2E decision models. The E2E decision model was designed to have an instance-based latent representation, which allowed the generative model to produce new images in which objects from the original input image are slightly changed. By analysing the resulting changes in the output, the framework helps to understand the influence of objects in the environment on the decisions of the network. Shi et al. [104] proposed a self-supervised interpretable framework that produces an attention mask representing the importance assigned to each pixel, highlighting the pixels that provide the most evidence for an agent’s decisions. The core concept of the framework is a separate explanation model trained for vision-based RL.

IV-B5 Interpretable Surrogate Models – Summary

An interpretable surrogate model consists of a meta-model with interpretability capabilities that approximates the behaviour of a black box model and thus supports understanding the internal working principle of a black box model. For perception, this may be done by training a different ML model, such as a random forest that can be interpreted by inspection or via, for example, Shapley values [93]. This type of interpretable surrogate model has the potential to increase the transparency and reliability of the network, as detection errors can be analyzed more thoroughly, thereby enhancing the understanding of the model’s behaviour. It is similarly common to use Shapley values in planning and prediction, especially as implemented by SHAP [105], and only three works did not rely on SHAP.

Unfortunately, blindly relying on SHAP-based interpretability may not help achieve safer AD as this method is susceptible to a variety of issues that lead to inconsistent explanations when compared with other feature saliency methods [106, 107]. This also means that care must be taken in their use when trying to improve the trustworthiness of AVs, and user studies with relevant stakeholders are essential to validate claims about trustworthiness.

For the control task, the output of a DNN can be interpreted in a post-hoc manner by analyzing the data over time with a clustering approach [100]. In E2E learning, a separate generative model can be trained via the black box model [101]. The generative model can then be used to provide counterfactual explanations of the model. Moreover, it is possible to train a separate explanation module of the E2E network [104].

IV-C Interpretable Monitoring

TABLE VI: Summary of interpretable monitoring methods using the taxonomy of Section II-D. No user studies were done in this category. Missing entry (—) means not applicable. No Control and E2E methods were identified in this category.

Component	Paper	Task	Representation	Stage	Mode	Scope	Medium	Evaluation method
Perception	[108]	traffic sign recognition	subsymbolic	post-hoc	inherent	global	—	GTSRB [109]
	[110]	traffic sign recognition	subsymbolic	post-hoc	importance	local	visual	simulation
	[111]	object detection	subsymbolic	post-hoc	surrogate	global	—	COCO [112], Broden [113], KITTI [114]
	[115]	anomaly detection	subsymbolic	post-hoc	importance	local	—	self-curated dataset
Planning & Prediction	[116]	accident prediction	mixed	post-hoc	importance	local	visual	DADA-2000 [117]
	[118]	action selection	mixed	post-hoc	importance	local	—	simulation
	[119, 120]	action selection	symbolic	ante-hoc	inherent	global, local	—	self-curated dataset
	[121]	anomaly detection	symbolic	ante-hoc	inherent	local	textual	CARLA [84]
	[122]	collision prediction	symbolic	ante-hoc	inherent	global, local	—	simulation
	[123]	accident prediction	subsymbolic	post-hoc	importance	local	visual	CCD [124]
	[125]	risk scoring	symbolic	ante-hoc	importance	local	—	Lyft [126]
	[127]	motion planning	mixed	post-hoc	surrogate	global, local	—	simulation

Definition IV.3 (Interpretable Monitoring).

We call a system an interpretable monitoring system if an interpretable-by-design model is used to verify a decision-making algorithm’s output such that this ensures safer deployment of AVs.

IV-C1 Interpretable Monitoring – Perception

In [108], Kronenberger et al. examined interpretable DNNs for traffic sign recognition. They introduced additional explanations of visual concepts such as colours, shapes, and numbers or symbols, which are used to verify the decision of the network. Hacker et al. [110] also proposed a monitor for traffic sign recognition, consisting of various mechanisms including an interpretable saliency detector. During operation, the saliency map is computed via occlusion sensitivity [128] and compared, using the Euclidean distance, with a saliency map computed offline for each traffic sign category. Keser et al. [111] proposed an interpretable and model-agnostic monitor by introducing a concept bottleneck model (CBM), which is used for a plausibility check of the original DNN-based object detector. The interpretability of the CBM is achieved by learning human-interpretable labels. Fang et al. [115] constructed a fault diagnosis framework to monitor a system’s operational status, in which interpretability is achieved by calculating the contribution of each input feature to the anomaly detection results. These perception monitors enhance the reliability of the detection algorithm’s decision process in an interpretable manner and also increase robustness. Besides detecting anomalous behaviour of the network, such monitors can also detect unsafe inputs.
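The saliency-based check described for [110] can be sketched as follows. Occlusion sensitivity [128] masks one image patch at a time and records how much the classifier's score for its predicted class drops; the monitor then accepts the prediction only if the resulting saliency map lies within a Euclidean distance threshold of a reference map computed offline for that class. The toy classifier, patch size, fill value, reference maps, and threshold are hypothetical placeholders, not the configuration of [110].

```python
import math

def occlusion_map(image, score, patch=2, fill=0.0):
    """Occlusion sensitivity: score drop when each patch x patch block is masked.

    image: H x W nested list of floats; score: callable image -> float
    (the classifier's confidence for the class predicted on the clean image).
    """
    H, W = len(image), len(image[0])
    base = score(image)
    sal = [[0.0] * W for _ in range(H)]
    for i0 in range(0, H, patch):
        for j0 in range(0, W, patch):
            rows, cols = range(i0, min(i0 + patch, H)), range(j0, min(j0 + patch, W))
            occluded = [row[:] for row in image]
            for i in rows:
                for j in cols:
                    occluded[i][j] = fill
            drop = base - score(occluded)
            for i in rows:
                for j in cols:
                    sal[i][j] = drop
    return sal

def monitor_ok(sal, reference, threshold):
    """Accept the prediction if its saliency map is close to the class reference map."""
    dist = math.sqrt(sum((a - b) ** 2
                         for ra, rb in zip(sal, reference) for a, b in zip(ra, rb)))
    return dist <= threshold

def toy_score(img):
    """Hypothetical classifier that only looks at the top-left 2 x 2 block."""
    return (img[0][0] + img[0][1] + img[1][0] + img[1][1]) / 4

image = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], [0.0] * 4, [0.0] * 4]
sal = occlusion_map(image, toy_score)
reference = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], [0.0] * 4, [0.0] * 4]
print(monitor_ok(sal, reference, threshold=0.5))  # True
```

A prediction whose evidence comes from an unexpected image region produces a distant saliency map and is flagged, which gives the monitor's verdict a visual justification.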

IV-C2 Interpretable Monitoring – Planning & Prediction

Interpretable monitoring of AD planning and prediction systems is primarily concerned with two tasks: accident/collision prediction and safe action selection. The majority of methods here rely on symbolic representations, predominantly decision trees [119, 120, 122, 125, 118], to predict either a binary or a scalar safety score for a fixed set of high-level actions. These methods are useful for assessing the safety of potential actions before they are executed; however, they rely only on the state description without considering other visual cues.
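The principle of these decision-tree monitors can be sketched as a function that maps a symbolic state description and a proposed high-level action to a safety score, vetoing actions that score below a threshold. The hand-written tree below, including its state features, thresholds, and action names, is entirely hypothetical; in the cited works such trees are typically learned rather than hand-coded. Its point is that every verdict can be traced back to the explicit rule that produced it.

```python
def safety_score(state, action):
    """Tiny hand-written decision tree returning (score in [0, 1], rule that fired)."""
    if action == "change_left":
        if not state["left_lane_exists"]:
            return 0.0, "no left lane"
        if state["left_gap_m"] < 10.0:
            return 0.2, "left gap < 10 m"
        return 0.9, "left gap >= 10 m"
    if action == "keep_lane":
        if state["time_headway_s"] < 1.0:
            return 0.4, "time headway < 1 s"
        return 0.95, "time headway >= 1 s"
    return 0.0, "no rule for this action"  # unknown actions are rejected by default

def monitor(state, proposed, threshold=0.5):
    """Approve the planner's action or veto it, with a human-readable reason."""
    score, reason = safety_score(state, proposed)
    return score >= threshold, f"{proposed}: {reason} (score {score:.2f})"

state = {"left_lane_exists": True, "left_gap_m": 6.0, "time_headway_s": 2.0}
print(monitor(state, "change_left"))  # (False, 'change_left: left gap < 10 m (score 0.20)')
```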

This shortcoming is addressed by other works that rely on raw perception data to assess the safety of driving maneuvers. For example, Karim et al. [123] used Grad-CAM [92] to extract visual explanations for accident prediction. More uniquely, Bao et al. [116] proposed a two-stage design for traffic accident prediction based on visual attention, in which a Markov decision process is designed to mimic human-like visual attention fixation. Stage 1 uses saliency maps to show visual attention for both top-down (focus on a particular region) and bottom-up (consider everything) vision, while stage 2 is a stochastic Markov decision process in which an agent predicts both the probability of an accident and the visual fixation area. This setup balances exploration through visual fixation with exploitation for more accurate accident prediction.

In contrast to predicting the safety of a single action, Schmidt et al. [127] proposed a decision tree-based monitoring pipeline for full motion planning. They used imitation learning to train a decision tree from an RL teacher policy that was trained for safe driving under a constrained MDP. The method was shown to be verifiable and easily interpretable, although their evaluation was limited to lane-change decisions.
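The core idea of distilling a teacher policy into a tree can be sketched with a one-feature decision stump. This is a simplification, not the pipeline of [127]: the teacher is a hypothetical hand-written rule standing in for a trained RL policy, states are sampled uniformly rather than from driving rollouts, and only a depth-1 tree is fitted by plain behavioural cloning.

```python
import random

def teacher_policy(gap_m: float) -> int:
    """Hypothetical stand-in for the RL teacher: 1 = change lane, 0 = stay."""
    return 1 if gap_m > 30.0 else 0

def fit_stump(xs, ys):
    """Depth-1 decision tree: predict 1 iff x > t, with t chosen among the
    observed values to minimise disagreement with the teacher labels."""
    best_t, best_err = None, float("inf")
    for t in sorted(set(xs)):
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

rng = random.Random(0)
states = [rng.uniform(0.0, 60.0) for _ in range(500)]  # sampled target-lane gaps (m)
labels = [teacher_policy(s) for s in states]             # query the teacher
threshold, errors = fit_stump(states, labels)
print(f"distilled rule: change lane if gap > {threshold:.1f} m (errors: {errors})")
```

The distilled rule is a single readable comparison, which is what makes the student easy to inspect and verify, whereas the teacher’s behaviour is only available through queries.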

Rather than monitoring the safety of motion planning on its own, Gilpin et al. [121] designed a high-level explanatory framework for holistic anomaly detection within the AV that can also create explanations for end-users using natural language. They proposed a hierarchy of systems to first select explanations generated by lower-level systems (e.g., control and perception modules), and then synthesise higher-level explanations using first-order logic rules and common sense knowledge. Unfortunately, they did not run a user study, so their explanations’ practical usefulness was not assessed.
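A heavily simplified, propositional version of such a hierarchy can be sketched as rules over lower-level reports. The module names, claims, rules, and conflicts below are hypothetical; [121] uses first-order logic rules and commonsense knowledge, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Report:
    module: str  # lower-level subsystem, e.g. "perception" or "control"
    claim: str   # its local explanation of what it did

# Hypothetical synthesis rules: if all premises hold, emit the high-level explanation.
RULES = [
    ({"perception:obstacle_ahead", "control:braking"},
     "The vehicle braked because an obstacle was detected ahead."),
    ({"perception:clear_road", "control:accelerating"},
     "The vehicle accelerated because the road ahead is clear."),
]
# Hypothetical commonsense constraints: premise combinations that signal an anomaly.
CONFLICTS = [
    ({"perception:clear_road", "control:braking"},
     "Anomaly: braking although perception reports a clear road."),
]

def synthesise(reports):
    """Combine lower-level reports into high-level explanations and anomalies."""
    facts = {f"{r.module}:{r.claim}" for r in reports}
    explanations = [text for premises, text in RULES if premises <= facts]
    anomalies = [text for premises, text in CONFLICTS if premises <= facts]
    return explanations, anomalies

print(synthesise([Report("perception", "clear_road"), Report("control", "braking")]))
```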

IV-C3 Interpretable Monitoring – Summary

For the perception task, surrogate models that are interpretable by design can verify the decision of a perception algorithm through their interpretable extraction of semantic concepts. Moreover, the internal workings of a perception algorithm can be monitored via a heat map monitor, or an interpretable meta-model can be developed as a monitor to identify anomalies in the perception algorithm and their causes.
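A concept-based plausibility check of the kind summarised above might look like the following sketch. The concept names, the threshold, and the rule in the label head are hypothetical, and the concept scores are given directly instead of being predicted by a learned concept model as in [111].

```python
# Hypothetical concepts for pedestrian detection; in a real CBM these scores
# come from a learned concept predictor, not from hand-set values.
CONCEPTS = ("has_head", "has_legs", "upright_pose")

def label_from_concepts(concepts: dict, tau: float = 0.5) -> str:
    """Interpretable label head: a readable rule over concept scores."""
    if all(concepts[c] > tau for c in CONCEPTS):
        return "pedestrian"
    return "other"

def plausibility_check(detector_label: str, concepts: dict, tau: float = 0.5):
    """Compare the black-box detector with the concept bottleneck and report
    which concepts supported the bottleneck's decision."""
    cbm_label = label_from_concepts(concepts, tau)
    active = sorted(c for c in CONCEPTS if concepts[c] > tau)
    return cbm_label == detector_label, cbm_label, active

# The detector says "pedestrian", but the concepts only support a partial match.
print(plausibility_check("pedestrian", {"has_head": 0.9, "has_legs": 0.2, "upright_pose": 0.8}))
# (False, 'other', ['has_head', 'upright_pose'])
```

The list of active concepts is what makes a disagreement explainable: here the missing concept is immediately visible.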

For prediction and planning, most methods rely on decision trees to predict a binary safety label or scalar risk score for the ego vehicle’s actions. These methods only utilise the state description of the environment. However, two methods were proposed that consider visual cues from image data as well.

Monitoring methods are well suited to addressing the safety concerns of AD post hoc, but they are not designed to calibrate trust. A major challenge for interpretable monitoring systems is the trade-off between the computational complexity and the performance of the monitoring algorithm, since the monitor should operate quickly and should not consume excessive resources of the main system [129].

IV-D Auxiliary Explanations

TABLE VII: Summary of auxiliary explanation methods using the taxonomy of Section II-D. No user studies were performed in this category. No Control methods were identified in this category.

	Paper	Task	Representation	Stage	Mode	Scope	Medium	Evaluation
Perception	[130]	semantic segmentation	subsymbolic	post-hoc	importance	local	visual	IDD-lite [131]
	[132]	semantic segmentation	subsymbolic	post-hoc	importance	local	visual	CamVid [133]
	[134]	semantic segmentation	subsymbolic	post-hoc	importance	local	visual	SYNTHIA [61], A2D2 [135]
	[136]	semantic segmentation	subsymbolic	post-hoc	importance	local	visual	KITTI road [114]
	[137]	2D object detection	subsymbolic	post-hoc	importance	local	visual	self-curated dataset
	[138]	3D object detection	subsymbolic	post-hoc	importance	local	visual	KITTI [139]
	[140]	traffic light detection	subsymbolic	post-hoc	inherent	global	visual	BSTLD [141]
	[142]	image classification, semantic segmentation	subsymbolic	post-hoc	importance	local	visual	nuScenes [63]
	[143]	3D object detection	subsymbolic	post-hoc	importance	cohort	visual	KITTI [139]
	[144]	2D object detection, semantic segmentation	subsymbolic	post-hoc	importance	cohort	visual	self-curated dataset
Planning & Prediction	[145]	trajectory prediction	subsymbolic	ante-hoc	importance	local	—	NGSIM [70], HighD [82], self-curated dataset
	[146]	scene graph learning	subsymbolic	ante-hoc	importance	local	—	ROAD [147], Oxford RobotCar [148]
	[149]	goal recognition, motion planning	subsymbolic	post-hoc	importance	local	—	Lyft [126]
	[150]	lane-change prediction	mixed	post-hoc	importance	local	—	NGSIM [70]
	[151]	trajectory prediction	subsymbolic	ante-hoc	importance	local	—	NGSIM [70], HighD [82]
	[152]	risk prediction	subsymbolic	ante-hoc	importance	local	—	HDD [153], CARLA
	[154]	trajectory prediction	subsymbolic	post-hoc	importance	local	visual	Lyft [126]
E2E	[155, 156, 157, 158, 159, 160], [103]	action selection	subsymbolic	ante-hoc	importance	local	textual	BDD-X [155], PSI [161], BDD-OIA [103], SAX [162]
	[163]	action selection	subsymbolic	ante-hoc	importance	local	visual	GTAV simulator
	[164]	motion planning	subsymbolic	ante-hoc	inherent	local	visual	nuScenes [63], CARLA [84], self-curated dataset
	[165]	control	subsymbolic	ante-hoc	inherent	local	visual	CARLA [84]
	[166], [167]	steering	subsymbolic	post-hoc	importance	local	visual	TORCS [168], CARLA [84]
	[169]	braking	subsymbolic	post-hoc	importance	local	visual	BDD-A [170], CAT2000 [171]
	[172]	trajectory planning	subsymbolic	post-hoc	importance	local	visual	CARLA [84]
	[173]	motion planning	subsymbolic	post-hoc	inherent	local	visual	self-curated dataset
	[174]	motion planning	subsymbolic	ante-hoc	importance	local	visual	nuScenes [63], self-curated
	[175]	steering	subsymbolic	ante-hoc	importance	local	visual	Udacity [176]
	[177]	action selection	subsymbolic	ante-hoc	inherent	local	visual	CARLA [84]

Definition IV.4 (Auxiliary Explanations).

We say that an algorithm can provide auxiliary explanations if its execution creates auxiliary information about how the algorithm produced its output.

IV-D1 Auxiliary Explanations – Perception

In perception tasks, heat maps are often created to explain the prediction results by highlighting regions that influence the network’s decision. A widely used model-specific approach is Grad-CAM [92], which weights the activation maps of a convolutional layer, typically the last one, by the averaged gradients of the target output and combines them into a heat map. Kolekar et al. [130] applied Grad-CAM to a DNN for camera-based semantic segmentation. Saravanarajan et al. [132] also used Grad-CAM to inspect the behaviour of a DNN for semantic segmentation under synthetically generated haze. In addition to the last layer, they applied Grad-CAM to two layers in the encoder and one in the decoder, resulting in four different heat maps and thus increasing the transparency of the DNN’s decisions.
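The Grad-CAM computation can be sketched in a few lines: average the gradient of the class score over each activation map to obtain a weight per channel, sum the weighted maps, and apply a ReLU. As assumptions for a self-contained example, gradients are estimated by finite differences rather than backpropagation, and the class head is a hypothetical linear layer on globally average-pooled activations.

```python
def grad_cam(feature_maps, score_fn, eps=1e-4):
    """Grad-CAM over K activation maps of size H x W (nested lists).
    Gradients of the class score are estimated by central finite differences
    here; a deep learning framework would obtain them by backpropagation."""
    maps = [[row[:] for row in fm] for fm in feature_maps]  # work on a copy
    K, H, W = len(maps), len(maps[0]), len(maps[0][0])
    alphas = []
    for k in range(K):
        grad_sum = 0.0
        for i in range(H):
            for j in range(W):
                original = maps[k][i][j]
                maps[k][i][j] = original + eps
                up = score_fn(maps)
                maps[k][i][j] = original - eps
                down = score_fn(maps)
                maps[k][i][j] = original
                grad_sum += (up - down) / (2 * eps)
        alphas.append(grad_sum / (H * W))  # global-average-pooled gradient
    # Weighted sum of the activation maps, followed by a ReLU.
    cam = [[max(0.0, sum(alphas[k] * maps[k][i][j] for k in range(K)))
            for j in range(W)] for i in range(H)]
    return cam, alphas

def gap(fm):
    """Global average pooling of one H x W map."""
    return sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))

def class_score(maps):
    """Hypothetical class head on top of two activation maps."""
    return 2.0 * gap(maps[0]) - 1.0 * gap(maps[1])

cam, alphas = grad_cam([[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]], class_score)
print(alphas, cam)
```

The ReLU keeps only regions that increase the class score; the negatively weighted second channel therefore contributes nothing to the final heat map.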

Abukmeil et al. [134] proposed a variational autoencoder for a semantic segmentation task and generated multiple heat maps by computing the second-order derivatives between the encoder layers and the latent space. The resulting attention maps are aggregated and fused with the last decoder layer to improve the results. Mankodiya et al. [136] defined a framework that determines which areas of an image contribute to the outcomes of semantic road segmentation, using Grad-CAM and saliency maps as the XAI methods.

In [137], Nowak et al. computed attention heat maps for DNN-based bus charger detection. These heat maps are also used to identify spurious predictions and then to guide data augmentation during training, so that the approach both provides transparency and increases the robustness of the DNN. The aforementioned approaches focus only on camera-based perception tasks. Schinagl et al. [138] proposed a model-agnostic attribution map generation method for LiDAR-based 3D object detection. The heat maps are generated by perturbation, i.e., by systematically removing LiDAR points and observing the changes in the output. They also proposed various visual analysis tools that help identify potential misbehaviour of a DNN-based perception system in an interpretable manner. This gives more transparency into the model’s workings and makes the whole development process of the ML system safer.
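The perturbation principle used by Schinagl et al. [138] can be illustrated with a leave-one-out sketch: remove one point at a time and record the drop in detection confidence. The toy detector, its box, and the point cloud are hypothetical; a practical implementation would likely remove groups of points to keep the number of detector calls manageable.

```python
import math

def point_removal_attribution(points, score_fn):
    """Perturbation-based attribution: the importance of each LiDAR point is
    the drop in detection confidence when that single point is removed."""
    base = score_fn(points)
    return [base - score_fn(points[:i] + points[i + 1:]) for i in range(len(points))]

def toy_car_confidence(points, center=(10.0, 0.0), radius=2.0):
    """Hypothetical detector: confidence grows with the number of points
    inside the predicted box (saturating at four points)."""
    inside = sum(1 for x, y, _z in points
                 if math.hypot(x - center[0], y - center[1]) <= radius)
    return min(1.0, inside / 4.0)

cloud = [(10.0, 0.5, 0.2), (9.5, -0.5, 0.3), (10.5, 0.0, 0.1),  # on the car
         (25.0, 3.0, 0.0), (4.0, -6.0, 0.0)]                      # background
print(point_removal_attribution(cloud, toy_car_confidence))  # [0.25, 0.25, 0.25, 0.0, 0.0]
```

Because the detector is only queried, never differentiated, the same loop works for any model, which is what makes the approach model-agnostic.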

Gou et al. [140] developed the framework Vatld to examine traffic light detection algorithms by analyzing input-output data as well as intermediate representations. Disentangled representation learning was used to extract semantic concepts in the latent representation, such as color, background, and rotation. Consequently, the analysis tool relies heavily on DNNs that are based on representation learning.

In [142], Schorr et al. developed a toolbox with various state-of-the-art visualisation algorithms for CNNs in image classification and semantic segmentation, including Grad-CAM and its extensions, saliency maps [178], and guided back-propagation [179]. Wang et al. [143] proposed a framework to interpret 3D object detection failures by combining macro-level spatiotemporal information and micro-level CNN features. For the micro-level feature extraction, the heat map algorithm Grad-CAM and the aforementioned Vatld framework were used. Haedecke et al. [144] introduced the analysis toolbox ScrutinAI for semantic segmentation and object detection tasks, which offers several visualisation tools. In particular, ScrutinAI can distinguish between metadata of the input (e.g., different observable body parts in an image for pedestrian detection) to explicitly identify model weaknesses related to semantic concepts of objects.

IV-D2 Auxiliary Explanations – Planning & Prediction

The overwhelming majority of methods generating auxiliary explanations rely on the attention mechanism to gain some insight into how the algorithms reached their output. The recurring design pattern here is that a recurrent neural network (RNN) is proposed onto which an auxiliary attention mechanism is bolted. Alternatively, instead of an RNN, a transformer architecture is proposed, in which case the attention mechanism is built into the neural network from the start. The explainability analyses of these methods are then performed by looking at the attention scores that the model assigns to either the input or some interpretable input embedding. Finally, the attention scores are sometimes visualised using heat maps or bar graphs.
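The attention weights that these analyses inspect are, in the scaled dot-product form used by transformers, a softmax over query-key similarities. The sketch below computes them for a hypothetical ego query and three agent embeddings; reading the largest weight as “the reason” for the output is exactly the practice whose faithfulness is questioned in the summary of this section.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over several keys."""
    d = len(query)
    logits = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(logits)

# Hypothetical 2-D embeddings of the ego vehicle (query) and surrounding agents (keys).
ego = [1.0, 0.0]
agents = {"lead_vehicle": [2.0, 0.0], "left_vehicle": [0.0, 1.0], "pedestrian": [0.5, 0.5]}
weights = dict(zip(agents, attention_weights(ego, list(agents.values()))))
for name, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name:13s} {w:.2f}")
```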

For example, Jiang et al. [145] proposed a transformer-based method for inter-vehicle trajectory interaction analysis. Their evaluation showed that the proposed model is significantly faster than similar methods and performs competitively with baselines, with the added benefit of some interpretability analysis. In addition, Kochakarn et al. [146] designed an algorithm with spatial and temporal attention for road scene understanding. A self-supervised scene-graph learning algorithm based on graph contrastive learning creates spatiotemporal embeddings of scene graphs, which are then used for driver action prediction as a downstream task. As the final stage of graph embedding, an attention layer highlights the most important spatial and temporal factors in the scene graph sequence as a form of post-hoc explainability. Yu et al. [152] also used an attention mechanism with scene graph embeddings as well as image data for binary risk prediction. However, no quantitative evaluation is given, and only a single qualitative example is presented of how attention mechanisms affect the explainability of safety prediction. Finally, explainable trajectory prediction has also received some attention, following a similar neural-attention methodology [151, 154].

However, due to the unreliability of attention-based explanations, methods that do not rely on attention weights are of particular interest. Liu et al. [149] used a post-hoc heat map to infer different potential goals on a map, which then guides a neural network-based planner to capture planning uncertainties. Additionally, Wang et al. [150] combined a bi-directional long short-term memory network with a conditional random field (CRF) predictor to provide scores for interpretable hand-crafted features in lane-change (LC) scenarios. Their model also enforces interpretable hard and soft rules that the system must satisfy. However, their evaluation is limited, and no qualitative discussion is given of how the CRF improves the interpretability of the system as a whole.
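The combination of interpretable feature scores with hard rules can be illustrated by Viterbi decoding in a linear-chain CRF over lane-change labels. This is a sketch under assumptions: the emission scores come from hypothetical hand-set weights on two hand-crafted features instead of a bi-directional LSTM as in [150], a hard rule is encoded as an infinite penalty, and a soft rule would instead be a finite penalty.

```python
import math

LABELS = ("keep", "change")
# Hypothetical hand-set weights on two interpretable, hand-crafted features.
EMISSION_W = {
    "keep":   {"gap_ok": 0.0, "indicator_on": -1.0},
    "change": {"gap_ok": 1.0, "indicator_on": 2.0},
}
# Hypothetical transition scores that favour staying in the same manoeuvre.
TRANSITION = {("keep", "keep"): 0.5, ("keep", "change"): 0.0,
              ("change", "keep"): 0.0, ("change", "change"): 0.5}

def emission(label, feats):
    """Per-step score: a readable weighted sum of features plus a hard rule."""
    if label == "change" and not feats["gap_ok"]:
        return -math.inf  # hard rule: never change lanes without a safe gap
    return sum(EMISSION_W[label][name] * float(value) for name, value in feats.items())

def viterbi(sequence):
    """Highest-scoring label sequence under a linear-chain CRF (max-sum decoding)."""
    score = {y: emission(y, sequence[0]) for y in LABELS}
    backpointers = []
    for feats in sequence[1:]:
        new_score, pointer = {}, {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: score[p] + TRANSITION[(p, y)])
            new_score[y] = score[prev] + TRANSITION[(prev, y)] + emission(y, feats)
            pointer[y] = prev
        score = new_score
        backpointers.append(pointer)
    path = [max(LABELS, key=score.get)]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]

frames = [{"gap_ok": False, "indicator_on": False},
          {"gap_ok": False, "indicator_on": True},
          {"gap_ok": True, "indicator_on": True}]
print(viterbi(frames))  # ['keep', 'keep', 'change']
```

Every term in the total score is attributable to a named feature, a transition, or a rule, which is the source of interpretability in this kind of model.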

IV-D3 Auxiliary Explanations – End-to-end

Kim et al. [155] proposed the generation of textual explanations for E2E driving tasks. They introduced a dataset called BDD-X (Berkeley DeepDrive eXplanation) with driving videos annotated with driving descriptions and action explanations. In addition to the E2E control system, a second attention-based model was trained to predict textual explanations from video sequences. The attention maps of both models were aligned to create a dependency between the controller and the explanations. Following this work, Kühn et al. [156] evaluated the developed baseline on a new dataset called SAX [162] and proposed some improvements over the baseline. They used video frames as input and generated natural language action descriptions and explanations using an opaque neural network. Building on this architecture, Mori et al. [163] added throttle control in addition to steering and developed an attention map for visual explanations of AV decisions. Xu et al. [103] introduced the dataset BDD-OIA (object-induced actions), which contains complicated scenarios extracted from BDD-X and annotated with new explanations focusing on the objects that influence the decision. Additionally, they proposed a DNN architecture that jointly learns action prediction and textual generation. Dong et al. [157] extended this approach by introducing a transformer architecture for the E2E network, so that the decision and reason generator could draw on both the feature extractor and the attention zones of the transformer. For the decision and reason generation task, Zhang et al. [158] introduced an additional interrelation module in the network that expresses the interrelationships among the ego vehicle and other traffic-related objects. This module is then combined with global features of the E2E network to provide more reliable actions and explanations.
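One way to express an alignment between the controller’s and the explainer’s attention maps is a divergence penalty added to the explainer’s training loss. The sketch below uses the KL divergence between two hypothetical 2x2 attention maps; it illustrates the idea and is not the exact objective of [155].

```python
import math

def normalise(attention_map):
    """Flatten a 2D attention map and rescale it to sum to one."""
    flat = [v for row in attention_map for v in row]
    total = sum(flat)
    return [v / total for v in flat]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same image regions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

controller = normalise([[0.1, 0.7], [0.1, 0.1]])  # hypothetical controller attention
explainer = normalise([[0.1, 0.6], [0.2, 0.1]])   # hypothetical explainer attention
alignment_loss = kl_divergence(controller, explainer)
print(f"alignment loss: {alignment_loss:.4f}")
```

Minimising this term pushes the explainer to look where the controller looks; it does not, by itself, guarantee that the generated text reflects the controller’s reasoning.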

Feng et al. [159] proposed to expand the textual reasoning about driving actions with explanations of the surrounding environment based on semantic segmentation, by extending the BDD-OIA dataset with additional annotations, although they did not qualitatively show the added benefit of the new annotations. In [160], Zhang et al. extended the BDD dataset into BDD-3AA by adding explanations and corresponding object segmentations. The interpretation was provided by importance scores for the objects in the image. Human evaluation showed that object-level explanations are more persuasive than pixel-level explanations, while the additional textual explanations increased trust for users and manufacturers. However, the decisions and explanations do not necessarily correlate, and the explanations need to be validated for reliability.

Wang et al. [164] proposed intermediate outputs in the E2E design to improve interpretability. Besides the planned trajectory, they also output future semantic maps in bird’s-eye view (BEV) from the intermediate perception part. A similar approach was proposed by Chen et al. [165], where a semantic BEV mask containing a map, the ego state, surrounding objects, and routing was delivered. Yang et al. [166] proposed two frameworks that generate attention maps of E2E controllers to better understand scenes. The first was model-specific and produced feature maps from the convolutional layer. In contrast, the second, model-agnostic approach compared the controller outputs for the raw input images and occluded ones. By examining the changes in the output, a pixel-wise heat map was created.

Cultrera et al. [167] proposed attention blocks in the DNN-based E2E controller to create attention maps. Aksoy and Yazici [169] developed an E2E controller that explicitly predicts a saliency map as an intermediate output, which also serves as an input for the action prediction. Chitta et al. [172] proposed an E2E system that outputs a trajectory and a BEV semantic prediction. Moreover, attention maps of the DNN are computed to increase interpretability. Similarly, Sadat et al. [173] introduced an E2E motion planner that provides semantic occupancy forecasting as an interpretable intermediate representation resulting from the perception and prediction tasks. The intermediate output consists of a semantic occupancy map, including motion predictions of different agents. Moreover, the cost function of the motion planner takes the occupancy forecasts as an additional input to increase the safety of the generated trajectories.
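The role of occupancy forecasts in a planner’s cost function can be sketched as follows: each candidate trajectory is charged for the predicted occupancy probability of the cells it visits. The grid, the forecasts, the weight, and the progress term are hypothetical, and the cost in [173] contains further terms that this sketch omits.

```python
def occupancy_cost(trajectory, occupancy, weight=10.0):
    """occupancy[t] maps a grid cell (x, y) to the forecast probability that
    another agent occupies it at time step t; unlisted cells count as free."""
    return weight * sum(occupancy[t].get(cell, 0.0) for t, cell in enumerate(trajectory))

def progress_cost(trajectory):
    """Reward forward progress along x (lower cost for longer trajectories)."""
    return -trajectory[-1][0]

def select_trajectory(candidates, occupancy):
    """Pick the candidate with the lowest total cost."""
    return min(candidates, key=lambda name: occupancy_cost(candidates[name], occupancy)
               + progress_cost(candidates[name]))

# Hypothetical forecast: another agent moves along y = 0 ahead of the ego vehicle.
forecast = [{}, {(2, 0): 0.9}, {(3, 0): 0.9}]
candidates = {
    "straight": [(0, 0), (2, 0), (4, 0)],
    "swerve":   [(0, 0), (2, 1), (3, 1)],
    "slow":     [(0, 0), (1, 0), (2, 0)],
}
print(select_trajectory(candidates, forecast))  # swerve
```

Since the occupancy map is itself an interpretable output, the occupancy term of a rejected trajectory can be traced back to the specific cells and time steps that made it costly.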

Wei et al. [174] trained an E2E method that directly plans the future trajectory of the ego vehicle. Their method includes an attention mask over a CNN backbone, which they claim can increase the safety and interpretability of the system by allowing inspection of the LiDAR input data. However, their evaluation does not analyse these claimed benefits. Tashiro et al. [175] also produced heat maps as an intermediate output of an E2E controller. For the heat map generation, they quantised the network activations to pay limited attention to specific bits and showed improved performance compared with other attention map generation methods. In general, such visual intermediate outputs provide transparency similar to that of modular AD architectures, which could also help to identify errors in complex E2E systems more accurately. However, the reliability of the intermediate outputs is not guaranteed, and they do not necessarily help in understanding the behaviour of the E2E system.

Teng et al. [177] leveraged a bird’s-eye view (BEV) mask, which provided scene semantic information. They argued that the BEV mask can demonstrate how an AV understands the scenarios and thus promote interpretability.

IV-D4 Auxiliary Explanations – Summary

A prominent approach in perception is generating heat maps, which visually explain the regions in the input that the black box algorithm has focused on. The heat maps can be utilized for local post-hoc explanations. However, heat maps can be fragile and unreliable, and it is difficult to evaluate the correctness of the provided explanation [40]. Most planning and prediction algorithms that generate auxiliary explanations rely on an attention mechanism, either as part of a transformer architecture or in conjunction with a recurrent neural network (RNN). The attention weights are then manually interpreted, giving some insight into how the algorithm transformed the input into a decision.

The benefit of attention-based methods is that their results are interpretable through the analysis of the attention weights, but the proposed systems are highly model-specific and require detailed knowledge of the underlying architecture of the neural network. In addition, attention weights are widely known to be inconsistent and difficult to interpret as explanations [43, 180]. The major problem here is that attention weights may provide a “plausible” – i.e., intuitively correct – explanation of the decision-making algorithm despite not being “faithful”, i.e., factually correct. This may then wrongly calibrate people’s trust in the system leading to over- or under-reliance.
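A common way to probe faithfulness, widely used in the XAI literature rather than taken from the surveyed works, is a deletion test: remove the features an explanation ranks highest and check that the model output changes more than when the lowest-ranked features are removed. The linear model and both attributions below are hypothetical; passing such a test is necessary evidence of faithfulness, not proof of it.

```python
def deletion_check(x, attribution, model, k, baseline=0.0):
    """Return (drop after removing the k highest-ranked features,
    drop after removing the k lowest-ranked features)."""
    order = sorted(range(len(x)), key=lambda i: attribution[i], reverse=True)

    def drop(indices):
        x_del = list(x)
        for i in indices:
            x_del[i] = baseline
        return model(x) - model(x_del)

    return drop(order[:k]), drop(order[-k:])

WEIGHTS = [3.0, 0.0, 1.0, 0.0]  # hypothetical linear model

def linear_model(z):
    return sum(w * v for w, v in zip(WEIGHTS, z))

x = [1.0, 1.0, 1.0, 1.0]
faithful = [3.0, 0.0, 1.0, 0.0]   # gradient times input for this model
plausible = [0.0, 3.0, 0.0, 1.0]  # hypothetical attribution unrelated to the model
print(deletion_check(x, faithful, linear_model, k=2))   # (4.0, 0.0)
print(deletion_check(x, plausible, linear_model, k=2))  # (0.0, 4.0)
```

An explanation can look convincing to a human reader and still fail this test, which is the gap between plausibility and faithfulness described above.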

For E2E learning, there are various ways to provide auxiliary explanations. As in perception, heat maps can highlight the regions of the input that are important for the algorithm’s action. These visual explanations can be complemented by textual explanations that give reasons, based on the input, for the chosen action. Lastly, one can also provide intermediate outputs that visualise the perception or prediction part inside the E2E network. This gives better insight into the internal workings of the E2E network, but the intermediate outputs cannot directly explain the network’s decisions.

IV-E Interpretable Safety Validation

Definition IV.5 (Interpretable Safety Validation).

We say that an algorithm provides interpretable safety validation if it uses an interpretable algorithm to generate adversarial behaviours of other traffic participants for the validation of an AV.

IV-E1 Interpretable Safety Validation – Summary

These interpretable methods focus on supporting safety assurance via post-hoc explainability by either generating failure cases or by extracting accident scenarios to be used in safety assessments. Safety validation also differs from the standard perception, planning, and other runtime functions as it is executed during the verification and validation phase (offline). For reinforcement learning, temporal logic can be inserted into policies to ensure safe behaviour. For critical scenarios in prediction and perception, heat maps can be further analyzed to extract and interpret critical factors in the corresponding scenarios.

Corso and Kochenderfer [181] utilized signal temporal logic (STL) to generate high-likelihood failures for AVs, arguing that STL is easy to understand because it logically describes relations between temporal variables. DeCastro et al. [182] leveraged parametric STL (pSTL) to model, in an interpretable way, the relationship between policy parameters and the emergent behaviours from deploying that policy, with the behavioural outcomes expressed as pSTL formulas. As pSTL formulas describe relationships between spatial and temporal properties of a signal, formally specified outcomes can be obtained by configuring the parameters, which makes it possible to proactively generate various desired behaviours of an agent for testing AVs.
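The quantitative semantics that make STL useful for falsification can be sketched for simple formulas over a finite trace: the robustness of “always x ≥ c” is the minimum margin, and a negative value marks a failure. The traces, the safe distance, and the formulas are hypothetical; [181] searches for likely failures rather than only checking given traces, and the pSTL part is shown only for the trivial case where the tightest parameter is the trace minimum.

```python
def robustness_always_ge(trace, c):
    """Robustness of G (x >= c) on a finite trace: min_t (x_t - c).
    Positive means satisfied with margin; negative means violated."""
    return min(x - c for x in trace)

def robustness_eventually_ge(trace, c):
    """Robustness of F (x >= c): max_t (x_t - c)."""
    return max(x - c for x in trace)

def tightest_c_always_ge(trace):
    """pSTL-style parameter inference for G (x >= c): the largest c the trace satisfies."""
    return min(trace)

# Hypothetical distance-to-lead-vehicle traces (metres) from two simulated episodes.
episodes = {"nominal": [12.0, 10.5, 9.8, 10.2], "cut_in": [12.0, 6.0, 1.5, 3.0]}
D_SAFE = 2.0  # hypothetical safety distance
for name, trace in episodes.items():
    rho = robustness_always_ge(trace, D_SAFE)
    print(name, "failure" if rho < 0 else "ok", round(rho, 2), tightest_c_always_ge(trace))
```

The robustness value is what a falsification search would minimise; the formula itself remains a human-readable statement of what counts as a failure.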

Kang et al. [183] proposed a visual transformer to predict collisions, supplemented by attention maps. A time series of attention maps is then analysed to identify spatiotemporal characteristics, and based on this situation interpretation, accident scenarios for safety assessment are extracted. The extraction is based on the definition of functional scenarios by the PEGASUS project [184], which uses six layers of information: road levels, traffic infrastructure, events, objects, environments, and digital information.

In [185], Li et al. introduced a risk assessment phase for the perception and prediction of dangerous vehicles as well as traffic lights. A visual explanation for the classification is provided by computing saliency maps via the RISE algorithm [186], which supports safety assurance in the risk assessment phase. Shao et al. [187] also output intermediate interpretable features for semantic explanation, aiming to enhance safety for the downstream controller.
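RISE estimates saliency by probing the model with randomly masked inputs and averaging the masks, weighted by the resulting scores. The sketch below follows that idea with per-pixel binary masks, a simplification of the upsampled, randomly shifted masks used in [186]; the image and the score function are hypothetical.

```python
import random

def rise_saliency(image, score_fn, n_masks=500, p_keep=0.5, seed=0):
    """RISE-style saliency: the sum of random binary masks weighted by the
    model score on each masked image, normalised by n_masks * p_keep."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    sal = [[0.0] * w for _ in range(h)]
    for _ in range(n_masks):
        mask = [[1.0 if rng.random() < p_keep else 0.0 for _ in range(w)] for _ in range(h)]
        masked = [[image[i][j] * mask[i][j] for j in range(w)] for i in range(h)]
        s = score_fn(masked)
        for i in range(h):
            for j in range(w):
                sal[i][j] += s * mask[i][j]
    return [[v / (n_masks * p_keep) for v in row] for row in sal]

def top_left_score(img):
    """Hypothetical classifier that only looks at the top-left pixel."""
    return img[0][0]

image = [[1.0] * 3 for _ in range(3)]
saliency = rise_saliency(image, top_left_score)
print([[round(v, 2) for v in row] for row in saliency])
```

Like occlusion, RISE only needs black-box access to the classifier; the relevant pixel receives a saliency close to 1, while irrelevant pixels settle near `p_keep`.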

V XAI Framework for AD

We now provide an overview of existing XAI frameworks for AD and analyze their limitations. As part of our systematic review, we identified three relevant XAI frameworks, which illustrate high-level AD modules and describe various ways to integrate them. Subsequently, we propose our XAI framework for AD – SafeX: a framework for safe and explainable AD – based on the concrete XAI methods summarized in Section IV.

V-A Existing XAI frameworks

Omeiza et al. [17] defined an explainer as the bridge between an AV and users, which answers users’ queries with explanations based on information from the AD modules, as shown in Figure 7. Instead of focusing on a specific AV function, their framework remains at a high level to illustrate the general role of XAI in AD. Atakishiyev et al. [18] introduced a similar conceptual framework for E2E autonomous control systems that includes XAI components to realise safety-regulatory compliance. In this framework, an XAI component aims to provide explanations of each driving action taken in the given environment. Regulatory compliance is confirmed by simulation and real-world testing based on these explanations.

The framework defined by Brajovic et al. [188] consists of four steps for the entire development cycle of AI: use case definition, data collection, model development, and model operation. The use case describes the task that the AI aims to solve, while the data affect whether the AI is biased and robust. The developed model should achieve an appropriate level of accuracy, robustness, explainability, and other desirable requirements. Finally, the model operation should be equipped with a monitoring system that is proportionate to the nature of the AI and its associated risks. Although this framework provides useful guidance, its application to AVs is not addressed and users’ queries are not considered.

V-B Conceptual Framework: SafeX

We propose a novel conceptual framework for safe and explainable AD shown in Figure 8, which we call SafeX. Different from the frameworks proposed in previous work, we present a more fine-grained application of XAI to AD, focusing on the integration of the concrete surveyed methods within the full AD stack in a way that also enables safety monitoring and intelligible explanation delivery.

The overall structure of SafeX is shown in Figure 8a. We define an explainable monitoring system (EMS) as a bridge between users and an AV. On the one hand, the EMS generates intelligible explanations for users based on their queries by extracting the necessary information from the AV. On the other hand, it includes a monitor for each AD module to deliver safety feedback regarding the module’s output. These two functions of the EMS aim not only to increase a user’s understanding of and trust in the AV but also to provide a safer AV for the user. To accomplish the two roles of the EMS, each AD module must be carefully designed. Figure 8b uses the four identified XAI categories from our survey to deliver explanatory and monitoring information to the upstream EMS for each AD module. For black boxes in an AD module, interpretable monitors, interpretable surrogate models, and auxiliary explanations can be applied. In addition, the functions in the module can also be inherently interpretable to deliver traceable explanatory information if the interpretable functions meet the performance requirements. They may also serve as a fallback if the monitoring systems report unexpected and unverifiable behaviour from the black box systems.
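The two roles of the EMS can be sketched as a small data structure: a monitor per module that accepts or rejects each output, and a query interface that returns the explanation attached to the latest output. All class names, module names, and monitor rules below are hypothetical illustrations of the concept, not part of SafeX as specified in Figure 8.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ModuleOutput:
    value: dict        # the module's actual output
    explanation: str   # auxiliary or surrogate explanation attached to it

@dataclass
class ExplainableMonitoringSystem:
    """Hypothetical EMS: one monitor per AD module plus a user query interface."""
    monitors: Dict[str, Callable[[dict], bool]]
    latest: Dict[str, ModuleOutput] = field(default_factory=dict)
    alerts: List[str] = field(default_factory=list)

    def observe(self, module: str, output: ModuleOutput) -> bool:
        """Store the output and run the module's monitor; False means the
        stack should switch to an interpretable fallback for this module."""
        self.latest[module] = output
        ok = self.monitors[module](output.value)
        if not ok:
            self.alerts.append(f"{module}: output rejected by monitor")
        return ok

    def answer(self, module: str) -> str:
        """Answer a user query such as 'why did the planner do that?'."""
        output = self.latest.get(module)
        if output is None:
            return f"No output from {module} yet."
        return output.explanation

ems = ExplainableMonitoringSystem(monitors={
    "perception": lambda v: v["detector_label"] == v["concept_label"],  # CBM-style agreement
    "planning": lambda v: v["min_gap_m"] >= 2.0,                        # hypothetical safety margin
})
ok = ems.observe("planning", ModuleOutput({"min_gap_m": 1.2},
                 "Slowing down because a pedestrian is predicted to cross."))
print(ok, ems.alerts, ems.answer("planning"))
```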

In contrast to existing frameworks, SafeX is based on concrete state-of-the-art methods, and we design SafeX according to the modular components identified in Section IV. We display two variants of how SafeX could be realized.

Variant 1: all three units in Figure 8a are deep learning-based black box modules. For camera-based perception, heat maps can be generated as an auxiliary explanation [143], highlighting pixels in the camera image that are relevant to the black box model’s prediction. Moreover, a random forest can be applied to identify perception errors based on meta-information from the environment [93]; together with the corresponding Shapley values, it serves as an interpretable surrogate model. Additionally, the concept bottleneck model for pedestrian detection proposed in [111] can be used for the EMS, where this perception monitor verifies the predictions of the object detector for the safety-relevant object class pedestrian. Deep learning-based motion planning algorithms take the environmental representation provided by the object detector as input and output a planned trajectory of the AV. As with the perception unit, a heat map highlighting different potential goals on the map [149] can be generated as an auxiliary explanation. As a surrogate model, the cognitive model introduced by Gyevnar et al. [98], based on the planning and prediction model [67], can be utilized to provide causal explanations for decision-making. For the EMS, an inherently interpretable decision tree can be trained to verify the decisions of the black box motion planner, as proposed in [127]. Lastly, a DNN-based controller can be applied, combined with cross-comparable clustering as a surrogate to aid in interpreting the control signals.
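For the Shapley-value step in Variant 1, a minimal exact computation enumerates all feature coalitions, which is feasible for a handful of meta-information features. As assumptions, a hypothetical closed-form error predictor stands in for the random forest of [93], the feature names are invented, and features outside a coalition are set to a baseline value, which is only one of several ways to define the coalition value.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to a baseline input.
    Features outside a coalition are set to their baseline value.
    Enumerates all coalitions, so it only suits a handful of features."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for s in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(set(s) | {i}) - value(set(s)))
    return phi

# Hypothetical perception-error predictor over [rain, darkness, occlusion] in [0, 1].
def error_probability(z):
    rain, darkness, occlusion = z
    return 0.1 + 0.4 * rain * darkness + 0.2 * occlusion

phi = shapley_values(error_probability, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print([round(v, 3) for v in phi])  # [0.2, 0.2, 0.2]
```

The values sum to the difference between the prediction and the baseline prediction (0.7 - 0.1), and the rain-darkness interaction is split evenly between its two features.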

Variant 2: AD modules are inherently interpretable. The perception module could be trained as a concept-bottleneck model [59]. Extracting semantic concepts in the segmented environment makes the algorithm more interpretable and reliable. For the subsequent planning unit, a Monte Carlo Tree Search over high-level driving maneuvers can be applied [67]. The resulting search tree helps in interpreting the planner’s decision. Lastly, a neural network-based controller with an interpretable projection mechanism [88] can be used as an explainable control unit in the framework.

The resulting modular design allows future research and development to focus on deeply investigating and refining specific components independently. By stacking multiple forms of XAI methods, we can enable developers to integrate the appropriate methods with their AD stack based on specific stakeholder and regulatory requirements, and the desired degree of safety. Moreover, the proposed EMS can simultaneously achieve both the safety monitoring of the AV and the delivery of intelligible explanations to users’ queries.

VI Discussion

We set out to answer two research questions based on a systematic literature review. In closely scrutinising the retrieved publications for RQ1, we found that state-of-the-art literature is trying to resolve the challenge of safe and trustworthy AI in AD by focusing on five core XAI design paradigms, namely interpretable design, interpretable surrogate models, interpretable monitoring, auxiliary explanations, and interpretable validation.

It is interesting to note that there is a significant imbalance in the number of publications among the driving tasks of perception, planning and prediction, and control. Control is consistently more neglected across all five XAI design paradigms than perception and planning, despite the intensive research into neural network-based safe RL control methods [189]. Furthermore, XAI for LiDAR-based perception and for various fusion approaches remains largely unexplored compared to camera-based detectors. This is noteworthy given that the majority of state-of-the-art perception architectures incorporate LiDAR sensors due to their provision of accurate depth information [25]. In contrast, E2E methods enjoy significant attention from the field; however, most methods for these systems are constrained to auxiliary explanations.

However, herein lies an important challenge. It has been shown many times that post-hoc analysis methods for auxiliary explanations based on Shapley values, attention maps, or saliency maps are neither consistent nor necessarily correct (see, e.g., [43, 44, 190]). While these methods are undoubtedly useful for building explanations, they are not sufficient if our requirements of trustworthy and safe AI are to be upheld in, for example, regulations and courts. This challenge is further exacerbated by the fact that the evaluation of auxiliary methods is usually cursory, with hard-to-interpret quantitative metrics and no qualitative insights at all. One way to increase safety for AD is to integrate multiple XAI techniques into one framework in a “Swiss cheese” model of safety, in which multiple layers of checks reduce the chance that malfunctions go unnoticed through the AD stack.

This is why our analysis of RQ2 is relevant, and why we propose a new framework called SafeX to integrate concrete XAI methods with AD. We found that the number of existing works about frameworks or pipelines is limited, and that these provide only a very high-level overview of how XAI may be integrated with AD, without addressing the lower levels of the AD stack. Given these limitations and the urgent need for safe and trustworthy AI for the AD stack, our framework SafeX modularly integrates the identified techniques of XAI with each AD module. A modular approach in SafeX allows the combination of multiple sources of explanations in a way that may reduce the risks of using AI for AD. One may also combine multiple modalities of predictions which, when used with our proposed explainable monitoring system, can act both as a bridge between users and the AD system and as a tool for comprehensive safety guarantees. The EMS is, thus, designed to enable the delivery of explanations to users while ensuring the safety of each AD module through runtime monitoring.

We also observed that interpretable safety validation, one of the five XAI design paradigms, has received comparatively little attention. This matters because the safety testing of AVs is one of the most pertinent and difficult challenges currently facing the AD field, owing to the heavy-tailed distribution of driving scenarios [191]. As we saw in Section IV-E, one way to mitigate this problem is to extract varied scenarios from real driving data through an interpretability analysis that uncovers the relevant environmental factors in each scenario. Interpretability can also reveal the causal factors of a scenario, which can then be manipulated to derive new scenarios.
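The last step, deriving new scenarios by manipulating the factors an interpretability analysis has flagged as relevant, can be sketched as a simple combinatorial variation. This is an illustrative assumption rather than a method from the reviewed papers. The function `derive_scenarios`, the factor names and their values are hypothetical, and the interpretability analysis that selects the factors is taken as given. Only the flagged factors are varied, and all other attributes of the recorded scenario are kept fixed.

```python
from itertools import product
from typing import Dict, List


def derive_scenarios(recorded: Dict[str, str],
                     alternatives: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Vary only the factors flagged as relevant; keep everything else fixed."""
    factors = list(alternatives)
    variants = []
    # Enumerate every combination of values for the flagged factors.
    for values in product(*(alternatives[f] for f in factors)):
        scenario = dict(recorded)              # copy the recorded scenario
        scenario.update(zip(factors, values))  # overwrite only the flagged factors
        if scenario != recorded:               # skip the unchanged original
            variants.append(scenario)
    return variants


recorded = {"weather": "clear", "lighting": "day", "pedestrian": "none", "road": "urban"}
# Hypothetical output of an interpretability analysis: factors found to matter,
# each with candidate values (including the recorded one).
relevant = {"weather": ["clear", "rain"], "lighting": ["day", "night"]}
new = derive_scenarios(recorded, relevant)
print(len(new))  # 3
```

Exhaustive enumeration only scales to a handful of factors. With many factors, one would sample from the combinations instead.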

In our study, we narrowed our focus to perception, planning and prediction, and control, and did not consider studies on data diversity, ethics, or AI model oversight. The former three are arguably the most pressing if we aim to meet the requirements of safe and trustworthy AI in a way that also translates into more deployable and reliable AVs. The latter three are undoubtedly important, but addressing them may contribute less directly to creating real-world AVs.

In addition, we did not consider natural language understanding and generation for interacting with users and delivering intelligible explanations. Our review did, however, identify a few methods [17, 98, 83] that directly treat human-robot interaction as a significant part of the explanatory process. This suggests a disconnect between research that focuses on the needs of end users and research that addresses the explainability of the driving stack. The gap is problematic because explanations ought to change with the requirements of the user. Explanation design must account for this dependency or risk provoking mistrust or confusion in users. This calls for evaluating explainability methods with humans as actual stakeholders, as summarized, for example, by Vilone and Longo [192].

Furthermore, our focus on XAI covers only part of how safe and trustworthy AI should be achieved. As discussed in Section II-A, trustworthiness, safety, and transparency are overarching concepts that require cross-disciplinary collaboration. Other measures, such as uncertainty quantification, rigorous testing, thorough documentation, and standardisation, are also necessary. Still, we have seen that XAI is a diverse and popular field that addresses some of the key requirements of trustworthy and safe AI.

Finally, generative methods did not appear prominently in our review, even though our systematic search did not exclude such papers. On the one hand, this is not surprising. Generative methods further exacerbate the problems of black-box decision-making, so existing XAI algorithms often cannot be applied to them. On the other hand, generative methods, especially multi-modal methods that combine sensing, control, and language (e.g., [193]), have the potential to self-explain their decisions. In addition, generative methods may alleviate the issue of the long-tail distribution of critical scenarios, though their efficacy at this is yet to be verified. At the very least, generative modelling requires further investigation and may be a promising direction for future research.

To summarise, we identify the following challenges and recommendations for the field of XAI for AD:

• Explainable perception architecture: investigate explainable approaches for sensors other than cameras, such as LiDAR and radar; explore XAI for various fusion architectures, in particular by combining the XAI methods developed for the individual sensors being fused;
• Rigorous testing of auxiliary methods: auxiliary explanation methods such as Shapley values, saliency maps, and attention maps are prone to gaming, inaccuracies, and misinterpretation. These methods must be evaluated thoroughly. The evaluation should be quantitative and should also include extensive qualitative analysis that focuses especially on their failure cases;
• Modular and layered monitoring: a single method does not suffice to improve the safety of AD. Our proposed framework, SafeX, instead suggests that multiple layers of independent and co-supervisory explanatory functions should verify and monitor the workings of the underlying black-box systems and of each other, potentially providing fallback options in emergencies;
• Cross-disciplinary collaboration: XAI methods are usually developed in isolation. To better understand stakeholder requirements and to adapt explanations to the varied socio-technical interactions of the real world, it is crucial to develop methods that are rooted in actual problems rather than merely motivated by a vague need for safety and trustworthiness;
• Generative methods: generative methods are a promising direction for future research in XAI for AD. They have the potential to self-explain their decisions and may alleviate the issue of the long-tail distribution of critical scenarios; however, their efficacy in this respect is yet to be verified.

VII Conclusion

In this paper, we investigated the applications of XAI for safe and trustworthy AD. We began by defining requirements for trustworthy AI in AD and noted that XAI is a promising field for addressing several of them. We then gave an overview of the sources of explanations in AI and presented a taxonomy of XAI. Our systematic literature survey was guided by two research questions. From it, we derived five key applications of XAI for safe and trustworthy AI in AD, together with a framework for integrating these applications into AD. Our key findings are:

• Current XAI for AD research can be sorted into five categories: interpretable design, interpretable surrogate models, interpretable monitoring, auxiliary explanations, and interpretable validation;
• There is a lack of detailed, general frameworks for XAI in AD that address safety requirements and are grounded in concrete research. We propose to fill this gap with a new framework, SafeX, that can incorporate all categories of XAI methods designed for AD;
• XAI for AD is an emerging topic that is gaining increasing attention, as reflected in the growing number of publications per year. We expect the number of studies to increase further with the development of AI.

Looking to the future, we can expect legal and social pressures on the development of AD to increase. Rising to this challenge will require joint initiatives from multiple disciplines and the involvement of various stakeholders. Here, we expect XAI to act as a bridge across cross-disciplinary gaps. Emerging research areas will also continue to influence the field. With the advent of systems based on large language models (LLMs), the need for XAI will be more pronounced than ever, as models continue to improve and new emergent behaviours are discovered. Calls for this are already emerging in other fields (e.g., mechanistic interpretability [194]); however, the use of LLMs in AD further complicates the black-box problem. In addition, LLMs themselves could one day become the explainers. This can become a reality for safe and trustworthy AD only through the involvement of various stakeholders and disciplines.

References

[1] A. Mathew, P. Amudha, and S. Sivakumari, “Deep learning techniques: an overview,” Advanced Machine Learning Technologies and Applications: Proceedings of AMLTA 2020, pp. 599–608, 2021.
[2] A. A. Jammal, A. C. Thompson, E. B. Mariottoni, S. I. Berchuck, C. N. Urata, T. Estrela, S. M. Wakil, V. P. Costa, and F. A. Medeiros, “Human versus machine: comparing a deep learning algorithm to human gradings for detecting glaucoma on fundus photographs,” American journal of ophthalmology, vol. 211, pp. 123–131, 2020.
[3] O. Willers, S. Sudholt, S. Raafatnia, and S. Abrecht, “Safety concerns and mitigation approaches regarding the use of deep learning in safety-critical perception tasks,” in International Conference on Computer Safety, Reliability, and Security, 2020, pp. 336–350.
[4] ISO, “ISO 26262-1:2018(en), Road vehicles — Functional safety,” 2018. [Online]. Available: https://www.iso.org/standard/43464.html
[5] R. Salay and K. Czarnecki, “Using machine learning safely in automotive software: An assessment and adaption of software process requirements in ISO 26262,” arXiv preprint arXiv:1808.01614, 2018.
[6] ISO, “ISO 21448:2022: Road vehicles—Safety of the intended functionality,” 2022. [Online]. Available: https://www.iso.org/standard/77490.html
[7] S. Burton, C. Hellert, F. Hüger, M. Mock, and A. Rohatschek, “Safety assurance of machine learning for perception functions,” in Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety. Springer International Publishing Cham, 2022, pp. 335–358.
[8] D. Gesmann-Nuissl and I. Tacke, “Funktionale Sicherheit KI-basierter Systeme im Automobilsektor [Functional safety of AI-based systems in the automotive sector],” in the 14th Workshop Fahrerassistenz und automatisiertes Fahren, 2022, pp. 85–98.
[9] ISO, “ISO/CD PAS 8800: Road vehicles — Safety and artificial intelligence,” 2023. [Online]. Available: https://www.iso.org/standard/83303.html
[10] B. Moye. (2023) AAA: Fear of self-driving cars on the rise. [Online]. Available: https://newsroom.aaa.com/2023/03/aaa-fear-of-self-driving-cars-on-the-rise/
[11] S. Reig, S. Norman, C. G. Morales, S. Das, A. Steinfeld, and J. Forlizzi, “A field study of pedestrians and autonomous vehicles,” in Proceedings of the 10th international conference on automotive user interfaces and interactive vehicular applications, 2018, pp. 198–209.
[12] M. Langer, D. Oster, T. Speith, H. Hermanns, L. Kästner, E. Schmidt, A. Sesing, and K. Baum, “What do we want from explainable artificial intelligence (XAI)? – A stakeholder perspective on XAI and a conceptual model guiding interdisciplinary XAI research,” Artificial Intelligence, vol. 296, p. 103473, 2021.
[13] R. Dwivedi, D. Dave, H. Naik, S. Singhal, R. Omer, P. Patel, B. Qian, Z. Wen, T. Shah, G. Morgan, and R. Ranjan, “Explainable AI (XAI): Core ideas, techniques, and solutions,” ACM Computing Surveys, vol. 55, no. 9, pp. 1–33, 2023.
[14] K. Weitz, D. Schiller, R. Schlagowski, T. Huber, and E. André, “‘Do you trust me?’ Increasing user-trust by integrating virtual agents in explainable AI interaction design,” in Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents, 2019, pp. 7–9.
[15] A. Deeks, “The judicial demand for explainable artificial intelligence,” Columbia Law Review, vol. 119, no. 7, pp. 1829–1850, 2019.
[16] K. Muhammad, A. Ullah, J. Lloret, J. Del Ser, and V. H. C. de Albuquerque, “Deep learning for safe autonomous driving: Current challenges and future directions,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 7, pp. 4316–4336, 2021.
[17] D. Omeiza, H. Webb, M. Jirotka, and L. Kunze, “Explanations in autonomous driving: A survey,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 8, pp. 10 142–10 162, 2021.
[18] S. Atakishiyev, M. Salameh, H. Yao, and R. Goebel, “Explainable artificial intelligence for autonomous driving: A comprehensive overview and field guide for future research directions,” arXiv preprint arXiv:2112.11561, 2021.
[19] É. Zablocki, H. Ben-Younes, P. Pérez, and M. Cord, “Explainability of deep vision-based autonomous driving systems: Review and challenges,” International Journal of Computer Vision, vol. 130, no. 10, pp. 2425–2452, 2022.
[20] B. G. Buchanan and R. G. Smith, “Fundamentals of expert systems,” in Annual Review of Computer Science: Vol. 3, 1988. USA: Annual Reviews Inc., Sep. 1988, pp. 23–58.
[21] C. Rudin, “Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead,” Nature Machine Intelligence, vol. 1, no. 5, pp. 206–215, May 2019.
[22] D. Kaur, S. Uslu, K. J. Rittichier, and A. Durresi, “Trustworthy Artificial Intelligence: A Review,” ACM Computing Surveys, vol. 55, no. 2, pp. 39:1–39:38, Jan. 2022.
[23] N. Burkart and M. F. Huber, “A Survey on the Explainability of Supervised Machine Learning,” Journal of Artificial Intelligence Research, vol. 70, pp. 245–317, May 2021.
[24] B. Gyevnar, N. Ferguson, and B. Schafer, “Bridging the Transparency Gap: What Can Explainable AI Learn From the AI Act?” in Proceedings of the 26th European Conference on Artificial Intelligence ECAI 2023, Krakow, Poland, Oct. 2023.
[25] D. Feng, A. Harakeh, S. L. Waslander, and K. Dietmayer, “A review and comparative study on probabilistic object detection in autonomous driving,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 8, pp. 9961–9980, 2021.
[26] European Commission. (2019) Ethics guidelines for trustworthy AI. [Online]. Available: https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai
[27] NIST AI. (2023) Artificial intelligence risk management framework (AI RMF 1.0). [Online]. Available: https://airc.nist.gov/AI_RMF_Knowledge_Base/Playbook
[28] L. Alzubaidi, A. Al-Sabaawi, J. Bai, A. Dukhan, A. H. Alkenani, A. Al-Asadi, H. A. Alwzwazy, M. Manoufali, M. A. Fadhel, A. Albahri et al., “Towards risk-free trustworthy artificial intelligence: Significance and requirements,” International Journal of Intelligent Systems, vol. 2023, 2023.
[29] J. Andraško, O. Hamul’ák, M. Mesarčík, T. Kerikmäe, and A. Kajander, “Sustainable data governance for cooperative, connected and automated mobility in the european union,” Sustainability, vol. 13, no. 19, p. 10610, 2021.
[30] European Parliament, “Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the Protection of Natural Persons With Regard to the Processing of Personal Data and on the Free Movement of Such Data and Repealing Directive 95/46/EC (General Data Protection Regulation),” 2016. [Online]. Available: https://data.europa.eu/eli/reg/2016/679/oj
[31] X. Li, Z. Chen, J. M. Zhang, F. Sarro, Y. Zhang, and X. Liu, “Dark-skin individuals are at more risk on the street: Unmasking fairness issues of autonomous driving systems,” arXiv preprint arXiv:2308.02935, 2023.
[32] K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, C. Xiao, A. Prakash, T. Kohno, and D. Song, “Robust physical-world attacks on deep learning visual classification,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 1625–1634.
[33] X. Yuan, P. He, Q. Zhu, and X. Li, “Adversarial examples: Attacks and defenses for deep learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 9, pp. 2805–2824, 2019.
[34] SAE J3016, “Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles,” Apr. 2021. [Online]. Available: https://doi.org/10.4271/J3016_202104
[35] A. M. Nascimento, L. F. Vismari, C. B. S. T. Molina, P. S. Cugnasca, J. B. Camargo, J. R. d. Almeida, R. Inam, E. Fersman, M. V. Marquezini, and A. Y. Hata, “A systematic literature review about the impact of artificial intelligence on autonomous vehicle safety,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 12, pp. 4928–4946, 2020.
[36] Q. A. Ribeiro, M. Ribeiro, and J. Castro, “Requirements engineering for autonomous vehicles: a systematic literature review,” in Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing, 2022, pp. 1299–1308.
[37] R. Sheh, “Explainable artificial intelligence requirements for safe, intelligent robots,” in 2021 IEEE International Conference on Intelligence and Safety for Robotics (ISR), 2021, pp. 382–387.
[38] P. P. Angelov, E. A. Soares, R. Jiang, N. I. Arnold, and P. M. Atkinson, “Explainable artificial intelligence: an analytical review,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 11, no. 5, p. e1424, 2021.
[39] T. Miller, “Explanation in artificial intelligence: Insights from the social sciences,” Artificial intelligence, vol. 267, pp. 1–38, 2019.
[40] C. Molnar, Interpretable Machine Learning. Christoph Molnar, 2023. [Online]. Available: https://christophm.github.io/interpretable-ml-book/
[41] G. Schwalbe and B. Finzel, “A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts,” Data Mining and Knowledge Discovery, pp. 1–59, 2023.
[42] T. Speith, “A review of taxonomies of explainable artificial intelligence (XAI) methods,” in Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 2022, pp. 2239–2250.
[43] S. Jain and B. C. Wallace, “Attention is not explanation,” arXiv preprint arXiv:1902.10186, 2019.
[44] I. E. Kumar, S. Venkatasubramanian, C. Scheidegger, and S. Friedler, “Problems with Shapley-value-based explanations as feature importance measures,” in Proceedings of the 37th International Conference on Machine Learning. PMLR, Nov. 2020, pp. 5491–5500.
[45] S. D. Pendleton, H. Andersen, X. Du, X. Shen, M. Meghjani, Y. H. Eng, D. Rus, and M. H. Ang Jr, “Perception, planning, control, and coordination for autonomous vehicles,” Machines, vol. 5, no. 1, p. 6, 2017.
[46] J. Van Brummelen, M. O’Brien, D. Gruyer, and H. Najjaran, “Autonomous vehicle perception: The technology of today and tomorrow,” Transportation research part C: emerging technologies, vol. 89, pp. 384–406, 2018.
[47] C. E. Tuncali, G. Fainekos, D. Prokhorov, H. Ito, and J. Kapinski, “Requirements-driven test generation for autonomous vehicles with machine learning components,” IEEE Transactions on Intelligent Vehicles, vol. 5, no. 2, pp. 265–280, 2019.
[48] B. Paden, M. Čáp, S. Z. Yong, D. Yershov, and E. Frazzoli, “A survey of motion planning and control techniques for self-driving urban vehicles,” IEEE Transactions on intelligent vehicles, vol. 1, no. 1, pp. 33–55, 2016.
[49] C. Altafini, “Following a path of varying curvature as an output regulation problem,” IEEE Transactions on Automatic Control, vol. 47, no. 9, pp. 1551–1556, 2002.
[50] E. Frazzoli, M. A. Dahleh, and E. Feron, “Trajectory tracking control design for autonomous helicopters using a backstepping algorithm,” in Proceedings of the 2000 American Control Conference. ACC (IEEE Cat. No. 00CH36334), vol. 6. IEEE, 2000, pp. 4102–4107.
[51] L. Chen, P. Wu, K. Chitta, B. Jaeger, A. Geiger, and H. Li, “End-to-end autonomous driving: Challenges and frontiers,” arXiv preprint arXiv:2306.16927, 2023.
[52] A. Tampuu, T. Matiisen, M. Semikin, D. Fishman, and N. Muhammad, “A survey of end-to-end driving: Architectures and training methods,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 4, pp. 1364–1384, 2022.
[53] B. Kitchenham and S. Charters, “Guidelines for performing Systematic Literature Reviews in Software Engineering,” School of Computer Science and Mathematics, Keele University, Keele, UK, EBSE Technical Report EBSE-2007-01, Jul. 2007.
[54] I. Stepin, J. M. Alonso, A. Catala, and M. Pereira-Fariña, “A Survey of Contrastive and Counterfactual Explanation Generation Methods for Explainable Artificial Intelligence,” IEEE Access, vol. 9, pp. 11 974–12 001, 2021.
[55] M. Du, N. Liu, and X. Hu, “Techniques for interpretable machine learning,” Communications of the ACM, vol. 63, no. 1, pp. 68–77, Dec. 2019. [Online]. Available: http://dx.doi.org/10.1145/3359786
[56] Z. Chaghazardi, S. Fallah, and A. Tamaddoni-Nezhad, “Explainable and trustworthy traffic sign detection for safe autonomous driving: An inductive logic programming approach,” Electronic Proceedings in Theoretical Computer Science, vol. 385, pp. 201–212, Aug. 2023.
[57] P. Feifel, F. Bonarens, and F. Köster, “Reevaluating the safety impact of inherent interpretability on deep neural networks for pedestrian detection,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2021, pp. 29–37.
[58] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The cityscapes dataset for semantic urban scene understanding,” in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[59] M. Losch, M. Fritz, and B. Schiele, “Semantic bottlenecks: Quantifying and improving inspectability of deep representations,” International Journal of Computer Vision, vol. 129, pp. 3136–3153, 2021.
[60] A. Plebe and M. Da Lio, “On the road with 16 neurons: Towards interpretable and manipulable latent representations for visual predictions in driving scenarios,” IEEE Access, vol. 8, pp. 179 716–179 734, 2020.
[61] G. Ros, L. Sellart, J. Materzynska, D. Vazquez, and A. M. Lopez, “The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 3234–3243.
[62] A. Oltramari, J. Francis, C. Henson, K. Ma, and R. Wickramarachchi, “Neuro-symbolic architectures for context understanding,” in Knowledge Graphs for eXplainable Artificial Intelligence: Foundations, Applications and Challenges. IOS Press, 2020, pp. 143–160.
[63] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuScenes: A multimodal dataset for autonomous driving,” arXiv preprint arXiv:1903.11027, 2019.
[64] J. Martínez-Cebrián, M.-A. Fernández-Torres, and F. Díaz-De-María, “Interpretable global-local dynamics for the prediction of eye fixations in autonomous driving scenarios,” IEEE Access, vol. 8, pp. 217 068–217 085, 2020.
[65] A. Palazzi, D. Abati, S. Calderara, F. Solera, and R. Cucchiara, “Predicting the driver’s focus of attention: The DR(eye)VE project,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 7, pp. 1720–1733, 2019.
[66] K. Yoneda, N. Ichihara, H. Kawanishi, T. Okuno, L. Cao, and N. Suganuma, “Sun-glare region recognition using visual explanations for traffic light detection,” in 2021 IEEE Intelligent Vehicles Symposium (IV), 2021, pp. 1464–1469.
[67] S. V. Albrecht, C. Brewitt, J. Wilhelm, B. Gyevnar, F. Eiras, M. Dobre, and S. Ramamoorthy, “Interpretable Goal-based Prediction and Planning for Autonomous Driving,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), May 2021, pp. 1043–1049.
[68] J. P. Hanna, A. Rahman, E. Fosong, F. Eiras, M. Dobre, J. Redford, S. Ramamoorthy, and S. V. Albrecht, “Interpretable Goal Recognition in the Presence of Occluded Factors for Autonomous Vehicles,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sep. 2021, pp. 7044–7051.
[69] M. Antonello, M. Dobre, S. V. Albrecht, J. Redford, and S. Ramamoorthy, “FLASH: Fast and light motion prediction for autonomous driving with Bayesian inverse planning and learned motion profiles,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022, pp. 9829–9836.
[70] U.S. Department Of Transportation Federal Highway Administration, “Next Generation Simulation (NGSIM) Vehicle Trajectories and Supporting Data,” 2017.
[71] C. Brewitt, B. Gyevnar, S. Garcin, and S. V. Albrecht, “GRIT: fast, interpretable, and verifiable goal recognition with learned decision trees for autonomous driving,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021.
[72] C. Brewitt, M. Tamborski, C. Wang, and S. V. Albrecht, “Verifiable goal recognition for autonomous driving with occlusions,” IEEE/RSJ International Conference on Intelligent Robots and Systems, 2023.
[73] J. Bock, R. Krajewski, T. Moers, S. Runde, L. Vater, and L. Eckstein, “The inD dataset: A drone dataset of naturalistic road user trajectories at German intersections,” in 2020 IEEE Intelligent Vehicles Symposium (IV), 2020, pp. 1929–1934.
[74] R. Krajewski, T. Moers, J. Bock, L. Vater, and L. Eckstein, “The rounD dataset: A drone dataset of road user trajectories at roundabouts in Germany,” in 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), 2020, pp. 1–6.
[75] A. Breuer, J.-A. Termöhlen, S. Homoceanu, and T. Fingscheidt, “openDD: A large-scale roundabout drone dataset,” in 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2020, pp. 1–6.
[76] A. Ghoul, I. Yahiaoui, A. Verroust-Blondet, and F. Nashashibi, “Interpretable Goal-Based model for Vehicle Trajectory Prediction in Interactive Scenarios,” in 2023 IEEE Intelligent Vehicles Symposium (IV), Jun. 2023, pp. 1–6.
[77] W. Zhan, L. Sun, D. Wang, H. Shi, A. Clausse, M. Naumann, J. Kümmerle, H. Königshof, C. Stiller, A. de La Fortelle, and M. Tomizuka, “INTERACTION Dataset: An INTERnational, Adversarial and Cooperative moTION Dataset in Interactive Driving Scenarios with Semantic Maps,” arXiv:1910.03088 [cs, eess], Sep. 2019.
[78] B. Gyevnar, M. Tamborski, C. Wang, C. G. Lucas, S. B. Cohen, and S. V. Albrecht, “A human-centric method for generating causal explanations in natural language for autonomous vehicle motion planning,” in IJCAI Workshop on Artificial Intelligence for Autonomous Driving, 2022.
[79] D. Omeiza, S. Anjomshoae, H. Webb, M. Jirotka, and L. Kunze, “From spoken thoughts to automated driving commentary: Predicting and explaining intelligent vehicles’ actions,” in 2022 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2022, pp. 1040–1047.
[80] F. Henze, D. Faßbender, and C. Stiller, “How Can Automated Vehicles Explain Their Driving Decisions? Generating Clarifying Summaries Automatically,” in 2022 IEEE Intelligent Vehicles Symposium (IV), Jun. 2022, pp. 935–942.
[81] K. Klein, O. De Candido, and W. Utschick, “Interpretable Classifiers Based on Time-Series Motifs for Lane Change Prediction,” IEEE Transactions on Intelligent Vehicles, vol. 8, no. 7, pp. 3954–3961, Jul. 2023.
[82] R. Krajewski, J. Bock, L. Kloeker, and L. Eckstein, “The highD dataset: A drone dataset of naturalistic vehicle trajectories on German highways for validation of highly automated driving systems,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), 2018, pp. 2118–2125.
[83] R. Kridalukmana, H. Lu, and M. Naderpour, “Self-Explaining Abilities of an Intelligent Agent for Transparency in a Collaborative Driving Context,” IEEE Transactions on Human-Machine Systems, vol. 52, no. 6, pp. 1155–1165, Dec. 2022.
[84] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” in Proceedings of the 1st Annual Conference on Robot Learning, 2017, pp. 1–16.
[85] N. Muscholl, M. Klusch, P. Gebhard, and T. Schneeberger, “EMIDAS: Explainable social interaction-based pedestrian intention detection across street,” in Proceedings of the ACM Symposium on Applied Computing, 2021, pp. 107–115.
[86] M. Neumeier, M. Botsch, A. Tollkühn, and T. Berberich, “Variational Autoencoder-Based Vehicle Trajectory Prediction with an Interpretable Latent Space,” in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Sep. 2021, pp. 820–827.
[87] M. Wu, F. R. Yu, P. X. Liu, and Y. He, “A Hybrid Driving Decision-Making System Integrating Markov Logic Networks and Connectionist AI,” IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 3, pp. 3514–3527, Mar. 2023.
[88] H. Zheng, Z. Zang, S. Yang, and R. Mangharam, “Towards explainability in modular autonomous system software,” in 2023 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2023, pp. 1–8.
[89] M. E. Tipping and C. M. Bishop, “Probabilistic principal component analysis,” Journal of the Royal Statistical Society Series B: Statistical Methodology, vol. 61, no. 3, pp. 611–622, 1999.
[90] L. Van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. 11, 2008.
[91] G. E. Hinton, A. Krizhevsky, and S. D. Wang, “Transforming auto-encoders,” in Artificial Neural Networks and Machine Learning – ICANN 2011, T. Honkela, W. Duch, M. Girolami, and S. Kaski, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 44–51.
[92] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization,” in 2017 IEEE International Conference on Computer Vision (ICCV), Oct. 2017, pp. 618–626.
[93] T. Ponn, T. Kröger, and F. Diermeyer, “Identification and explanation of challenging conditions for camera-based object detection of automated vehicles,” Sensors, vol. 20, no. 13, p. 3699, 2020.
[94] Z. Cui, M. Li, Y. Huang, Y. Wang, and H. Chen, “An interpretation framework for autonomous vehicles decision-making via SHAP and RF,” in 2022 6th CAA International Conference on Vehicular Control and Intelligence (CVCI). IEEE, 2022, pp. 1–7.
[95] M. Li, Y. Wang, H. Sun, Z. Cui, Y. Huang, and H. Chen, “Explaining a Machine-Learning Lane Change Model With Maximum Entropy Shapley Values,” IEEE Transactions on Intelligent Vehicles, vol. 8, no. 6, pp. 3620–3628, 2023.
[96] Y. Ma, S. Song, L. Zhang, L. Xiong, and J. Chen, “Lane Change Analysis and Prediction Using Mean Impact Value Method and Logistic Regression Model,” in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Sep. 2021, pp. 1346–1352.
[97] A. Mishra, U. Soni, J. Huang, and C. Bryan, “Why? why not? when? visual explanations of agent behaviour in reinforcement learning,” in 2022 IEEE 15th Pacific Visualization Symposium (PacificVis). IEEE, 2022, pp. 111–120.
[98] B. Gyevnar, C. Wang, C. G. Lucas, S. B. Cohen, and S. V. Albrecht, “Causal explanations for sequential decision-making in multi-agent systems,” in International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2024.
[99] B. Gyevnar, S. Droop, T. Quillien, S. B. Cohen, N. R. Bramley, C. G. Lucas, and S. V. Albrecht, “People attribute purpose to autonomous vehicles when explaining their behavior,” 2024.
[100] P. M. Dassanayake, A. Anjum, A. K. Bashir, J. Bacon, R. Saleem, and W. Manning, “A deep learning based explainable control system for reconfigurable networks of edge devices,” IEEE Transactions on Network Science and Engineering, vol. 9, no. 1, pp. 7–19, 2022.
[101] M. Zemni, M. Chen, É. Zablocki, H. Ben-Younes, P. Pérez, and M. Cord, “OCTET: Object-aware counterfactual explanations,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 15 062–15 071.
[102] F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan, and T. Darrell, “BDD100K: A diverse driving dataset for heterogeneous multitask learning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 2636–2645.
[103] Y. Xu, X. Yang, L. Gong, H.-C. Lin, T.-Y. Wu, Y. Li, and N. Vasconcelos, “Explainable object-induced action decision for autonomous vehicles,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 9520–9529.
[104] W. Shi, G. Huang, S. Song, Z. Wang, T. Lin, and C. Wu, “Self-supervised discovering of interpretable features for reinforcement learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 5, pp. 2712–2724, 2020.
[105] S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” Advances in neural information processing systems, vol. 30, pp. 4765–4774, 2017.
[106] D. Fryer, I. Strümke, and H. Nguyen, “Shapley values for feature selection: The good, the bad, and the axioms,” IEEE Access, vol. 9, pp. 144 352–144 360, 2021.
[107] B. Bilodeau, N. Jaques, P. W. Koh, and B. Kim, “Impossibility theorems for feature attribution,” Proceedings of the National Academy of Sciences, vol. 121, no. 2, p. e2304406120, 2024.
[108] J. Kronenberger and A. Haselhoff, “Dependency decomposition and a reject option for explainable models,” arXiv preprint arXiv:2012.06523, 2020.
[109] S. Houben, J. Stallkamp, J. Salmen, M. Schlipsing, and C. Igel, “Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark,” in International Joint Conference on Neural Networks, no. 1288, 2013.
[110] L. Hacker and J. Seewig, “Insufficiency-driven DNN error detection in the context of SOTIF on traffic sign recognition use case,” IEEE Open Journal of Intelligent Transportation Systems, vol. 4, pp. 58–70, 2023.
[111] M. Keser, G. Schwalbe, A. Nowzad, and A. Knoll, “Interpretable model-agnostic plausibility verification for 2D object detectors using domain-invariant concept bottleneck models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 3890–3899.
[112] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer, 2014, pp. 740–755.
[113] D. Bau, B. Zhou, A. Khosla, A. Oliva, and A. Torralba, “Network dissection: Quantifying interpretability of deep visual representations,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6541–6549.
[114] J. Fritsch, T. Kuehnl, and A. Geiger, “A new performance measure and evaluation benchmark for road detection algorithms,” in International Conference on Intelligent Transportation Systems (ITSC), 2013.
[115] Y. Fang, H. Min, X. Wu, X. Lei, S. Chen, R. Teixeira, and X. Zhao, “Toward interpretability in fault diagnosis for autonomous vehicles: Interpretation of sensor data anomalies,” IEEE Sensors Journal, vol. 23, no. 5, pp. 5014–5027, 2023.
[116] W. Bao, Q. Yu, and Y. Kong, “DRIVE: Deep Reinforced Accident Anticipation with Visual Explanation,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Oct. 2021, pp. 7599–7608.
[117] J. Fang, D. Yan, J. Qiao, J. Xue, and H. Yu, “DADA: Driver attention prediction in driving accident scenarios,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 6, pp. 4959–4971, 2021.
[118] G. Chen, Y. Zhang, and X. Li, “Attention-based highway safety planner for autonomous driving via deep reinforcement learning,” IEEE Transactions on Vehicular Technology, 2023.
[119] C. Di Franco and N. Bezzo, “Interpretable run-time monitoring and replanning for safe autonomous systems operations,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 2427–2434, 2020.
[120] C. Gall and N. Bezzo, “Gaussian process-based interpretable runtime adaptation for safe autonomous systems operations in unstructured environments,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2021, pp. 123–129.
[121] L. H. Gilpin, V. Penubarthi, and L. Kagal, “Explaining multimodal errors in autonomous vehicles,” in 2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 2021, pp. 1–10.
[122] J. Gorospe, S. Hasan, M. R. Islam, A. A. Gómez, S. Girs, and E. Uhlemann, “Analyzing Inter-Vehicle Collision Predictions during Emergency Braking with Automated Vehicles,” in 2023 19th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), Jun. 2023, pp. 411–418.
[123] M. Karim, Y. Li, and R. Qin, “Toward Explainable Artificial Intelligence for Early Anticipation of Traffic Accidents,” Transportation Research Record, vol. 2676, no. 6, pp. 743–755, 2022.
[124] W. Bao, Q. Yu, and Y. Kong, “Uncertainty-based traffic accident anticipation with spatio-temporal relational learning,” in Proceedings of the 28th ACM International Conference on Multimedia, 2020, pp. 2682–2690.
[125] R. Nahata, D. Omeiza, R. Howard, and L. Kunze, “Assessing and explaining collision risk in dynamic environments for autonomous driving safety,” in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC). IEEE, 2021, pp. 223–230.
[126] J. Houston, G. Zuidhof, L. Bergamini, Y. Ye, L. Chen, A. Jain, S. Omari, V. Iglovikov, and P. Ondruska, “One Thousand and One Hours: Self-driving Motion Prediction Dataset,” in Proceedings of the 2020 Conference on Robot Learning. PMLR, Oct. 2021, pp. 409–418.
[127] L. M. Schmidt, G. Kontes, A. Plinge, and C. Mutschler, “Can you trust your autonomous car? Interpretable and verifiably safe reinforcement learning,” in 2021 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2021, pp. 171–178.
[128] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in Computer Vision – ECCV 2014, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds. Cham: Springer International Publishing, 2014, pp. 818–833.
[129] H. Y. Yatbaz, M. Dianati, and R. Woodman, “Introspection of DNN-based perception functions in automated driving systems: State-of-the-art and open research challenges,” IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 2, pp. 1112–1130, 2024.
[130] S. Kolekar, S. Gite, B. Pradhan, and A. Alamri, “Explainable AI in scene understanding for autonomous vehicles in unstructured traffic environments on Indian roads using the Inception U-Net model with Grad-CAM visualization,” Sensors, vol. 22, no. 24, p. 9677, 2022.
[131] G. Varma, A. Subramanian, A. Namboodiri, M. Chandraker, and C. Jawahar, “IDD: A dataset for exploring problems of autonomous navigation in unconstrained environments,” in 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, pp. 1743–1751.
[132] V. S. Saravanarajan, R.-C. Chen, C.-H. Hsieh, and L.-S. Chen, “Improving semantic segmentation under hazy weather for autonomous vehicles using explainable artificial intelligence and adaptive dehazing approach,” IEEE Access, vol. 11, pp. 38194–38207, 2023.
[133] G. J. Brostow, J. Fauqueur, and R. Cipolla, “Semantic object classes in video: A high-definition ground truth database,” Pattern Recognition Letters, vol. 30, no. 2, pp. 88–97, Jan. 2009. [Online]. Available: http://dx.doi.org/10.1016/j.patrec.2008.04.005
[134] M. Abukmeil, A. Genovese, V. Piuri, F. Rundo, and F. Scotti, “Towards explainable semantic segmentation for autonomous driving systems by multi-scale variational attention,” in 2021 IEEE International Conference on Autonomous Systems (ICAS), 2021, pp. 1–5.
[135] J. Geyer, Y. Kassahun, M. Mahmudi, X. Ricou, R. Durgesh, A. S. Chung, L. Hauswald, V. H. Pham, M. Mühlegg, S. Dorn et al., “A2D2: Audi autonomous driving dataset,” arXiv preprint arXiv:2004.06320, 2020.
[136] H. Mankodiya, D. Jadav, R. Gupta, S. Tanwar, W.-C. Hong, and R. Sharma, “OD-XAI: Explainable AI-based semantic object detection for autonomous vehicles,” Applied Sciences, vol. 12, no. 11, p. 5310, 2022.
[137] T. Nowak, M. R. Nowicki, K. Ćwian, and P. Skrzypczyński, “How to improve object detection in a driver assistance system applying explainable deep learning,” in 2019 IEEE Intelligent Vehicles Symposium (IV), 2019, pp. 226–231.
[138] D. Schinagl, G. Krispel, H. Possegger, P. M. Roth, and H. Bischof, “Occam’s laser: Occlusion-based attribution maps for 3D object detectors on LiDAR data,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp. 1141–1150.
[139] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[140] L. Gou, L. Zou, N. Li, M. Hofmann, A. K. Shekar, A. Wendt, and L. Ren, “VATLD: A visual analytics system to assess, understand and improve traffic light detection,” IEEE Transactions on Visualization and Computer Graphics, vol. 27, no. 2, pp. 261–271, 2021.
[141] K. Behrendt, L. Novak, and R. Botros, “A deep learning approach to traffic lights: Detection, tracking, and classification,” in 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 1370–1377.
[142] C. Schorr, P. Goodarzi, F. Chen, and T. Dahmen, “NeuroScope: An explainable AI toolbox for semantic segmentation and image classification of convolutional neural nets,” Applied Sciences, vol. 11, no. 5, p. 2199, 2021.
[143] J. Wang, Y. Li, Z. Zhou, C. Wang, Y. Hou, L. Zhang, X. Xue, M. Kamp, X. L. Zhang, and S. Chen, “When, where and how does it fail? A spatial-temporal visual analytics approach for interpretable object detection in autonomous driving,” IEEE Transactions on Visualization and Computer Graphics, vol. 29, no. 12, pp. 5033–5049, 2023.
[144] E. Haedecke, M. Mock, and M. Akila, “ScrutinAI: A Visual Analytics Approach for the Semantic Analysis of Deep Neural Network Predictions,” in EuroVis Workshop on Visual Analytics (EuroVA), J. Bernard and M. Angelini, Eds. The Eurographics Association, 2022.
[145] T. Jiang, Y. Liu, Q. Dong, and T. Xu, “Intention-Aware Interactive Transformer for Real-Time Vehicle Trajectory Prediction in Dense Traffic,” Transportation Research Record, vol. 2677, no. 3, pp. 946–960, Mar. 2023.
[146] P. Kochakarn, D. De Martini, D. Omeiza, and L. Kunze, “Explainable Action Prediction through Self-Supervision on Scene Graphs,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), May 2023, pp. 1479–1485.
[147] G. Singh, S. Akrigg, M. Di Maio, V. Fontana, R. J. Alitappeh, S. Saha, K. Jeddisaravi, F. Yousefi, J. Culley, T. Nicholson et al., “ROAD: The road event awareness dataset for autonomous driving,” IEEE Transactions on Pattern Analysis and Machine Intelligence, early access.
[148] W. Maddern, G. Pascoe, C. Linegar, and P. Newman, “1 Year, 1000km: The Oxford RobotCar Dataset,” The International Journal of Robotics Research (IJRR), vol. 36, no. 1, pp. 3–15, 2017. [Online]. Available: http://dx.doi.org/10.1177/0278364916679498
[149] H. Liu, J. Zhao, and L. Zhang, “Interpretable and flexible target-conditioned neural planners for autonomous vehicles,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 10076–10082.
[150] K. Wang, J. Hou, and X. Zeng, “Lane-Change Intention Prediction of Surrounding Vehicles Using BiLSTM-CRF Models with Rule Embedding,” in Proceedings - 2022 Chinese Automation Congress, CAC 2022, vol. 2022-January, 2022, pp. 2764–2769.
[151] H. Hu, Q. Wang, M. Cheng, and Z. Gao, “Trajectory Prediction Neural Network and Model Interpretation Based on Temporal Pattern Attention,” IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 3, pp. 2746–2759, Mar. 2023.
[152] S.-Y. Yu, A. V. Malawade, D. Muthirayan, P. P. Khargonekar, and M. A. A. Faruque, “Scene-Graph Augmented Data-Driven Risk Assessment of Autonomous Vehicle Decisions,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 7, pp. 7941–7951, Jul. 2022.
[153] V. Ramanishka, Y.-T. Chen, T. Misu, and K. Saenko, “Toward driving scene understanding: A dataset for learning driver behavior and causal reasoning,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[154] K. Zhang and L. Li, “Explainable multimodal trajectory prediction using attention models,” Transportation Research Part C: Emerging Technologies, vol. 143, p. 103829, Oct. 2022.
[155] J. Kim, A. Rohrbach, T. Darrell, J. Canny, and Z. Akata, “Textual explanations for self-driving vehicles,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 563–578.
[156] M. A. Kühn, D. Omeiza, and L. Kunze, “Textual explanations for automated commentary driving,” arXiv preprint arXiv:2304.08178, 2023.
[157] J. Dong, S. Chen, M. Miralinaghi, T. Chen, and S. Labi, “Development and testing of an image transformer for explainable autonomous driving systems,” Journal of Intelligent and Connected Vehicles, vol. 5, no. 3, pp. 235–249, 2022.
[158] Z. Zhang, R. Tian, R. Sherony, J. Domeyer, and Z. Ding, “Attention-based interrelation modeling for explainable automated driving,” IEEE Transactions on Intelligent Vehicles, vol. 8, no. 2, pp. 1564–1573, 2023.
[159] Y. Feng, W. Hua, and Y. Sun, “NLE-DM: Natural-Language Explanations for Decision Making of Autonomous Driving Based on Semantic Scene Understanding,” IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 9, pp. 9780–9791, Sep. 2023.
[160] Y. Zhang, W. Wang, X. Zhou, Q. Wang, and X. Sun, “Tactical-level explanation is not enough: Effect of explaining AV’s lane-changing decisions on drivers’ decision-making, trust, and emotional experience,” International Journal of Human–Computer Interaction, vol. 39, no. 7, pp. 1438–1454, 2023.
[161] T. Chen, R. Tian, Y. Chen, J. E. Domeyer, H. Toyoda, R. Sherony, T. Jing, and Z. Ding, “PSI: A pedestrian behavior dataset for socially intelligent autonomous car,” arXiv preprint arXiv:2112.02604, 2021. [Online]. Available: https://api.semanticscholar.org/CorpusID:244909387
[162] M. Gadd, D. de Martini, L. Marchegiani, P. Newman, and L. Kunze, “Sense–Assess–eXplain (SAX): Building Trust in Autonomous Vehicles in Challenging Real-World Driving Scenarios,” in 2020 IEEE Intelligent Vehicles Symposium (IV), Oct. 2020, pp. 150–155.
[163] K. Mori, H. Fukui, T. Murase, T. Hirakawa, T. Yamashita, and H. Fujiyoshi, “Visual explanation by attention branch network for end-to-end learning-based self-driving,” in 2019 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2019, pp. 1577–1582.
[164] H. Wang, P. Cai, Y. Sun, L. Wang, and M. Liu, “Learning interpretable end-to-end vision-based motion planning for autonomous driving with optical flow distillation,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 13731–13737.
[165] J. Chen, S. E. Li, and M. Tomizuka, “Interpretable end-to-end urban autonomous driving with latent deep reinforcement learning,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 6, pp. 5068–5078, 2022.
[166] S. Yang, W. Wang, C. Liu, and W. Deng, “Scene understanding in deep learning-based end-to-end controllers for autonomous vehicles,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 1, pp. 53–63, 2019.
[167] L. Cultrera, L. Seidenari, F. Becattini, P. Pala, and A. Del Bimbo, “Explaining autonomous driving by learning end-to-end visual attention,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 340–341.
[168] B. Wymann, E. Espié, C. Guionneau, C. Dimitrakakis, R. Coulom, and A. Sumner, “TORCS, the open racing car simulator,” Software available at http://torcs.sourceforge.net, vol. 4, no. 6, p. 2, 2000.
[169] E. Aksoy and A. Yazici, “Attention model for extracting saliency map in driving videos,” in 2020 28th Signal Processing and Communications Applications Conference (SIU), 2020, pp. 1–4.
[170] Y. Xia, D. Zhang, J. Kim, K. Nakayama, K. Zipser, and D. Whitney, “Predicting driver attention in critical situations,” 2017. [Online]. Available: https://arxiv.org/abs/1711.06406
[171] A. Borji and L. Itti, “CAT2000: A large scale fixation dataset for boosting saliency research,” 2015. [Online]. Available: https://arxiv.org/abs/1505.03581
[172] K. Chitta, A. Prakash, and A. Geiger, “NEAT: Neural attention fields for end-to-end autonomous driving,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 15773–15783.
[173] A. Sadat, S. Casas, M. Ren, X. Wu, P. Dhawan, and R. Urtasun, Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations. Springer International Publishing, 2020, pp. 414–430.
[174] B. Wei, M. Ren, W. Zeng, M. Liang, B. Yang, and R. Urtasun, “Perceive, attend, and drive: Learning spatial attention for safe self-driving,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 4875–4881.
[175] Y. Tashiro and H. Awano, “Pay attention via quantization: Enhancing explainability of neural networks via quantized activation,” IEEE Access, vol. 11, pp. 34431–34439, 2023.
[176] Udacity, “Udacity self-driving car driving data,” Jun. 2020, [online] Available: https://github.com/udacity/self-driving-car.
[177] S. Teng, L. Chen, Y. Ai, Y. Zhou, Z. Xuanyuan, and X. Hu, “Hierarchical interpretable imitation learning for end-to-end autonomous driving,” IEEE Transactions on Intelligent Vehicles, vol. 8, no. 1, pp. 673–683, 2022.
[178] T. Kadir and M. Brady, “Saliency, scale and image description,” International Journal of Computer Vision, vol. 45, no. 2, pp. 83–105, 2001.
[179] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. A. Riedmiller, “Striving for simplicity: The all convolutional net,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Workshop Track Proceedings, 2015.
[180] S. Wiegreffe and Y. Pinter, “Attention is not not Explanation,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), K. Inui, J. Jiang, V. Ng, and X. Wan, Eds. Hong Kong, China: Association for Computational Linguistics, Nov. 2019, pp. 11–20.
[181] A. Corso and M. J. Kochenderfer, “Interpretable safety validation for autonomous vehicles,” in 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2020, pp. 1–6.
[182] J. DeCastro, K. Leung, N. Aréchiga, and M. Pavone, “Interpretable policies from formally-specified temporal properties,” in 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2020, pp. 1–7.
[183] M. Kang, W. Lee, K. Hwang, and Y. Yoon, “Vision transformer for detecting critical situations and extracting functional scenario for automated vehicle safety assessment,” Sustainability, vol. 14, no. 15, p. 9680, 2022.
[184] T. Menzel, G. Bagschik, L. Isensee, A. Schomburg, and M. Maurer, “From functional to logical scenarios: Detailing a keyword-based scenario description for execution in a simulation environment,” in 2019 IEEE Intelligent Vehicles Symposium (IV), 2019, pp. 2383–2390.
[185] Y. Li, H. Wang, L. M. Dang, T. N. Nguyen, D. Han, A. Lee, I. Jang, and H. Moon, “A deep learning-based hybrid framework for object detection and recognition in autonomous driving,” IEEE Access, vol. 8, pp. 194228–194239, 2020.
[186] V. Petsiuk, A. Das, and K. Saenko, “RISE: Randomized input sampling for explanation of black-box models,” arXiv preprint arXiv:1806.07421, 2018.
[187] H. Shao, L. Wang, R. Chen, H. Li, and Y. Liu, “Safety-enhanced autonomous driving using interpretable sensor fusion transformer,” in Conference on Robot Learning. PMLR, 2023, pp. 726–737.
[188] D. Brajovic, N. Renner, V. P. Goebels, P. Wagner, B. Fresz, M. Biller, M. Klaeb, J. Kutz, J. Neuhuettler, and M. F. Huber, “Model reporting for certifiable AI: A proposal from merging EU regulation into AI development,” arXiv preprint arXiv:2307.11525, 2023.
[189] M. Wäschle, F. Thaler, A. Berres, F. Pölzlbauer, and A. Albers, “A review on AI Safety in highly automated driving,” Frontiers in Artificial Intelligence, vol. 5, 2022.
[190] J. Wang and R. Jia, “Data Banzhaf: A Robust Data Valuation Framework for Machine Learning,” in Proceedings of Machine Learning Research, vol. 206, 2023, pp. 6388–6421.
[191] C. Wang, F. Guo, R. Yu, L. Wang, and Y. Zhang, “The application of driver models in the safety assessment of autonomous vehicles: Perspectives, insights, prospects,” IEEE Transactions on Intelligent Vehicles, 2023.
[192] G. Vilone and L. Longo, “Explainable artificial intelligence: a systematic review,” arXiv preprint arXiv:2006.00093, 2020.
[193] A.-M. Marcu, L. Chen, J. Hünermann, A. Karnsund, B. Hanotte, P. Chidananda, S. Nair, V. Badrinarayanan, A. Kendall, J. Shotton, E. Arani, and O. Sinavski, “LingoQA: Video question answering for autonomous driving,” 2024. [Online]. Available: https://arxiv.org/abs/2312.14115
[194] N. Elhage, N. Nanda, C. Olsson, T. Henighan, N. Joseph, B. Mann, A. Askell, Y. Bai, A. Chen, T. Conerly et al., “A mathematical framework for transformer circuits,” Transformer Circuits Thread, vol. 1, 2021.