CaDRE: Controllable and Diverse Generation of Safety-Critical Driving Scenarios using Real-World Trajectories

Peide Huang$^{1}$, Wenhao Ding$^{1}$, Jonathan Francis$^{2}$, Bingqing Chen$^{2}$, Ding Zhao$^{1}$

*This work was partially performed during PH’s internship at the Bosch Center for Artificial Intelligence.

$^{1}$ PH, WD, DZ are with Carnegie Mellon University, USA.

$^{2}$ JF, BC are with Bosch Center for Artificial Intelligence. Contact: peideh@andrew.cmu.edu

Abstract

Simulation is an indispensable tool in the development and testing of autonomous vehicles (AVs), offering an efficient and safe alternative to road testing by allowing the exploration of a wide range of scenarios. Despite its advantages, a significant challenge within simulation-based testing is the generation of safety-critical scenarios, which are essential to ensure that AVs can handle rare but potentially fatal situations. This paper addresses this challenge by introducing a novel generative framework, CaDRE, which is specifically designed for generating diverse and controllable safety-critical scenarios using real-world trajectories. Our approach optimizes for both the quality and diversity of scenarios by employing a unique formulation and algorithm that integrates real-world data, domain knowledge, and black-box optimization techniques. We validate the effectiveness of our framework through extensive testing in three representative types of traffic scenarios. The results demonstrate superior performance in generating diverse and high-quality scenarios with greater sample efficiency than existing reinforcement learning and sampling-based methods.

I INTRODUCTION

Simulation plays a pivotal role in the domain of autonomous driving, serving crucial functions in both training and evaluation [1, 2, 3, 4]. In contrast to the costly and time-consuming nature of on-road testing, simulation offers efficient feedback to developers, avoiding risky engagements in the physical world [5]. Furthermore, simulation enables the capability to incorporate various scenario sources, ranging from real-world logs and random perturbations to templates crafted by human experts. This versatility in scenario selection facilitates a comprehensive analysis of the performance of autonomous vehicles (AVs).

However, it is widely recognized that traffic scenarios in the real world exhibit a long-tail distribution, with normal scenarios constituting the majority and safety-critical scenarios occurring infrequently [6, 7]. Training AVs exclusively on these normal scenarios impedes the ability to generalize to critical situations, potentially leading to fatal accidents upon widespread deployment. During the development stage, evaluating AVs only in normal scenarios results in biased and incomplete assessments, as models may need to compromise slightly on performance in normal scenarios to improve robustness in safety-critical ones [8, 9, 10]. Consequently, there is an urgent need for the generation of safety-critical scenarios within simulations.

There are three principal challenges in generating safety-critical scenarios. The first challenge is realism, which requires the scenarios generated to be sufficiently realistic to occur in the real world. This realism is typically interpreted as the similarity between the distributions of real-world and generated scenarios [11, 12, 13]. To achieve this, algorithms often involve either modifying pre-existing normal scenarios [14, 15] or employing generative models to approximate the distribution of real-world scenarios and adjusting the model-derived samples accordingly [16, 17]. The second challenge is diversity, which demands that the generation algorithm cover a wide spectrum of scenarios rather than focusing on a single failure instance. Previous approaches utilizing adversarial attack [14, 18] or reinforcement learning [19] tend to identify the most severe cases but are lacking in producing a diverse set of scenarios. The final challenge is in ensuring that generated scenarios are in alignment with specific factors or guidelines that affect scene variation, a concept referred to as controllability. These guidelines are often expressed through constraints [20], temporal logic [16], or language [17, 21], all requiring meticulous model architecture design to facilitate the integration of these guidelines.

In this paper, we introduce a generative framework CaDRE, which employs the Quality-Diversity (QD) formulation for the generation of safety-critical scenarios. Through a novel design, CaDRE addresses the above three challenges by integrating information from real-world data, domain knowledge, and black-box optimization and explicitly optimizing for both high-quality and diverse scenarios. Specifically, to maintain the realism of generated scenarios, we optimize the perturbations added to trajectories from real-world scenarios within defined constraints. Subsequently, we apply the QD algorithm to simultaneously explore and optimize continuous perturbation spaces efficiently. Finally, we achieve controllability by retrieving from archived scenarios according to the specific measure values defined by the user.

Figure 1: Overview of the CaDRE framework.

The main contributions can be summarized below:

• We propose CaDRE, a novel QD formulation for the generation of diverse and controllable safety-critical scenarios in autonomous driving.

• We propose an occupancy-aware restart mechanism as a general extension to the QD algorithm family, which improves the exploration efficiency of the algorithms.

• We conduct experiments on three representative real-world traffic scenario types: unprotected cross-turn, high-speed lane-change, and U-turn. The experimental results demonstrate that CaDRE can generate diverse and high-quality scenarios, with better sample efficiency than both RL- and sampling-based methods.

II RELATED WORK

Safety-critical Scenario Generation. One significant component of autonomous driving simulation is the traffic model, which governs the behavior of the background vehicles, crucial to simulating real-world scenarios. TrafficSim [22] improves the generation process by using graph neural networks to extract interactions between vehicles. TrafficGen [11] proposes to generate the initial condition and sequential behavior of vehicles separately. ScenarioNet [23] further extends this framework to build a large-scale simulation platform that supports multiple open-source datasets.

Unlike the aforementioned works, which aim to generate realistic scenarios, we focus on the long tail of the distribution, consisting of the safety-critical scenarios, to provide efficient evaluations of the safety of AVs [24, 25, 26]. Most of the existing literature in this category focuses on adversarial generation. L2C [19], MMG [27], and CausalAF [28] generate initial conditions for open-loop scenario generation using reinforcement learning. The methods in [14, 29, 30] optimize the trajectories of actors with black-box optimization to attack the ego vehicle. KING [15] and AdvDO [31] further assume access to differentiable dynamics models to improve the efficiency of finding safety-critical scenarios. Since adversarial attacks sacrifice the diversity and controllability of generated scenarios, imitation learning [32], retrieval-augmented generation [33], causality [34], and evolutionary algorithms [35] have also been explored. To use language as conditions, LCTGen [17] predefines an intermediate representation to bridge the large language model (LLM) [36] and the low-level trajectory generator, and CTG++ [37] uses LLMs to generate signal temporal logic to guide the sampling process of diffusion models. In this paper, we depart from the common practice of leveraging adversarial generation methods and instead focus on improving the diversity and controllability of generated samples through our novel use and extensions of Quality-Diversity algorithms.

Quality-Diversity Algorithms in Robotics. QD is a branch of optimization that finds a collection of high-performing, yet qualitatively different solutions [38, 39]. Specifically, QD optimizes an objective for each point in a measure space. Solving a QD problem in a continuous measure space requires infinite memory [39], so, in practice, the measure space is discretized into a finite set, and an archive is maintained to keep track of the best-known solutions over the finite set.

Given QD’s ability to find a collection of high-performing solutions for different contexts, it is well-suited for many robotics applications. In the pioneering work of [40], a behavior-performance map is learned to enable the robot to quickly find a compensatory behavior and adapt after damage. QD has also been used on problems, such as human-robot interaction [41, 42], robot manipulation [43, 44], locomotion [45, 46], and morphology design [47, 48].

Popular QD algorithms, e.g., MAP-Elites [38] and CMA-ME [49], are predicated on evolutionary strategies to implement their underlying search policies. The goal of this search is to find a solution for a particular parameter configuration and to update the archive set, accordingly. These QD methods initiate this process from random regions in the search with no regard for the density of the local neighborhood of solutions; this can be inefficient, due to the possibility of restarting from already-known regions. To alleviate these issues, we introduce a novel occupancy-aware restart (OAR) mechanism, providing a form of guidance for improved coverage and efficiency during exploration. We assess the value of our novel OAR mechanism in terms of search efficiency and convergence through comparisons with the above QD algorithms and multi-particle exploration mechanisms used in reinforcement learning. To the best of our knowledge, we are the first to formulate QD for the challenging problem of safety-critical scenario generation in autonomous driving; our approach enables us to generate a map of diverse and high-quality scenes whose parameters vary smoothly along the dimensions defined by expressive measure spaces.

III METHODOLOGY

Our method, Controllable and Diverse Generation of Safety-Critical Driving Scenarios using REal-world trajectories (CaDRE), integrates real-world data, domain knowledge, and black-box optimization techniques. As illustrated in Fig. 1, for each iteration, CaDRE maintains a grid archive of generated scenarios. First, it uses the QD algorithm to update the distribution from which the perturbations to the real-world trajectories are sampled. Then CaDRE simulates the perturbations to obtain diverse behavior measures and updates the archive according to the simulation results. Finally, we obtain an archive that contains thousands of critical scenarios, each with different behaviors according to the measure functions we defined using domain knowledge.

Let $\bm{x}_{t}^{i}\in\mathbb{R}^{2},\psi_{t}^{i}\in[-\pi,\pi]$ and $v_{t}^{i}\in\mathbb{R}$ be the ground-plane coordinate, orientation, and speed of the $i$ -th vehicle agent in the world frame at time $t$ . The ego vehicle, with index $i=0$ , is the vehicle for which we want to generate critical scenarios. We denote the joint state of all vehicles as $\bm{s}_{t}=\left\{\bm{x}_{t}^{i},\psi_{t}^{i},v_{t}^{i}\right\}_{i=0}^{N}$ , where $N$ is the number of background agents. We define a specific traffic scenario as a sequence of these states $\mathcal{S}=\left\{\bm{s}_{t}\right\}_{t=0}^{T}$ , where $T$ is a fixed time horizon. We initialize a specific scenario from a real-world dataset that contains only naturalistic driving scenarios.

Safety-Critical Perturbation. We perturb the trajectory of one background vehicle indexed by $i\in[1,\ldots,N]$ to generate safety-critical scenarios. We first recover the action sequence $\left\{\bm{a}_{t}^{i}\right\}^{T-1}_{t=0}$ from $\left\{\bm{s}_{t}^{i}\right\}^{T}_{t=0}$ , assuming a kinematic bicycle model:

$$\frac{d}{dt}\begin{bmatrix}x\\ y\\ \psi\\ v\end{bmatrix}=\begin{bmatrix}v\cos(\psi)\\ v\sin(\psi)\\ v\tan(\delta)/L\\ a\end{bmatrix}, \tag{1}$$

where $L$ is the wheelbase. Each action consists of acceleration and steering input, i.e., $\bm{a}_{t}^{i}:=[a_{t}^{i},\delta_{t}^{i}]$ . A new trajectory can be generated by 1) applying a sequence of bounded perturbations $\left\{\Delta\bm{a}_{t}^{i}\right\}^{T-1}_{t=0}$ to the recovered action sequence, and 2) unrolling the kinematics model from $\bm{s}_{0}^{i}$ using Eqn. 1. We then parameterize each safety-critical scenario with $\bm{\theta}=\left\{\Delta\bm{a}_{0}^{i},\ldots,\Delta\bm{a}_{T-1}^{i}\right\}\in\mathbb{R}^{T\times 2}$ .
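The perturb-and-unroll step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a forward-Euler discretization, a yaw rate driven by the steering angle ($v\tan(\delta)/L$), and hypothetical function names. The wheelbase value is also an assumption, while the time step (5 Hz) and the perturbation bounds follow the experimental setup reported later in the paper.

```python
import numpy as np

def unroll_bicycle(s0, actions, dt=0.2, wheelbase=2.8):
    """Forward-Euler rollout of the kinematic bicycle model.

    s0:      initial state [x, y, psi, v]
    actions: array of shape (T, 2) holding [acceleration, steering angle]
    dt:      time step (0.2 s corresponds to 5 Hz); wheelbase is an assumed value
    """
    states = [np.asarray(s0, dtype=float)]
    for a, delta in actions:
        x, y, psi, v = states[-1]
        deriv = np.array([v * np.cos(psi), v * np.sin(psi),
                          v * np.tan(delta) / wheelbase, a])
        nxt = states[-1] + dt * deriv
        nxt[2] = (nxt[2] + np.pi) % (2 * np.pi) - np.pi  # wrap heading to [-pi, pi)
        states.append(nxt)
    return np.stack(states)  # shape (T + 1, 4)

def perturb_and_unroll(s0, actions, theta, a_max=2.0, delta_max=np.pi / 8, **kw):
    """Clip the perturbation theta (T, 2) to its bounds, add it, and unroll."""
    bounded = np.clip(theta, [-a_max, -delta_max], [a_max, delta_max])
    return unroll_bicycle(s0, np.asarray(actions) + bounded, **kw)
```

With a zero perturbation, the rollout reproduces the trajectory implied by the recovered actions, so the original scenario corresponds to $\bm{\theta}=\bm{0}$.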

Black-Box Ego Policy. We follow [15, 31] and assume that the ego vehicle is reactive to nearby vehicles and tries to follow the original trajectory. The problem is formulated as a black-box optimization; CaDRE does not require access to the reactive policy and would work with any other ego policies.

III-A Quality-Diversity Formulation for Scenario Generation

Inspired by previous work [41, 39], we formulate the problem of generating a diverse set of safety-critical driving scenarios as a QD problem. First, we define an objective function $f:\mathbb{R}^{T\times 2}\rightarrow\mathbb{R}$ to quantify the safety-critical level. We further define $K$ measure functions $m_{k}:\mathbb{R}^{T\times 2}\rightarrow\mathbb{R}$ , jointly represented as $\bm{m}:\mathbb{R}^{T\times 2}\rightarrow\mathbb{R}^{K}$ , which are a set of user-defined functions to quantify aspects of the scenario that we aim to diversify. We denote $\mathcal{M}=\bm{m}(\mathbb{R}^{T\times 2})\subseteq\mathbb{R}^{K}$ as the measure space formed by the range of $\bm{m}$ . Because $f$ evaluates the quality of a scenario $\bm{\theta}$ , the goal of the QD problem is to find, for each $s\in\mathcal{M}$ , a scenario $\bm{\theta}$ , such that $\bm{m}(\bm{\theta})=s$ and that $f(\bm{\theta})$ is maximized (Eqn. 2):

$$\max_{\bm{\theta}}\quad f(\bm{\theta})\quad\text{s.t.}\quad\bm{m}(\bm{\theta})=s,\;\forall s\in\mathcal{M}. \tag{2}$$

In practice, we discretize $\mathcal{M}$ into a finite number of $M$ cells and solve the simplified version of the problem:

$$\max_{\bm{\theta}_{1},\ldots,\bm{\theta}_{M}}\sum_{n=1}^{M}f(\bm{\theta}_{n}). \tag{3}$$

With a slight abuse of notation, we will use $f$ to denote the objective value and $\bm{m}$ to denote the values of the measure functions. We also denote the archive as $M$ , and we can retrieve a scenario from the archive by $M[\bm{m}]$ . With properly defined objective and measure functions, we can optimize a diverse population of safety-critical scenarios and retrieve individual scenarios in a controllable manner by asking for specific measure values $\bm{m}$ . We build a lightweight traffic scenario simulator $Sim(\mathcal{S},\bm{\theta},i)$ , which outputs the objective value $f$ and the measure values $\bm{m}$ , given the original scenario $\mathcal{S}$ , perturbation $\bm{\theta}$ , and the index of the perturbed vehicle $i$ .
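To make the discretized archive and the retrieval $M[\bm{m}]$ concrete, the following is a minimal sketch of a grid archive. The class and method names are hypothetical and do not correspond to the pyribs API; the measure bounds passed in the usage note are likewise assumptions.

```python
import numpy as np

class GridArchive:
    """Keeps the best-known (elite) solution for each cell of a discretized measure space."""

    def __init__(self, dims, lower, upper):
        self.dims = np.asarray(dims)
        self.lower = np.asarray(lower, dtype=float)
        self.upper = np.asarray(upper, dtype=float)
        self.elites = {}  # cell index (tuple) -> (objective, solution)

    def cell_index(self, m):
        """Map continuous measure values m to the integer index of their cell."""
        frac = (np.asarray(m, dtype=float) - self.lower) / (self.upper - self.lower)
        idx = np.floor(frac * self.dims).astype(int)
        return tuple(int(i) for i in np.clip(idx, 0, self.dims - 1))

    def add(self, theta, f, m):
        """Insert theta if its cell is empty or it beats the incumbent elite."""
        key = self.cell_index(m)
        if key not in self.elites or f > self.elites[key][0]:
            self.elites[key] = (f, theta)
            return True
        return False

    def retrieve(self, m):
        """Return the elite (objective, solution) stored for measures m, or None."""
        return self.elites.get(self.cell_index(m))
```

For example, `GridArchive((10, 20, 20), [0, 0, -np.pi], [np.pi / 8, 1, np.pi])` would cover three measures with the grid resolution used in the experiments. Controllable generation then reduces to calling `retrieve` with the desired measure values.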

Algorithm 1 CaDRE: Controllable and Diverse Generation of Safety-Critical Driving Scenarios

Input: Real-world scenario $\mathcal{S}$ , index of the perturbed background vehicle $i$ , traffic simulator $Sim$ , batch size $B$ , an empty grid archive $M$

Output: A grid archive $M$ containing diverse safety-critical scenarios $\mathcal{S}_{c}$

Initialize emitter $e$
Recover $\left\{\bm{a}_{t}^{i}\right\}^{T-1}_{t=0}$ from $\left\{\bm{s}_{t}^{i}\right\}^{T}_{t=0}$ in $\mathcal{S}$
for iter = 1, $\ldots$ , total_iter do
    $\{\bm{\theta}_{b}\}^{B}_{b=1}\sim\mathcal{N}(e.\mu,e.C)$
    for $b=1,\ldots,B$ do
        $\{f_{b},\bm{m}_{b}\}\leftarrow Sim(\mathcal{S},\bm{\theta}_{b},i)$
    Unpack parents, sampling mean $\mu$ , covariance matrix $C$ , and parameter set $P$ from $e$
    for $b=1,\ldots,B$ do
        if $M[\bm{m}_{b}]$ is empty then
            $\Delta_{b}\leftarrow f_{b}$
            Flag that $\bm{\theta}_{b}$ discovered a new cell
            Add $\bm{\theta}_{b}$ to parents
        else if $f_{b}>M[\bm{m}_{b}].f$ then
            $\Delta_{b}\leftarrow f_{b}-M[\bm{m}_{b}].f$
            Add $\bm{\theta}_{b}$ to parents
    if parents $\neq\varnothing$ then
        Sort parents by (newCell, $\Delta_{b}$ )
        Update $\mu,C,P$ according to parents
        parents $\leftarrow\varnothing$
    else
        Occupancy-aware restart from an elite in $M$
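The parent-ranking step of Algorithm 1 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the archive is reduced to a dictionary from cell index to elite objective, candidates are inserted into the archive as they are processed (as in CMA-ME), and the function name `rank_parents` is hypothetical.

```python
def rank_parents(archive, batch):
    """Rank a batch of (theta, f, cell) candidates by archive improvement.

    archive: dict mapping a cell index to the objective of its current elite
    batch:   list of (theta, f, cell) tuples from one simulation round
    Returns parents ordered best-first; an empty list signals a restart.
    """
    parents = []
    for theta, f, cell in batch:
        if cell not in archive:
            # New cell: the improvement is the objective value itself.
            parents.append((True, f, theta))
            archive[cell] = f
        elif f > archive[cell]:
            # Existing cell: the improvement is the gain over the old elite.
            parents.append((False, f - archive[cell], theta))
            archive[cell] = f
    # New-cell discoveries rank above replacements; ties break on improvement size.
    parents.sort(key=lambda p: (p[0], p[1]), reverse=True)
    return [theta for _, _, theta in parents]
```

The ordered parents would then drive the update of $\mu$ , $C$ , and $P$ ; when the returned list is empty, the emitter restarts from an elite in the archive.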

III-B Design of Objective and Measure Functions

Objective Function. The objective function $f$ quantifies the safety-critical level, motivated by prior work on safety-critical scenario generation:

$$f(\bm{\theta}):=\begin{cases}1, & \text{if vehicle $i$ collides with the ego vehicle}\\ 0, & \text{if vehicle $i$ collides with background vehicles}\\ \exp(-\min_{t}d(\bm{x}_{t}^{0},\bm{x}_{t}^{i})), & \text{otherwise},\end{cases} \tag{4}$$

where $d(\cdot,\cdot)$ is the $l_{2}$ distance.
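A minimal sketch of Eqn. 4 is given below. The collision flags are assumed to be supplied by the simulator, and checking the ego collision first is an assumption about precedence; the function name is hypothetical.

```python
import numpy as np

def objective(ego_xy, adv_xy, hits_ego=False, hits_background=False):
    """Safety-critical level f of one simulated scenario.

    ego_xy, adv_xy: arrays of shape (T, 2) with ego and perturbed-vehicle positions
    hits_ego / hits_background: collision flags reported by the simulator
    """
    if hits_ego:
        return 1.0
    if hits_background:
        return 0.0
    # No collision: reward close approaches via the minimum l2 distance over time.
    min_dist = np.linalg.norm(np.asarray(ego_xy) - np.asarray(adv_xy), axis=1).min()
    return float(np.exp(-min_dist))
```

Because the distance is non-negative, the no-collision branch lies in $(0,1]$ and grows as the perturbed vehicle passes closer to the ego vehicle.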

Measure Functions. The measure functions are essential to capture different aspects of critical scenarios. We propose three measure functions to define the diverse behavior of perturbed vehicles. These measure functions collectively enable the definition and evaluation of a wide range of safety-critical scenarios, focusing on essential factors such as perturbation efforts ( $m_{1}$ ), urgency of response ( $m_{2}$ ), and collision behavior ( $m_{3}$ ). Here, $m_{1}$ measures the mean magnitude of the steering perturbation. It reflects how much the generated trajectory would deviate from the original trajectory:

$$m_{1}=\frac{1}{t_{\text{impact}}}\sum_{t=0}^{t_{\text{impact}}-1}\left|\delta_{t}^{i}\right|. \tag{5}$$

Next, $m_{2}$ measures the normalized impact time; it helps categorize scenarios based on the urgency of the response required, aiding in the development of time-critical decision-making algorithms for AVs:

$$m_{2}=t_{\text{impact}}/T. \tag{6}$$

Finally, $m_{3}$ measures the impact angle relative to the body frame of the ego vehicle. It allows for the evaluation of how well autonomous driving systems can recognize and react to threats from various directions, enhancing their ability to prevent accidents through appropriate maneuvering or braking:

$$m_{3}=\mathrm{atan2}\left(R_{\psi_{t}^{0}}(\bm{x}_{t}^{i}-\bm{x}_{t}^{0})^{T}\right), \tag{7}$$

where $R_{\psi_{t}^{0}}$ is the rotation matrix, $t=t_{\text{impact}}$ , and $atan2$ is the 2-argument arctangent function. The $x$ -axis of the body frame is pointing to the front of the vehicle, and the $y$ -axis is pointing to the left. If there is no collision, we assume $t_{\text{impact}}=\arg\min_{t}d(\bm{x}_{t}^{0},\bm{x}_{t}^{i})$ . Note that the time horizon for different vehicles can be different as some vehicles may appear or leave the scene at different times in real-world scenarios.
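The three measures can be computed from a simulated rollout as in the sketch below. It assumes that $R_{\psi_{t}^{0}}$ maps world-frame offsets into the ego body frame, that $t_{\text{impact}}\geq 1$ , and that the steering sequence passed in is the one Eqn. 5 sums over; the function and argument names are hypothetical.

```python
import numpy as np

def measures(steer, ego_xy, ego_psi, adv_xy, t_impact, horizon):
    """Return (m1, m2, m3) for one scenario, following Eqns. 5-7.

    steer:    steering sequence of the perturbed vehicle, shape (T,)
    ego_xy:   ego positions, shape (T + 1, 2); ego_psi: ego headings, shape (T + 1,)
    adv_xy:   perturbed-vehicle positions, shape (T + 1, 2)
    t_impact: index of the collision step (or of the closest approach), >= 1
    """
    m1 = float(np.mean(np.abs(steer[:t_impact])))  # mean steering magnitude before impact
    m2 = t_impact / horizon                         # normalized impact time
    # Express the relative position in the ego body frame (x forward, y left).
    dx, dy = np.asarray(adv_xy[t_impact]) - np.asarray(ego_xy[t_impact])
    c, s = np.cos(ego_psi[t_impact]), np.sin(ego_psi[t_impact])
    bx, by = c * dx + s * dy, -s * dx + c * dy
    m3 = float(np.arctan2(by, bx))                  # impact angle in [-pi, pi]
    return m1, m2, m3
```

Under this convention, $m_{3}\approx 0$ corresponds to an impact from the front of the ego vehicle and $m_{3}\approx\pi/2$ to an impact from its left side.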

We adopt Covariance Matrix Adaptation MAP-Elites (CMA-ME) [49], a QD algorithm, to find both higher-quality and more diverse safety-critical scenarios; our adaptation is summarized in Algorithm 1. The key difference between QD algorithms such as CMA-ME and evolutionary strategies such as CMA-ES is that CMA-ME employs an archiving mechanism to maintain diversity [49]. Another difference is that CMA-ME adjusts the parent ranking rules that update the sampling distribution to maximize the likelihood of archive improvement: it ranks solutions that fill empty cells higher than those that replace existing elites.

III-C Occupancy-Aware Restart

Existing QD algorithms [49] restart from a random elite in the archive when there is no improvement in the archive. This is inefficient, because restarting from elites whose neighboring cells are empty generally benefits exploration more than restarting from densely occupied regions. To improve exploration efficiency, we propose Occupancy-Aware Restart (OAR), a restart mechanism that considers the occupancy rate of neighboring cells.

As illustrated in Fig. 2, OAR assigns a higher probability to elites with more empty neighboring cells. More specifically, given the empty-neighbor rates $r_{1},\ldots,r_{L}$ of the $L$ elites in the archive and the temperature $T$ , the softmax probability of restarting from elite $i$ is computed by:

$$p_{i}=\frac{e^{r_{i}/T}}{\sum_{j=1}^{L}e^{r_{j}/T}}. \tag{8}$$

As $T\rightarrow+\infty$ , OAR degenerates to uniform sampling. With a lower $T$ , OAR assigns a higher probability to elites that have more empty neighbors. For efficient implementation, we use a 3D convolution kernel to compute the number of empty cells around each elite.
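The following sketch illustrates Eqn. 8 on a boolean occupancy grid. In place of a library convolution it sums shifted copies of the grid, which is equivalent to a box filter without its center. Two choices are assumptions not stated in the text: neighbors outside the grid are ignored, and the rate is normalized by the number of in-grid neighbors. The function name is hypothetical and the archive is assumed to contain at least one elite.

```python
import numpy as np
from itertools import product

def oar_probabilities(occupied, temperature=0.1):
    """Restart probabilities over elites from the softmax of Eqn. 8.

    occupied: boolean array (any number of dimensions), True where a cell holds an elite
    Returns (elite_indices, probabilities).
    """
    occupied = np.asarray(occupied, dtype=bool)
    empty = np.pad(~occupied, 1, constant_values=False).astype(float)
    valid = np.pad(np.ones(occupied.shape), 1, constant_values=0.0)
    n_empty = np.zeros(occupied.shape)
    n_valid = np.zeros(occupied.shape)
    # Sum over the 3^d - 1 neighbors of every cell (a box filter without its center).
    for offset in product((-1, 0, 1), repeat=occupied.ndim):
        if not any(offset):
            continue
        window = tuple(slice(1 + o, 1 + o + n) for o, n in zip(offset, occupied.shape))
        n_empty += empty[window]
        n_valid += valid[window]
    rate = n_empty / np.maximum(n_valid, 1.0)   # empty-neighbor rate of every cell
    elites = np.argwhere(occupied)
    logits = rate[occupied] / temperature
    probs = np.exp(logits - logits.max())       # subtract the max for numerical stability
    return elites, probs / probs.sum()
```

A restart elite could then be drawn with `np.random.default_rng().choice(len(elites), p=probs)`. A very large temperature recovers uniform sampling over elites.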

TABLE I: The final performance of coverage, mean objective, and QD score. We report the mean and variance over 5 perturbed vehicles for each scene. The QD score is shown in multiples of $10^{3}$ .

| Scenario | Method | Coverage ( $\uparrow$ ) | Mean Obj ( $\uparrow$ ) | QD Score ( $\uparrow$ ) |
|---|---|---|---|---|
| Unprotected cross-turn | Random | 0.140 $\pm$ 0.021 | 0.499 $\pm$ 0.123 | 0.285 $\pm$ 0.098 |
| Unprotected cross-turn | CMA-ES | 0.182 $\pm$ 0.033 | 0.672 $\pm$ 0.076 | 0.489 $\pm$ 0.090 |
| Unprotected cross-turn | REINFORCE | 0.210 $\pm$ 0.031 | 0.649 $\pm$ 0.115 | 0.551 $\pm$ 0.155 |
| Unprotected cross-turn | SVPG | 0.219 $\pm$ 0.032 | 0.607 $\pm$ 0.130 | 0.553 $\pm$ 0.155 |
| Unprotected cross-turn | CaDRE (ours) | 0.565 $\pm$ 0.054 | 0.829 $\pm$ 0.062 | 1.884 $\pm$ 0.309 |
| High-speed lane-change | Random | 0.310 $\pm$ 0.158 | 0.310 $\pm$ 0.158 | 0.209 $\pm$ 0.131 |
| High-speed lane-change | CMA-ES | 0.163 $\pm$ 0.018 | 0.540 $\pm$ 0.143 | 0.347 $\pm$ 0.086 |
| High-speed lane-change | REINFORCE | 0.488 $\pm$ 0.144 | 0.488 $\pm$ 0.144 | 0.472 $\pm$ 0.161 |
| High-speed lane-change | SVPG | 0.438 $\pm$ 0.096 | 0.438 $\pm$ 0.096 | 0.437 $\pm$ 0.166 |
| High-speed lane-change | CaDRE (ours) | 0.541 $\pm$ 0.079 | 0.627 $\pm$ 0.142 | 1.375 $\pm$ 0.436 |
| U-turn | Random | 0.188 $\pm$ 0.066 | 0.381 $\pm$ 0.134 | 0.320 $\pm$ 0.151 |
| U-turn | CMA-ES | 0.228 $\pm$ 0.118 | 0.447 $\pm$ 0.228 | 0.502 $\pm$ 0.253 |
| U-turn | REINFORCE | 0.286 $\pm$ 0.010 | 0.641 $\pm$ 0.117 | 0.731 $\pm$ 0.117 |
| U-turn | SVPG | 0.290 $\pm$ 0.020 | 0.577 $\pm$ 0.120 | 0.665 $\pm$ 0.110 |
| U-turn | CaDRE (ours) | 0.542 $\pm$ 0.094 | 0.793 $\pm$ 0.073 | 1.699 $\pm$ 0.219 |

IV EXPERIMENTS

IV-A Experimental Setup

Real-world Trajectories. We pick three representative scenarios from nuPlan v1.1 [50]: unprotected cross-turn, high-speed lane-change, and U-turn. All scenarios have a time horizon of $150$ frames at $10Hz$ and are down-sampled to $5Hz$ in our experiments.

Reactive Ego Policy. We implement a rule-based ego policy: the ego vehicle follows the reference trajectory. However, if there is a vehicle within $5m$ of the ego and within $[-\pi/4,\pi/4]$ in the ego body frame, the ego vehicle brakes at $-7m/s^{2}$ and steers at the maximum steering angle of $\pm\pi/8$ , with the sign depending on the relative position of that vehicle w.r.t. the body frame of the ego. Recall that the problem is formulated as a black-box optimization, and CaDRE is agnostic to the ego agent’s policy.

Selection of Perturbed Vehicles and Perturbation Range. To ensure effective perturbation, we use a simple heuristic to select which vehicles to perturb: the five background vehicles that have the smallest average distance to the ego vehicle. The acceleration perturbation is bounded within $\pm 2$ , and the steering perturbation is bounded within $\pm\pi/8$ .

Evaluation Metrics. We focus on three criteria that measure the quality and diversity of the archive, which are standard metrics in the QD literature [49, 39, 51, 52].

• Coverage $\in[0,1]$ : Proportion of cells in the archive that have an elite.

• Mean objective $\in[0,1]$ : Mean objective value of elites in the archive.

• QD score $\in[0,4000]$ : Sum of the objective values of all elites in the archive. The theoretical maximum value of $4000$ follows from our objective $f\in[0,1]$ and from discretizing the measure space into a $10\times 20\times 20$ grid.
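The three metrics above can be computed from an archive as in this minimal sketch, which assumes the archive is stored as an array of objective values with NaN marking empty cells. The function name and the storage layout are assumptions for illustration.

```python
import numpy as np

def archive_metrics(objectives):
    """Coverage, mean objective, and QD score of a grid archive.

    objectives: float array shaped like the grid, NaN where a cell is empty
    """
    filled = ~np.isnan(objectives)
    coverage = filled.mean()                      # fraction of cells holding an elite
    qd_score = float(np.nansum(objectives))       # sum of elite objectives
    mean_obj = qd_score / filled.sum() if filled.any() else 0.0
    return float(coverage), float(mean_obj), qd_score
```

On a $10\times 20\times 20$ grid, the QD score equals coverage times mean objective times $4000$ , so the three metrics are not independent.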

Baselines. We study a mixture of sampling- and RL-based methods that have been employed by the existing literature.

• Random Search (Random): samples uniformly at random from the solution space.

• CMA-ES [53]: the evolution strategy on which CMA-ME is based, but without the QD archive. CMA-ES iteratively updates a population of solutions based on their fitness, using statistical information from previous generations to adaptively adjust the search distribution towards optimal regions of the solution space.

• Multi-particle REINFORCE [54]: policy gradient method employed by previous work [19, 27]. We set the number of particles to be the same as the batch size ( $36$ ) of CMA-ME employed by our algorithm.

• Stein Variational Policy Gradient (SVPG) [55]: SVPG is an improved version of multi-particle REINFORCE. SVPG introduces a maximum entropy policy optimization framework that explicitly encourages diverse solutions and better exploration. Similar to multi-particle REINFORCE, we set the number of particles to be the same as the batch size of CMA-ME.

We use the QD algorithm library pyribs [39] to implement our framework. We aim to answer the following questions in our experimental study:

• How does CaDRE compare with baseline methods in terms of the evaluation metrics and sample efficiency?

• Is OAR effective in improving exploration?

• Can we retrieve diverse scenarios generated by CaDRE in a controllable manner?

IV-B Sample-efficiency Compared with Baseline Methods

The coverage and QD score vs. the number of samples are shown in Figure 3. CaDRE outperforms all baselines by significant margins in all three scenarios, which demonstrates that, with the same number of samples, CaDRE discovers both high-quality and diverse scenarios much faster than Random Search, SVPG, and REINFORCE. CaDRE utilizes the Covariance Matrix Adaptation (CMA) strategy, which adapts the search distribution over generations to increase the likelihood of sampling promising areas of the solution space. This adaptation is based on information from previous generations, allowing CaDRE to focus its sampling on regions with higher potential for high-quality solutions. Unlike Random Search, which samples uniformly across the solution space without learning from previous samples, CaDRE dynamically narrows its search to more promising regions. CMA-ES, although CMA-ME is based on it, serves a different purpose: it maximizes the likelihood of increasing the objective and therefore quickly converges to a single optimum. SVPG and REINFORCE, while more directed than Random Search, may still struggle to explore complex problem spaces efficiently due to their focus on gradient-based optimization.

Table I shows the final performance of coverage, mean objective, and QD score. With the same number of samples in the unprotected cross-turn scenario, CaDRE achieves $158.0\%$ more coverage and a $36.6\%$ higher mean objective than the best-performing baseline, SVPG, leading to a $240.7\%$ improvement in QD score. This again highlights the superior exploration and exploitation capability of CaDRE compared to the baselines. Table II shows the ablation of OAR in the high-speed lane-change scenario. OAR improves the QD score of individual vehicles by up to $33.0\%$ , which demonstrates the effectiveness of OAR.
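The relative improvements quoted for the unprotected cross-turn scenario follow directly from the CaDRE and SVPG entries of Table I, as the short check below shows. The helper name `relative_gain` is introduced here for illustration only.

```python
def relative_gain(ours, baseline):
    """Percentage improvement of `ours` over `baseline`."""
    return 100.0 * (ours - baseline) / baseline

# Unprotected cross-turn, CaDRE vs. SVPG (mean values from Table I)
for name, ours, svpg in [("coverage", 0.565, 0.219),
                         ("mean objective", 0.829, 0.607),
                         ("QD score", 1.884, 0.553)]:
    print(f"{name}: {relative_gain(ours, svpg):.1f}%")
# prints 158.0%, 36.6%, and 240.7%, respectively
```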

TABLE II: QD score of occupancy-aware restart with different temperatures. The QD score is shown in multiples of $10^{3}$ . We include the percentage improvement w.r.t. $1/T=0$ in parentheses.

| Index | $1/T=0$ | $1/T=5$ | $1/T=10$ |
|---|---|---|---|
| 1 | 1.808 | 1.855 (2.6%) | 2.026 (12.0%) |
| 2 | 0.519 | 0.639 (22.9%) | 0.691 (33.0%) |
| 3 | 1.277 | 1.133 (-11.3%) | 1.229 (-3.7%) |
| 4 | 1.393 | 1.515 (8.8%) | 1.367 (-1.9%) |
| 5 | 1.444 | 1.299 (-10.0%) | 1.563 (8.2%) |

IV-C Analysis of the Generated Safety-Critical Scenarios

Visualization of Archives. We visualize the final archives in Fig. 6. CaDRE leads to a much higher occupancy and mean objective in the final archives than the baseline SVPG. The main reason is that CaDRE explicitly encourages sustained exploration throughout optimization. CaDRE employs CMA-ME, which is particularly adept at exploring complex landscapes and finding a large number of high-quality scenarios across different measure values. Although the repulsive force in SVPG does introduce diversity among the particles to avoid premature convergence to local optima, its primary focus remains on optimizing a solution rather than explicitly seeking out diverse solutions across a range of measures.

Note that some cells are still unoccupied even for CaDRE. We hypothesize that it is due to the infeasibility of finding scenarios, which is induced by specific combinations of measure values, the vehicle states in the original scenarios, and the kinematics constraints. For example, it is extremely difficult to find a solution with a short impact time and a small impact angle (hitting from the front) in the unprotected cross-turn scenario, since there is no background vehicle starting near the front of the ego vehicle.

Distribution of Measure Values. Figure 4 visualizes the distribution of the measure function values. Compared to SVPG, CaDRE generates a denser and wider range of measure function values. However, both methods struggle to find safety-critical scenarios with very little steering perturbation, which is reasonable as the original scenarios only contain safe and regular traffic, and the perturbation is bounded.

Visualization of Generated Scenarios. In Fig. 5, we visualize five generated scenarios for the unprotected cross-turn, high-speed lane-change, and U-turn, respectively. The visualization shows that we are able to retrieve diverse critical scenarios in a controllable manner. For example, in the unprotected cross-turn, we can control the perturbed vehicle hitting the right side of the ego vehicle by steering a little bit from the original trajectory or hitting the left side by overtaking from the left, simply by asking for different combinations of measure function values $[m_{1},m_{2},m_{3}]$ in the archive.

IV-D Limitations and Future Directions

Although CaDRE has demonstrated the ability to generate diverse and controllable scenarios with superior sample efficiency, it is not without its limitations. First, CaDRE does not consider lane information or road conditions such as barriers. It could generate scenarios that are kinematically feasible but unlikely in real life, such as driving through the median of a highway. Second, CaDRE only perturbs one of the background vehicles. However, in the real world, there exist critical scenarios that are induced by more than one vehicle or that are not directly caused by a collision with the perturbed vehicle. For example, a background vehicle may change lanes to avoid hitting another vehicle braking in front of it, and thus hit the ego vehicle from the side. A promising direction is to extend CaDRE to consider road information and to perturb more than one vehicle, or a vehicle that indirectly causes the collision.

V CONCLUSIONS

In this work, we develop a framework CaDRE for generating safety-critical scenarios. As a variant of the QD algorithm, CaDRE enhances the diversity and controllability of the scenario generation process, thus providing an effective instrument for the simulation-based assessment of autonomous vehicles. We conduct extensive experiments on three representative scenarios: unprotected cross-turn, high-speed lane-change, and U-turn. The experimental results show that CaDRE can generate and retrieve diverse and high-quality scenarios with better sample efficiency compared with both RL- and sampling-based methods.

ACKNOWLEDGMENT

This work was partially performed during PH’s internship at the Bosch Center for Artificial Intelligence; we thank Bosch for the use of computing resources. The authors additionally thank Uksang Yoo and Benjamin Stoler for valuable conversations. The authors gratefully acknowledge the support from the National Science Foundation under grant CNS-2047454.

References

[1] C. Gulino, J. Fu, W. Luo, G. Tucker, E. Bronstein, Y. Lu, J. Harb, X. Pan, Y. Wang, X. Chen et al., “Waymax: An accelerated, data-driven simulator for large-scale autonomous driving research,” Advances in Neural Information Processing Systems, vol. 36, 2024.
[2] J. Herman, J. Francis, S. Ganju, B. Chen, A. Koul, A. Gupta, A. Skabelkin, I. Zhukov, M. Kumskoy, and E. Nyberg, “Learn-to-race: A multimodal control environment for autonomous racing,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 9793–9802.
[3] S. H. Park, G. Lee, J. Seo, M. Bhat, M. Kang, J. Francis, A. Jadhav, P. P. Liang, and L.-P. Morency, “Diverse and admissible trajectory forecasting through multimodal context understanding,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XI 16. Springer, 2020, pp. 282–298.
[4] M. Xu, Z. Liu, P. Huang, W. Ding, Z. Cen, B. Li, and D. Zhao, “Trustworthy reinforcement learning against intrinsic vulnerabilities: Robustness, safety, and generalizability,” arXiv preprint arXiv:2209.08025, 2022.
[5] P. Huang, X. Zhang, Z. Cao, S. Liu, M. Xu, W. Ding, J. Francis, B. Chen, and D. Zhao, “What went wrong? closing the sim-to-real gap via differentiable causal discovery,” in Conference on Robot Learning. PMLR, 2023, pp. 734–760.
[6] W. Ding, C. Xu, M. Arief, H. Lin, B. Li, and D. Zhao, “A survey on safety-critical driving scenario generation—a methodological perspective,” IEEE Transactions on Intelligent Transportation Systems, 2023.
[7] B. Stoler, I. Navarro, M. Jana, S. Hwang, J. Francis, and J. Oh, “Safeshift: Safety-informed distribution shifts for robust trajectory prediction in autonomous driving,” arXiv preprint arXiv:2309.08889, 2023.
[8] E. Bronstein, S. Srinivasan, S. Paul, A. Sinha, M. O’Kelly, P. Nikdel, and S. Whiteson, “Embedding synthetic off-policy experience for autonomous driving via zero-shot curricula,” in Conference on Robot Learning. PMLR, 2023, pp. 188–198.
[9] P. Huang, M. Xu, F. Fang, and D. Zhao, “Robust reinforcement learning as a stackelberg game via adaptively-regularized adversarial training,” arXiv preprint arXiv:2202.09514, 2022.
[10] M. Xu, P. Huang, Y. Niu, V. Kumar, J. Qiu, C. Fang, K.-H. Lee, X. Qi, H. Lam, B. Li et al., “Group distributionally robust reinforcement learning with hierarchical latent variables,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2023, pp. 2677–2703.
[11] L. Feng, Q. Li, Z. Peng, S. Tan, and B. Zhou, “Trafficgen: Learning to generate diverse and realistic traffic scenarios,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 3567–3575.
[12] S. Tan, K. Wong, S. Wang, S. Manivasagam, M. Ren, and R. Urtasun, “Scenegen: Learning to generate realistic traffic scenes,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 892–901.
[13] P. Huang, M. Xu, J. Zhu, L. Shi, F. Fang, and D. Zhao, “Curriculum reinforcement learning using optimal transport via gradual domain adaptation,” Advances in Neural Information Processing Systems, vol. 35, pp. 10 656–10 670, 2022.
[14] J. Wang, A. Pun, J. Tu, S. Manivasagam, A. Sadat, S. Casas, M. Ren, and R. Urtasun, “Advsim: Generating safety-critical scenarios for self-driving vehicles,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 9909–9918.
[15] N. Hanselmann, K. Renz, K. Chitta, A. Bhattacharyya, and A. Geiger, “King: Generating safety-critical driving scenarios for robust imitation via kinematics gradients,” in European Conference on Computer Vision. Springer, 2022, pp. 335–352.
[16] Z. Zhong, D. Rempe, D. Xu, Y. Chen, S. Veer, T. Che, B. Ray, and M. Pavone, “Guided conditional diffusion for controllable traffic simulation,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 3560–3566.
[17] S. Tan, B. Ivanovic, X. Weng, M. Pavone, and P. Kraehenbuehl, “Language conditioned traffic generation,” arXiv preprint arXiv:2307.07947, 2023.
[18] M. Xu, P. Huang, F. Li, J. Zhu, X. Qi, K. Oguchi, Z. Huang, H. Lam, and D. Zhao, “Scalable safety-critical policy evaluation with accelerated rare event sampling,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022, pp. 12 919–12 926.
[19] W. Ding, B. Chen, M. Xu, and D. Zhao, “Learning to collide: An adaptive safety-critical scenarios generating method,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020, pp. 2243–2250.
[20] W. Ding, H. Lin, B. Li, K. J. Eun, and D. Zhao, “Semantically adversarial driving scenario generation with explicit knowledge integration,” arXiv preprint arXiv:2106.04066, vol. 1, 2021.
[21] M. Xu, P. Huang, W. Yu, S. Liu, X. Zhang, Y. Niu, T. Zhang, F. Xia, J. Tan, and D. Zhao, “Creative robot tool use with large language models,” arXiv preprint arXiv:2310.13065, 2023.
[22] S. Suo, S. Regalado, S. Casas, and R. Urtasun, “Trafficsim: Learning to simulate realistic multi-agent behaviors,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 10 400–10 409.
[23] Q. Li, Z. Peng, L. Feng, C. Duan, W. Mo, B. Zhou et al., “Scenarionet: Open-source platform for large-scale traffic scenario simulation and modeling,” arXiv preprint arXiv:2306.12241, 2023.
[24] W. Ding, M. Xu, and D. Zhao, “Cmts: Conditional multiple trajectory synthesizer for generating safety-critical driving scenarios,” in International Conference on Robotics and Automation (ICRA). IEEE, 2020.
[25] W. Ding, W. Wang, and D. Zhao, “Multi-vehicle trajectories generation for vehicle-to-vehicle encounters,” in 2019 IEEE International Conference on Robotics and Automation (ICRA), 2019.
[26] C. Xu, W. Ding, W. Lyu, Z. Liu, S. Wang, Y. He, H. Hu, D. Zhao, and B. Li, “Safebench: A benchmarking platform for safety evaluation of autonomous vehicles,” Advances in Neural Information Processing Systems, vol. 35, pp. 25 667–25 682, 2022.
[27] W. Ding, B. Chen, B. Li, K. J. Eun, and D. Zhao, “Multimodal safety-critical scenarios generation for decision-making algorithms evaluation,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 1551–1558, 2021.
[28] W. Ding, H. Lin, B. Li, and D. Zhao, “Causalaf: Causal autoregressive flow for goal-directed safety-critical scenes generation,” arXiv preprint arXiv:2110.13939, 2021.
[29] M. Klischat and M. Althoff, “Generating critical test scenarios for automated vehicles with evolutionary algorithms,” in 2019 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2019, pp. 2352–2358.
[30] M. Arief, Z. Huang, G. K. S. Kumar, Y. Bai, S. He, W. Ding, H. Lam, and D. Zhao, “Deep probabilistic accelerated evaluation: A robust certifiable rare-event simulation methodology for black-box safety-critical systems,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2021, pp. 595–603.
[31] Y. Cao, C. Xiao, A. Anandkumar, D. Xu, and M. Pavone, “Advdo: Realistic adversarial attacks for trajectory prediction,” in European Conference on Computer Vision. Springer, 2022, pp. 36–52.
[32] C. Zhang, J. Tu, L. Zhang, K. Wong, S. Suo, and R. Urtasun, “Learning realistic traffic agents in closed-loop,” in 7th Annual Conference on Robot Learning, 2023.
[33] W. Ding, Y. Cao, D. Zhao, C. Xiao, and M. Pavone, “Realgen: Retrieval augmented generation for controllable traffic scenarios,” arXiv preprint arXiv:2312.13303, 2023.
[34] W. Ding, H. Lin, B. Li, and D. Zhao, “Generalizing goal-conditioned reinforcement learning with variational causal reasoning,” Advances in Neural Information Processing Systems, vol. 35, pp. 26 532–26 548, 2022.
[35] A. Li, S. Chen, L. Sun, N. Zheng, M. Tomizuka, and W. Zhan, “Scegene: Bio-inspired traffic scenario generation for autonomous driving testing,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 9, pp. 14 859–14 874, 2021.
[36] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat et al., “Gpt-4 technical report,” arXiv preprint arXiv:2303.08774, 2023.
[37] Z. Zhong, D. Rempe, Y. Chen, B. Ivanovic, Y. Cao, D. Xu, M. Pavone, and B. Ray, “Language-guided traffic simulation via scene-level diffusion,” arXiv preprint arXiv:2306.06344, 2023.
[38] J.-B. Mouret and J. Clune, “Illuminating search spaces by mapping elites,” arXiv preprint arXiv:1504.04909, 2015.
[39] B. Tjanaka, M. C. Fontaine, D. H. Lee, Y. Zhang, N. R. Balam, N. Dennler, S. S. Garlanka, N. D. Klapsis, and S. Nikolaidis, “Pyribs: A bare-bones python library for quality diversity optimization,” in Proceedings of the Genetic and Evolutionary Computation Conference, ser. GECCO ’23. New York, NY, USA: Association for Computing Machinery, 2023, pp. 220–229. [Online]. Available: https://doi.org/10.1145/3583131.3590374
[40] A. Cully, J. Clune, D. Tarapore, and J.-B. Mouret, “Robots that can adapt like animals,” Nature, vol. 521, no. 7553, pp. 503–507, 2015.
[41] V. Bhatt, H. Nemlekar, M. Fontaine, B. Tjanaka, H. Zhang, Y.-C. Hsu, and S. Nikolaidis, “Surrogate assisted generation of human-robot interaction scenarios,” arXiv preprint arXiv:2304.13787, 2023.
[42] Y.-S. Tung, M. B. Luebbers, A. Roncone, and B. Hayes, “Workspace optimization techniques to improve prediction of human motion during human-robot collaboration,” arXiv preprint arXiv:2401.12965, 2024.
[43] A. Morel, Y. Kunimoto, A. Coninx, and S. Doncieux, “Automatic acquisition of a repertoire of diverse grasping trajectories through behavior shaping and novelty search,” in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 755–761.
[44] J. Huber, F. Hélénon, M. Kappel, E. Chelly, M. Khoramshahi, F. B. Amar, and S. Doncieux, “Speeding up 6-dof grasp sampling with quality-diversity,” arXiv preprint arXiv:2403.06173, 2024.
[45] S. Surana, B. Lim, and A. Cully, “Efficient learning of locomotion skills through the discovery of diverse environmental trajectory generator priors,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 12 134–12 141.
[46] J. Nordmoen, F. Veenstra, K. O. Ellefsen, and K. Glette, “Map-elites enables powerful stepping stones and diversity for modular robotics,” Frontiers in Robotics and AI, vol. 8, p. 639173, 2021.
[47] E. Zardini, D. Zappetti, D. Zambrano, G. Iacca, and D. Floreano, “Seeking quality diversity in evolutionary co-design of morphology and control of soft tensegrity modular robots,” in Genetic and Evolutionary Computation Conference, 2021.
[48] P. Liu, Z. Guo, H. Yu, H. Linghu, Y. Li, Y. Hou, H. Ge, and Q. Zhang, “A preliminary study of multi-task MAP-elites with knowledge transfer for robotic arm design,” in 2022 IEEE Congress on Evolutionary Computation (CEC). IEEE, Jul. 2022. [Online]. Available: https://doi.org/10.1109%2Fcec55065.2022.9870374
[49] M. C. Fontaine, J. Togelius, S. Nikolaidis, and A. K. Hoover, “Covariance matrix adaptation for the rapid illumination of behavior space,” in Proceedings of the 2020 Genetic and Evolutionary Computation Conference, 2020, pp. 94–102.
[50] H. Caesar, J. Kabzan, K. S. Tan, W. K. Fong, E. Wolff, A. Lang, L. Fletcher, O. Beijbom, and S. Omari, “nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles,” arXiv preprint arXiv:2106.11810, 2021.
[51] M. Fontaine and S. Nikolaidis, “Differentiable quality diversity,” Advances in Neural Information Processing Systems, vol. 34, pp. 10 040–10 052, 2021.
[52] M. Fontaine and S. Nikolaidis, “A quality diversity approach to automatically generating human-robot interaction scenarios in shared autonomy,” arXiv preprint arXiv:2012.04283, 2020.
[53] N. Hansen, “The cma evolution strategy: A tutorial,” arXiv preprint arXiv:1604.00772, 2016.
[54] R. J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Machine Learning, vol. 8, pp. 229–256, 1992.
[55] Y. Liu, P. Ramachandran, Q. Liu, and J. Peng, “Stein variational policy gradient,” arXiv preprint arXiv:1704.02399, 2017.