License: CC BY-SA 4.0
arXiv:2403.13208v1 [cs.RO] 19 Mar 2024

CaDRE: Controllable and Diverse Generation of Safety-Critical Driving Scenarios using Real-World Trajectories

Peide Huang$^{1}$, Wenhao Ding$^{1}$, Jonathan Francis$^{2}$, Bingqing Chen$^{2}$, Ding Zhao$^{1}$

*This work was partially performed during PH’s internship at the Bosch Center for Artificial Intelligence.
$^{1}$PH, WD, DZ are with Carnegie Mellon University, USA.
$^{2}$JF, BC are with Bosch Center for Artificial Intelligence. Contact: peideh@andrew.cmu.edu

Abstract

Simulation is an indispensable tool in the development and testing of autonomous vehicles (AVs), offering an efficient and safe alternative to road testing by allowing the exploration of a wide range of scenarios. Despite its advantages, a significant challenge within simulation-based testing is the generation of safety-critical scenarios, which are essential to ensure that AVs can handle rare but potentially fatal situations. This paper addresses this challenge by introducing a novel generative framework, CaDRE, which is specifically designed for generating diverse and controllable safety-critical scenarios using real-world trajectories. Our approach optimizes for both the quality and diversity of scenarios by employing a unique formulation and algorithm that integrates real-world data, domain knowledge, and black-box optimization techniques. We validate the effectiveness of our framework through extensive testing in three representative types of traffic scenarios. The results demonstrate superior performance in generating diverse and high-quality scenarios with greater sample efficiency than existing reinforcement learning and sampling-based methods.

I INTRODUCTION

Simulation plays a pivotal role in the domain of autonomous driving, serving crucial functions in both training and evaluation [1, 2, 3, 4]. In contrast to the costly and time-consuming nature of on-road testing, simulation offers efficient feedback to developers, avoiding risky engagements in the physical world [5]. Furthermore, simulation enables the capability to incorporate various scenario sources, ranging from real-world logs and random perturbations to templates crafted by human experts. This versatility in scenario selection facilitates a comprehensive analysis of the performance of autonomous vehicles (AVs).

However, it is widely recognized that traffic scenarios in the real world exhibit a long-tail distribution, with normal scenarios constituting the majority and safety-critical scenarios occurring infrequently [6, 7]. Training AVs exclusively on these normal scenarios impedes the ability to generalize to critical situations, potentially leading to fatal accidents upon widespread deployment. During the development stage, evaluating AVs only in normal scenarios results in biased and incomplete assessments, as models may need to compromise slightly on performance in normal scenarios to improve robustness in safety-critical ones [8, 9, 10]. Consequently, there is an urgent need for the generation of safety-critical scenarios within simulations.

There are three principal challenges in generating safety-critical scenarios. The first challenge is realism, which requires the generated scenarios to be sufficiently realistic to occur in the real world. This realism is typically interpreted as the similarity between the distributions of real-world and generated scenarios [11, 12, 13]. To achieve this, algorithms often involve either modifying pre-existing normal scenarios [14, 15] or employing generative models to approximate the distribution of real-world scenarios and adjusting the model-derived samples accordingly [16, 17]. The second challenge is diversity, which demands that the generation algorithm cover a wide spectrum of scenarios rather than focusing on a single failure instance. Previous approaches utilizing adversarial attacks [14, 18] or reinforcement learning [19] tend to identify the most severe cases but fail to produce a diverse set of scenarios. The final challenge is controllability: ensuring that generated scenarios align with specific factors or guidelines that affect scene variation. These guidelines are often expressed through constraints [20], temporal logic [16], or language [17, 21], all requiring meticulous model architecture design to facilitate the integration of these guidelines.

In this paper, we introduce a generative framework CaDRE, which employs the Quality-Diversity (QD) formulation for the generation of safety-critical scenarios. Through a novel design, CaDRE addresses the above three challenges by integrating information from real-world data, domain knowledge, and black-box optimization and explicitly optimizing for both high-quality and diverse scenarios. Specifically, to maintain the realism of generated scenarios, we optimize the perturbations added to trajectories from real-world scenarios within defined constraints. Subsequently, we apply the QD algorithm to simultaneously explore and optimize continuous perturbation spaces efficiently. Finally, we achieve controllability by retrieving from archived scenarios according to the specific measure values defined by the user.

Figure 1: Overview of the CaDRE framework.

The main contributions can be summarized below:

  • We propose CaDRE, a novel QD formulation for the generation of diverse and controllable safety-critical scenarios in autonomous driving.

  • We propose an occupancy-aware restart mechanism as a general extension to the QD algorithm family, which improves the exploration efficiency of the algorithms.

  • We conduct experiments on three representative real-world traffic scenario types: unprotected cross-turn, high-speed lane-change, and U-turn. The experimental results demonstrate that CaDRE can generate diverse and high-quality scenarios, with better sample-efficiency compared to both RL- and sampling-based methods.

II RELATED WORK

Safety-critical Scenario Generation. One significant component of autonomous driving simulation is the traffic model, which governs the behavior of the background vehicles, crucial to simulating real-world scenarios. TrafficSim [22] improves the generation process by using graph neural networks to extract interactions between vehicles. TrafficGen [11] proposes to generate the initial condition and sequential behavior of vehicles separately. ScenarioNet [23] further extends this framework to build a large-scale simulation platform that supports multiple open-source datasets.

Unlike the aforementioned works, which aim to generate realistic scenarios, we focus on the long-tail distribution, consisting of the safety-critical scenarios, to provide efficient evaluations of the safety of AVs [24, 25, 26]. Most of the existing literature in this category focuses on adversarial generation. L2C [19], MMG [27], and CausalAF [28] generate initial conditions for open-loop scenario generation using reinforcement learning. The methods in [14, 29, 30] optimize the trajectories of actors with black-box optimization to attack the ego vehicle. KING [15] and AdvDO [31] further assume access to differentiable dynamics models to improve the efficiency of finding safety-critical scenarios. Since adversarial attacks sacrifice the diversity and controllability of generated scenarios, imitation learning [32], retrieval-augmented generation [33], causality [34], and evolutionary algorithms [35] have also been explored. To condition generation on language, LCTGen [17] predefines an intermediate representation to bridge the large language model (LLM) [36] and the low-level trajectory generator, and CTG++ [37] uses LLMs to generate signal temporal logic to guide the sampling process of diffusion models. In this paper, we depart from the common practice of leveraging adversarial generation methods and instead focus on improving the diversity and controllability of generated samples through our novel use and extensions of Quality-Diversity algorithms.

Quality-Diversity Algorithms in Robotics. QD is a branch of optimization that finds a collection of high-performing, yet qualitatively different solutions [38, 39]. Specifically, QD optimizes an objective for each point in a measure space. Solving a QD problem in a continuous measure space requires infinite memory [39], so, in practice, the measure space is discretized into a finite set, and an archive is maintained to keep track of the best-known solutions over the finite set.

Given QD’s ability to find a collection of high-performing solutions for different contexts, it is well-suited for many robotics applications. In the pioneering work of [40], a behavior-performance map is learned to enable the robot to quickly find a compensatory behavior and adapt after damage. QD has also been applied to problems such as human-robot interaction [41, 42], robot manipulation [43, 44], locomotion [45, 46], and morphology design [47, 48].

Popular QD algorithms, e.g., MAP-Elites [38] and CMA-ME [49], are predicated on evolutionary strategies to implement their underlying search policies. The goal of this search is to find a solution for a particular parameter configuration and to update the archive accordingly. These QD methods initiate this process from random regions of the search space with no regard for the density of the local neighborhood of solutions; this can be inefficient, due to the possibility of restarting from already-known regions. To alleviate these issues, we introduce a novel occupancy-aware restart (OAR) mechanism, providing a form of guidance for improved coverage and efficiency during exploration. We assess the value of our novel OAR mechanism in terms of search efficiency and convergence through comparisons with the above QD algorithms and multi-particle exploration mechanisms used in reinforcement learning. To the best of our knowledge, we are the first to formulate QD for the challenging problem of safety-critical scenario generation in autonomous driving; our approach enables us to generate a map of diverse and high-quality scenes whose parameters vary smoothly along the dimensions defined by expressive measure spaces.

III METHODOLOGY

Our method, Controllable and Diverse Generation of Safety-Critical Driving Scenarios using REal-world trajectories (CaDRE), integrates real-world data, domain knowledge, and black-box optimization techniques. As illustrated in Fig. 1, for each iteration, CaDRE maintains a grid archive of generated scenarios. First, it uses the QD algorithm to update the distribution from which the perturbations to the real-world trajectories are sampled. Then CaDRE simulates the perturbations to obtain diverse behavior measures and updates the archive according to the simulation results. Finally, we obtain an archive that contains thousands of critical scenarios, each with different behaviors according to the measure functions we defined using domain knowledge.

Let $\bm{x}_t^i \in \mathbb{R}^2$, $\psi_t^i \in [-\pi, \pi]$, and $v_t^i \in \mathbb{R}$ be the ground-plane coordinates, orientation, and speed of the $i$-th vehicle agent in the world frame at time $t$. The ego vehicle, with index $i = 0$, is the vehicle for which we want to generate critical scenarios. We denote the joint state of all vehicles as $\bm{s}_t = \left\{\bm{x}_t^i, \psi_t^i, v_t^i\right\}_{i=0}^{N}$, where $N$ is the number of background agents. We define a specific traffic scenario as a sequence of these states, $\mathcal{S} = \left\{\bm{s}_t\right\}_{t=0}^{T}$, where $T$ is a fixed time horizon. We initialize a specific scenario from a real-world dataset that contains only naturalistic driving scenarios.

Safety-Critical Perturbation. We perturb the trajectory of one background vehicle indexed by $i \in \{1, \ldots, N\}$ to generate safety-critical scenarios. We first recover the action sequence $\left\{\bm{a}_t^i\right\}_{t=0}^{T-1}$ from $\left\{\bm{s}_t^i\right\}_{t=0}^{T}$, assuming a kinematic bicycle model:

$$\frac{d}{dt}\begin{bmatrix} x \\ y \\ \psi \\ v \end{bmatrix} = \begin{bmatrix} v\cos(\psi) \\ v\sin(\psi) \\ v\tan(\delta)/L \\ a \end{bmatrix}, \tag{1}$$

where $L$ is the wheelbase. Each action consists of acceleration and steering input, i.e., $\bm{a}_t^i := [a_t^i, \delta_t^i]$. A new trajectory can be generated by 1) applying a sequence of bounded perturbations $\left\{\Delta\bm{a}_t^i\right\}_{t=0}^{T-1}$ to the recovered action sequence, and 2) unrolling the kinematics model from $\bm{s}_0^i$ using Eqn. 1. We then parameterize each safety-critical scenario with $\bm{\theta} = \left\{\Delta\bm{a}_0^i, \ldots, \Delta\bm{a}_{T-1}^i\right\} \in \mathbb{R}^{T \times 2}$.
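The two steps above (perturb the recovered actions, then unroll the kinematics) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes forward-Euler integration, that the heading rate is driven by the steering input $\delta$ as in the standard kinematic bicycle model, and hypothetical values for the wheelbase, the time step `dt`, and the perturbation bounds `max_da` and `max_ddelta`, none of which are given in the text.

```python
import math
from typing import List, Sequence, Tuple

State = Tuple[float, float, float, float]  # (x, y, psi, v)
Action = Tuple[float, float]               # (acceleration a, steering delta)


def wrap_angle(angle: float) -> float:
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def clip(value: float, bound: float) -> float:
    """Clamp value to [-bound, bound]."""
    return max(-bound, min(bound, value))


def unroll(
    s0: State,
    actions: Sequence[Action],
    theta: Sequence[Action],
    wheelbase: float = 2.8,    # assumed value, in meters
    dt: float = 0.1,           # assumed time step, in seconds
    max_da: float = 1.0,       # assumed bound on acceleration perturbation
    max_ddelta: float = 0.1,   # assumed bound on steering perturbation
) -> List[State]:
    """Apply bounded perturbations theta to the recovered actions and
    integrate the kinematic bicycle model from the initial state s0."""
    x, y, psi, v = s0
    trajectory = [s0]
    for (a, delta), (da, ddelta) in zip(actions, theta):
        a_pert = a + clip(da, max_da)
        delta_pert = delta + clip(ddelta, max_ddelta)
        # Forward Euler: every derivative uses the state at the start of the step.
        x += v * math.cos(psi) * dt
        y += v * math.sin(psi) * dt
        psi = wrap_angle(psi + v * math.tan(delta_pert) / wheelbase * dt)
        v += a_pert * dt
        trajectory.append((x, y, psi, v))
    return trajectory
```

With an all-zero `theta`, the function simply replays the recovered actions; clipping is one simple way to keep perturbations bounded, and other constraint schemes would fit the same interface.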

Black-Box Ego Policy. We follow [15, 31] and assume that the ego vehicle is reactive to nearby vehicles and tries to follow the original trajectory. The problem is formulated as a black-box optimization; CaDRE does not require access to the reactive policy and would work with any other ego policy.

III-A Quality-Diversity Formulation for Scenario Generation

Inspired by previous work [41, 39], we formulate the problem of generating a diverse set of safety-critical driving scenarios as a QD problem. First, we define an objective function $f: \mathbb{R}^{T \times 2} \rightarrow \mathbb{R}$ to quantify the safety-critical level. We further define $K$ measure functions $m_k: \mathbb{R}^{T \times 2} \rightarrow \mathbb{R}$, jointly represented as $\bm{m}: \mathbb{R}^{T \times 2} \rightarrow \mathbb{R}^{K}$, which are a set of user-defined functions that quantify aspects of the scenario that we aim to diversify. We denote $\mathcal{M} = \bm{m}(\mathbb{R}^{T \times 2}) \subseteq \mathbb{R}^{K}$ as the measure space formed by the range of $\bm{m}$. Because $f$ evaluates the quality of a scenario $\bm{\theta}$, the goal of the QD problem is to find, for each $s \in \mathcal{M}$, a scenario $\bm{\theta}$ such that $\bm{m}(\bm{\theta}) = s$ and $f(\bm{\theta})$ is maximized (Eqn. 2):

$$\begin{aligned} \max \quad & f(\bm{\theta}) \\ \text{s.t.} \quad & \bm{m}(\bm{\theta}) = s, \; \forall s \in \mathcal{M}. \end{aligned} \tag{2}$$

In practice, we discretize $\mathcal{M}$ into a finite number of $M$ cells and solve the simplified version of the problem:

$$\max_{\bm{\theta}_1, \ldots, \bm{\theta}_M} \sum_{n=1}^{M} f(\bm{\theta}_n). \tag{3}$$

With a slight abuse of notation, we will use $f$ to denote the objective value and $\bm{m}$ to denote the values of the measure function. We also denote the archive as $M$, and we can retrieve the scenarios from the archive by $M[\bm{m}]$. With properly defined objective and measure functions, we can optimize a diverse population of safety-critical scenarios and retrieve individual scenarios in a controllable manner by asking for specific measure values $\bm{m}$. We build a lightweight traffic scenario simulator $Sim(\mathcal{S}, \bm{\theta}, i)$, which outputs the objective value $f$ and the measure values $\bm{m}$, given the original scenario $\mathcal{S}$, perturbation $\bm{\theta}$, and the index of the perturbed vehicle $i$.
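To make the discretized archive and the retrieval $M[\bm{m}]$ concrete, here is a minimal sketch of a grid archive. The class name `GridArchive`, its methods, and the uniform binning with clamping at the borders are illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Cell = Tuple[int, ...]
Elite = Tuple[float, List[float]]  # (objective value f, parameters theta)


class GridArchive:
    """Keeps the best-known scenario for each cell of a uniform grid
    laid over the measure space (hypothetical helper for illustration)."""

    def __init__(self, bounds: Sequence[Tuple[float, float]],
                 resolution: Sequence[int]) -> None:
        self.bounds = list(bounds)          # (low, high) per measure
        self.resolution = list(resolution)  # number of bins per measure
        self.cells: Dict[Cell, Elite] = {}

    def index(self, m: Sequence[float]) -> Cell:
        """Map measure values to a grid cell, clamping to the grid borders."""
        idx = []
        for value, (low, high), n in zip(m, self.bounds, self.resolution):
            frac = (value - low) / (high - low)
            idx.append(min(n - 1, max(0, int(frac * n))))
        return tuple(idx)

    def add(self, theta: List[float], f: float, m: Sequence[float]) -> bool:
        """Insert theta if its cell is empty or if it beats the current elite."""
        key = self.index(m)
        if key not in self.cells or f > self.cells[key][0]:
            self.cells[key] = (f, theta)
            return True
        return False

    def retrieve(self, m: Sequence[float]) -> Optional[Elite]:
        """Controllable retrieval: return the elite stored for measures m."""
        return self.cells.get(self.index(m))
```

A user who wants, for example, an early impact from the left would call `retrieve` with the corresponding measure values and receive the best stored scenario for that cell, or `None` if the cell has not been filled.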

Input: Real-world scenario $\mathcal{S}$, index of the perturbed background vehicle $i$, traffic simulator $Sim$, batch size $B$, an empty grid archive $M$
Output: A grid archive $M$ containing diverse safety-critical scenarios $\mathcal{S}_c$
Initialize emitter $e$
Recover $\left\{\bm{a}_t^i\right\}_{t=0}^{T-1}$ from $\left\{\bm{s}_t^i\right\}_{t=0}^{T}$ in $\mathcal{S}$
for iter $= 1, \ldots,$ total_iter do
    $\{\bm{\theta}_b\}_{b=1}^{B} \sim \mathcal{N}(e.\mu, e.C)$
    for $b = 1, \ldots, B$ do
        $\{f_b, \bm{m}_b\} \leftarrow Sim(\mathcal{S}, \bm{\theta}_b, i)$
    Unpack $parents$, sampling mean $\mu$, covariance matrix $C$, and parameter set $P$ from $e$
    for $b = 1, \ldots, B$ do
        if $M[\bm{m}_b]$ is empty then
            $\Delta_b \leftarrow f_b$
            Flag that $\bm{\theta}_b$ discovered a new cell
            Add $\bm{\theta}_b$ to $parents$
        else if $f_b > M[\bm{m}_b].f$ then
            $\Delta_b \leftarrow f_b - M[\bm{m}_b].f$
            Add $\bm{\theta}_b$ to $parents$
    if $parents \neq \varnothing$ then
        Sort $parents$ by (newCell, $\Delta_b$)
        Update $\mu, C, P$ according to $parents$
        $parents \leftarrow \varnothing$
    else
        Occupancy-aware restart from an elite in $M$
Algorithm 1 CaDRE: Controllable and Diverse Generation of Safety-Critical Driving Scenarios
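The control flow of Algorithm 1 can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the authors' code. The assumptions are: an isotropic Gaussian with a mean-only update stands in for the full CMA covariance adaptation; elites are written into the archive as soon as they qualify; the restart picks an elite uniformly (an occupancy-aware restart would replace that line); and `toy_sim` and `toy_cell` are hypothetical stand-ins for the traffic simulator and the grid discretization.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, ...]
Theta = List[float]
Archive = Dict[Cell, Tuple[float, Theta]]  # cell -> (objective, parameters)


def cadre_loop(
    sim: Callable[[Theta], Tuple[float, Tuple[float, ...]]],
    to_cell: Callable[[Tuple[float, ...]], Cell],
    dim: int,
    total_iter: int = 50,
    batch_size: int = 16,
    sigma: float = 0.3,
    seed: int = 0,
) -> Archive:
    rng = random.Random(seed)
    archive: Archive = {}
    mean = [0.0] * dim
    for _ in range(total_iter):
        # Sample a batch of perturbations around the current mean.
        batch = [[mu + sigma * rng.gauss(0.0, 1.0) for mu in mean]
                 for _ in range(batch_size)]
        results = [sim(theta) for theta in batch]
        parents: List[Tuple[bool, float, Theta]] = []  # (new_cell, delta, theta)
        for theta, (f, m) in zip(batch, results):
            cell = to_cell(m)
            if cell not in archive:
                parents.append((True, f, theta))
                archive[cell] = (f, theta)
            elif f > archive[cell][0]:
                parents.append((False, f - archive[cell][0], theta))
                archive[cell] = (f, theta)
        if parents:
            # New-cell discoveries rank first, then larger improvements.
            parents.sort(key=lambda p: (p[0], p[1]), reverse=True)
            top = parents[: max(1, len(parents) // 2)]
            mean = [sum(p[2][d] for p in top) / len(top) for d in range(dim)]
        else:
            # No archive improvement: restart from a stored elite (uniform here).
            _, elite = rng.choice(list(archive.values()))
            mean = list(elite)
    return archive


def toy_sim(theta: Theta) -> Tuple[float, Tuple[float, ...]]:
    """Hypothetical simulator: objective peaks at the origin; the first two
    parameters serve as measure values."""
    f = math.exp(-sum(t * t for t in theta))
    return f, (theta[0], theta[1])


def toy_cell(m: Tuple[float, ...]) -> Cell:
    """10 bins per measure over [-2, 2], clamped at the borders."""
    return tuple(min(9, max(0, int((v + 2.0) / 4.0 * 10))) for v in m)
```

Calling `cadre_loop(toy_sim, toy_cell, dim=4)` returns a dictionary from grid cells to their best `(f, theta)` pair. Swapping in a real simulator only requires a function with the same return shape as `toy_sim`.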

III-B Design of Objective and Measure Functions

Objective Function. The objective function f𝑓fitalic_f quantifies the safety-critical level, motivated by prior work on safety-critical scenario generation:

$$f(\bm{\theta}) := \begin{cases} 1, & \text{if vehicle $i$ collides with the ego vehicle} \\ 0, & \text{if vehicle $i$ collides with background vehicles} \\ \exp\left(-\min_{t} d(\bm{x}_t^0, \bm{x}_t^i)\right), & \text{otherwise}, \end{cases} \tag{4}$$

where $d(\cdot, \cdot)$ is the $\ell_2$ distance.
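A direct transcription of this piecewise objective might look like the sketch below. The collision flags are assumed to be supplied by the simulator, and checking the ego collision before the background collision is an assumption about precedence that the text does not state.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def objective(
    ego_xy: Sequence[Point],
    adv_xy: Sequence[Point],
    hits_ego: bool,
    hits_background: bool,
) -> float:
    """Safety-critical level of a scenario, following the three cases above."""
    if hits_ego:
        return 1.0  # perturbed vehicle collides with the ego vehicle
    if hits_background:
        return 0.0  # perturbed vehicle collides with another background vehicle
    # Otherwise: exponential of the negative minimum ego-adversary distance.
    d_min = min(math.dist(p, q) for p, q in zip(ego_xy, adv_xy))
    return math.exp(-d_min)
```

Because the no-collision branch lies in $(0, 1]$ and grows as the closest approach shrinks, near misses are rewarded smoothly while collisions with other background vehicles receive the lowest value.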

Measure Functions. The measure functions are essential to capture different aspects of critical scenarios. We propose three measure functions to define the diverse behavior of perturbed vehicles. These measure functions collectively enable the definition and evaluation of a wide range of safety-critical scenarios, focusing on essential factors such as perturbation efforts ($m_1$), urgency of response ($m_2$), and collision behavior ($m_3$). Here, $m_1$ measures the mean magnitude of the steering perturbation. It reflects how much the generated trajectory would deviate from the original trajectory:

$$m_1 = \frac{1}{t_{\text{impact}}} \sum_{t=0}^{t_{\text{impact}}-1} \left|\delta_t^i\right|. \tag{5}$$

Next, $m_2$ measures the normalized impact time; it helps categorize scenarios based on the urgency of the response required, aiding in the development of time-critical decision-making algorithms for AVs:

$$m_2 = t_{\text{impact}} / T. \tag{6}$$

Finally, $m_3$ measures the impact angle relative to the body frame of the ego vehicle. It allows for the evaluation of how well autonomous driving systems can recognize and react to threats from various directions, enhancing their ability to prevent accidents through appropriate maneuvering or braking:

$$m_3 = \operatorname{atan2}\left(R_{\psi_t^0}\left(\bm{x}_t^i - \bm{x}_t^0\right)^{T}\right), \tag{7}$$

where $R_{\psi_t^0}$ is the rotation matrix, $t = t_{\text{impact}}$, and $\operatorname{atan2}$ is the 2-argument arctangent function. The $x$-axis of the body frame points to the front of the vehicle, and the $y$-axis points to the left. If there is no collision, we assume $t_{\text{impact}} = \arg\min_{t} d(\bm{x}_t^0, \bm{x}_t^i)$. Note that the time horizon can differ between vehicles, as some vehicles may appear in or leave the scene at different times in real-world scenarios.
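The three measures can be computed together once the impact time is known. The sketch below is illustrative only. It assumes the rotation matrix maps world-frame offsets into the ego body frame ($x$ forward, $y$ left), that `delta` holds the per-step steering values entering $m_1$, and that $m_1$ is set to zero when $t_{\text{impact}} = 0$ to avoid a division by zero; the last choice is not specified in the text.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def measures(
    ego_xy: Sequence[Point],
    ego_psi: Sequence[float],
    adv_xy: Sequence[Point],
    delta: Sequence[float],
    t_impact: Optional[int] = None,
) -> Tuple[float, float, float]:
    """Return (m1, m2, m3) for one scenario with horizon T = len(delta)."""
    horizon = len(delta)
    if t_impact is None:
        # No collision: use the time of closest approach as the impact time.
        t_impact = min(range(len(adv_xy)),
                       key=lambda t: math.dist(ego_xy[t], adv_xy[t]))
    # m1: mean steering magnitude up to the impact time.
    m1 = sum(abs(d) for d in delta[:t_impact]) / t_impact if t_impact > 0 else 0.0
    # m2: normalized impact time.
    m2 = t_impact / horizon
    # m3: bearing of the perturbed vehicle in the ego body frame at impact.
    dx = adv_xy[t_impact][0] - ego_xy[t_impact][0]
    dy = adv_xy[t_impact][1] - ego_xy[t_impact][1]
    psi = ego_psi[t_impact]
    body_x = math.cos(psi) * dx + math.sin(psi) * dy
    body_y = -math.sin(psi) * dx + math.cos(psi) * dy
    m3 = math.atan2(body_y, body_x)
    return m1, m2, m3
```

Under this convention, $m_3 = 0$ corresponds to an impact from straight ahead, positive values to the left side of the ego vehicle, and values near $\pm\pi$ to an impact from behind.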

We adopt a variant of a QD algorithm, namely Covariance Matrix Adaptation MAP-Elites (CMA-ME) [49], to find both higher-quality and more diverse safety-critical scenarios (Algorithm 1). The key difference between QD algorithms such as CMA-ME and evolutionary strategies such as CMA-ES is that CMA-ME employs an archiving mechanism to maintain diversity [49]. Another difference is that CMA-ME adjusts the parent ranking rules that update the sampling distribution to maximize the likelihood of archive improvement: it ranks solutions that fill empty cells higher than those that replace existing ones.

III-C Occupancy-Aware Restart

Existing QD algorithms [49] restart from a random elite in the archive when the archive stops improving. This is inefficient because, in general, searching from elites whose neighboring cells are empty benefits exploration more than searching from densely occupied regions. To improve exploration efficiency, we propose Occupancy-Aware Restart (OAR), a restart mechanism that considers the occupancy rate of neighboring cells.

As illustrated in Fig. 2, OAR assigns a higher probability to elites with more empty neighboring cells. More specifically, given the neighbor empty rate of $L$ elites $r_1, \ldots, r_L$ and the temperature $T$, the softmax probability of restarting from elite $i$ is computed by:

$$p_i = \frac{e^{r_i/T}}{\sum_{j=1}^{L} e^{r_j/T}}. \tag{8}$$

As $T \rightarrow +\infty$, OAR degenerates to uniform sampling. With a lower $T$, OAR assigns a higher probability to those elites that have more empty neighbors. For efficient implementation, we use a 3D convolution kernel to compute the number of empty cells around each elite.
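A minimal sketch of OAR on a 3D grid archive is given below. It is an illustration under assumptions rather than the implementation used in the experiments: the function names are hypothetical, the neighborhood is assumed to be the $3 \times 3 \times 3$ block around each cell (excluding the cell itself), and cells outside the grid are assumed not to count as neighbors. The shifted sums are equivalent to convolving the empty-cell mask with a $3 \times 3 \times 3$ kernel of ones whose center is zero. The temperature is passed as $1/T$, so that $1/T = 0$ gives uniform sampling.

```python
import numpy as np


def neighbor_empty_rate(occupied):
    """Fraction of in-bounds 3x3x3 neighbors (center excluded) that are empty, per cell."""
    occupied = np.asarray(occupied, dtype=bool)
    empty = (~occupied).astype(float)
    padded_empty = np.pad(empty, 1, constant_values=0.0)
    padded_valid = np.pad(np.ones_like(empty), 1, constant_values=0.0)
    n_empty = np.zeros_like(empty)
    n_valid = np.zeros_like(empty)
    d0, d1, d2 = empty.shape
    for i in range(3):
        for j in range(3):
            for k in range(3):
                if (i, j, k) == (1, 1, 1):
                    continue  # skip the cell itself
                n_empty += padded_empty[i:i + d0, j:j + d1, k:k + d2]
                n_valid += padded_valid[i:i + d0, j:j + d1, k:k + d2]
    return n_empty / np.maximum(n_valid, 1.0)


def oar_probabilities(occupied, inv_temperature):
    """Softmax restart probabilities over elites, following Eq. (8) with 1/T = inv_temperature."""
    occupied = np.asarray(occupied, dtype=bool)
    rates = neighbor_empty_rate(occupied)
    elite_cells = np.argwhere(occupied)   # (L, 3) cell indices, C order
    r = rates[occupied]                   # (L,) rates in the same order
    logits = inv_temperature * r
    logits = logits - logits.max()        # numerical stability
    p = np.exp(logits)
    return elite_cells, p / p.sum()


def sample_restart_cell(occupied, inv_temperature, rng):
    """Draw the archive cell of the elite to restart from."""
    cells, p = oar_probabilities(occupied, inv_temperature)
    return tuple(int(i) for i in cells[rng.choice(len(cells), p=p)])
```

For example, `sample_restart_cell(occupied, 10.0, np.random.default_rng(0))` favors elites on the frontier of the occupied region, whereas `inv_temperature=0.0` recovers the uniform restart of CMA-ME.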

Figure 2: Illustration of Occupancy-Aware Restart.
TABLE I: The final performance of coverage, mean objective, QD score. We report the mean and variance over 5 perturbed vehicles for each scene. The QD score is shown in multiples of $1e3$.

Unprotected cross-turn

| Method | Coverage (↑) | Mean Obj (↑) | QD Score (↑) |
|---|---|---|---|
| Random | 0.140 ± 0.021 | 0.499 ± 0.123 | 0.285 ± 0.098 |
| CMA-ES | 0.182 ± 0.033 | 0.672 ± 0.076 | 0.489 ± 0.090 |
| REINFORCE | 0.210 ± 0.031 | 0.649 ± 0.115 | 0.551 ± 0.155 |
| SVPG | 0.219 ± 0.032 | 0.607 ± 0.130 | 0.553 ± 0.155 |
| CaDRE (ours) | 0.565 ± 0.054 | 0.829 ± 0.062 | 1.884 ± 0.309 |

High-speed lane-change

| Method | Coverage (↑) | Mean Obj (↑) | QD Score (↑) |
|---|---|---|---|
| Random | 0.310 ± 0.158 | 0.310 ± 0.158 | 0.209 ± 0.131 |
| CMA-ES | 0.163 ± 0.018 | 0.540 ± 0.143 | 0.347 ± 0.086 |
| REINFORCE | 0.488 ± 0.144 | 0.488 ± 0.144 | 0.472 ± 0.161 |
| SVPG | 0.438 ± 0.096 | 0.438 ± 0.096 | 0.437 ± 0.166 |
| CaDRE (ours) | 0.541 ± 0.079 | 0.627 ± 0.142 | 1.375 ± 0.436 |

U-turn

| Method | Coverage (↑) | Mean Obj (↑) | QD Score (↑) |
|---|---|---|---|
| Random | 0.188 ± 0.066 | 0.381 ± 0.134 | 0.320 ± 0.151 |
| CMA-ES | 0.228 ± 0.118 | 0.447 ± 0.228 | 0.502 ± 0.253 |
| REINFORCE | 0.286 ± 0.010 | 0.641 ± 0.117 | 0.731 ± 0.117 |
| SVPG | 0.290 ± 0.020 | 0.577 ± 0.120 | 0.665 ± 0.110 |
| CaDRE (ours) | 0.542 ± 0.094 | 0.793 ± 0.073 | 1.699 ± 0.219 |

Figure 3: Coverage and QD score vs. the number of samples. The solid lines represent the mean, and the shaded areas represent the standard deviation over 5 perturbed vehicles.

IV EXPERIMENTS

IV-A Experimental Setup

Real-world Trajectories. We pick three representative scenarios from nuPlan v1.1 [50]: unprotected cross-turn, high-speed lane-change, and U-turn. All scenarios have a time horizon of 150 frames at 10 Hz and are down-sampled to 5 Hz in our experiments.

Reactive Ego Policy. We implement a rule-based ego policy: the ego vehicle follows the reference trajectory. However, if there is a vehicle within $5\,m$ and within $[-\pi/4, \pi/4]$ of the body frame of the ego, the ego vehicle brakes at $-7\,m/s^2$ and applies the maximum steering angle $\pm\pi/8$, depending on the relative position of the vehicle w.r.t. the body frame of the ego. Recall that the problem is formulated as a black-box optimization, and CaDRE is agnostic to the ego agent's policy.
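The rule above can be sketched as follows. This is an illustrative sketch, not the policy code used in the experiments. The function and constant names are hypothetical, and the steering direction is an assumption: the ego is assumed to steer away from the side on which the other vehicle is located. The policy also reacts to the first triggering vehicle it encounters in the input list.

```python
import math

BRAKE_ACCEL = -7.0                  # m/s^2, from the text
MAX_STEER = math.pi / 8             # rad, from the text
TRIGGER_DIST = 5.0                  # m, from the text
TRIGGER_HALF_ANGLE = math.pi / 4    # rad, from the text


def reactive_ego_action(ego_pose, others_xy, reference_action):
    """Return (acceleration, steering) for the ego vehicle.

    ego_pose: (x, y, yaw) in the world frame.
    others_xy: iterable of (x, y) world positions of the other vehicles.
    reference_action: (acceleration, steering) that tracks the reference trajectory.
    """
    x, y, yaw = ego_pose
    for ox, oy in others_xy:
        dx, dy = ox - x, oy - y
        # Rotate the world-frame offset into the ego body frame (x front, y left).
        bx = math.cos(yaw) * dx + math.sin(yaw) * dy
        by = -math.sin(yaw) * dx + math.cos(yaw) * dy
        in_range = math.hypot(bx, by) <= TRIGGER_DIST
        in_cone = abs(math.atan2(by, bx)) <= TRIGGER_HALF_ANGLE
        if in_range and in_cone:
            # Assumption: steer away from the vehicle (vehicle on the left -> steer right).
            steer = -MAX_STEER if by > 0 else MAX_STEER
            return BRAKE_ACCEL, steer
    return reference_action
```

Because the scenario search treats the ego as a black box, any other policy with the same input and output could be substituted here.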

Selection of Perturbed Vehicles and Perturbation Range. To ensure effective perturbation, we use a simple heuristic to select which vehicles to perturb: the top five background vehicles that have the smallest average distance to the ego vehicle. The acceleration perturbation is within $\pm 2$, and the steering perturbation is within $\pm\pi/8$.
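The selection heuristic can be sketched as follows. The function name and array layout are hypothetical. Since vehicles may enter or leave the scene, the sketch assumes that absent frames are marked with NaN and that the average distance is taken only over frames in which the vehicle is present.

```python
import numpy as np


def select_perturbed_vehicles(ego_xy, background_xy, k=5):
    """Return indices of the k background vehicles closest to the ego on average.

    ego_xy: (T, 2) ego positions.
    background_xy: (N, T, 2) background positions, NaN where a vehicle is absent.
    """
    ego_xy = np.asarray(ego_xy, dtype=float)
    background_xy = np.asarray(background_xy, dtype=float)
    dist = np.linalg.norm(background_xy - ego_xy[None], axis=-1)  # (N, T)
    present = ~np.isnan(dist)
    counts = present.sum(axis=1)
    total = np.where(present, dist, 0.0).sum(axis=1)
    # Vehicles that never appear get an infinite average distance.
    avg = np.where(counts > 0, total / np.maximum(counts, 1), np.inf)
    order = np.argsort(avg, kind="stable")
    return order[:k], avg
```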

Evaluation Metrics. We focus on three criteria that measure the quality and diversity of the archive, which are standard metrics in the QD literature [49, 39, 51, 52].

  • Coverage $\in [0, 1]$: Proportion of cells in the archive that have an elite.

  • Mean objective $\in [0, 1]$: Mean objective value of elites in the archive.

  • QD score $\in [0, 4000]$: Sum of the objective values of all elites in the archive. The theoretical maximum value of $4000$ is due to our objective $f \in [0, 1]$, and we discretize the measure space into a $10 \times 20 \times 20$ grid.
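The three metrics above can be computed from a grid of per-cell objective values, as in the following minimal sketch. The function name is hypothetical, and empty cells are assumed to be marked with NaN.

```python
import numpy as np


def archive_metrics(objective_grid):
    """Return (coverage, mean objective, QD score) for a grid archive.

    objective_grid: array of elite objective values, NaN for empty cells.
    """
    objective_grid = np.asarray(objective_grid, dtype=float)
    occupied = ~np.isnan(objective_grid)
    n_elites = int(occupied.sum())
    coverage = n_elites / objective_grid.size
    qd_score = float(objective_grid[occupied].sum())
    mean_objective = qd_score / n_elites if n_elites else 0.0
    return coverage, mean_objective, qd_score
```

For a single archive, the QD score equals coverage times mean objective times the number of cells, so a $10 \times 20 \times 20$ grid that is fully occupied with $f = 1$ in every cell reaches the maximum of $4000$.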

Baselines. We study a mixture of sampling- and RL-based methods that have been employed by the existing literature.

  • Random Search (Random): samples uniformly at random from the solution space.

  • CMA-ES [53]: CMA-ES iteratively updates a population of solutions based on their fitness, using statistical information from previous generations to adaptively adjust the search distribution towards optimal regions of the solution space. CMA-ME is based on CMA-ES, but CMA-ES has no QD component.

  • Multi-particle REINFORCE [54]: policy gradient method employed by previous work [19, 27]. We set the number of particles to be the same as the batch size (36) of CMA-ME employed by our algorithm.

  • Stein Variational Policy Gradient (SVPG) [55]: SVPG is an improved version of multi-particle REINFORCE. SVPG introduces a maximum entropy policy optimization framework that explicitly encourages diverse solutions and better exploration. Similar to multi-particle REINFORCE, we set the number of particles to be the same as the batch size of CMA-ME.

We use the QD algorithm library pyribs [39] to implement our framework. We aim to answer the following questions in our experimental study:

  • How does CaDRE compare with baseline methods in terms of the evaluation metrics and sample efficiency?

  • Is OAR effective in improving exploration?

  • Can we retrieve diverse scenarios generated by CaDRE in a controllable manner?

IV-B Sample-efficiency Compared with Baseline Methods

Figure 4: Histograms of measure values. We visualize the final archive of the perturbed background vehicle with the highest QD score in the unprotected cross-turn. The solid line is the kernel density estimate of the true distribution. Note that the Gaussian kernel may introduce some distortions since the true distribution is bounded.
Figure 5: Visualization of generated trajectories. The leftmost column shows the original unperturbed scenarios. The numbers below are the measure values $[m_1, m_2, m_3]$, representing the mean steering perturbation, impact time, and impact angle, respectively.
Figure 6: Visualization of final archives. A darker color means a higher objective value. Transparent cells mean we cannot find scenarios. We visualize the perturbed vehicles that have the highest QD score for each scenario respectively.

The coverage and QD score versus the number of samples are shown in Figure 3. CaDRE outperforms all baselines by significant margins in all three scenarios, which demonstrates that, with the same number of samples, CaDRE discovers not only high-quality but also diverse scenarios much faster than Random Search, SVPG, and REINFORCE. CaDRE utilizes the Covariance Matrix Adaptation (CMA) strategy, which adapts the search distribution over generations to increase the likelihood of sampling promising areas of the solution space. This adaptation is based on information from previous generations, allowing CaDRE to focus its sampling on regions with higher potential for high-quality solutions. Unlike Random Search, which samples uniformly across the solution space without learning from previous samples, CaDRE dynamically narrows its search to more promising regions. CMA-ES, despite being the basis of CMA-ME, serves a different purpose: it maximizes the likelihood of increasing the objective and therefore quickly converges to a single optimum. SVPG and REINFORCE, while more directed than Random Search, may still struggle to explore complex problem spaces efficiently due to their focus on gradient-based optimization.

Table I shows the final performance in terms of coverage, mean objective, and QD score. With the same number of samples in the unprotected cross-turn, CaDRE achieves $158.0\%$ more coverage and a $36.6\%$ higher mean objective, leading to a $240.7\%$ improvement in QD score over the best-performing baseline, SVPG. This again highlights the superior exploration and exploitation capability of CaDRE compared to the baselines. Table II shows the ablation of OAR in the high-speed lane-change scenario. OAR improves the QD score of individual vehicles by up to $33.0\%$, which demonstrates the effectiveness of OAR.
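The relative improvements quoted for the unprotected cross-turn follow directly from the mean values in Table I, as the short check below illustrates. The helper name `relative_improvement` is introduced only for this example.

```python
def relative_improvement(ours, baseline):
    """Percentage improvement of `ours` over `baseline`."""
    return 100.0 * (ours - baseline) / baseline


# Mean values for the unprotected cross-turn, taken from Table I.
cadre = {"coverage": 0.565, "mean_obj": 0.829, "qd_score": 1.884}
svpg = {"coverage": 0.219, "mean_obj": 0.607, "qd_score": 0.553}

gains = {k: round(relative_improvement(cadre[k], svpg[k]), 1) for k in cadre}
print(gains)
```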

TABLE II: QD score of occupancy-aware restart with different temperatures. The QD score is shown in multiples of $1e3$. We include the percentage improvement w.r.t. $1/T = 0$ in parentheses.

| Index | $1/T = 0$ | $1/T = 5$ | $1/T = 10$ |
|---|---|---|---|
| 1 | 1.808 | 1.855 (2.6%) | 2.026 (12.0%) |
| 2 | 0.519 | 0.639 (22.9%) | 0.691 (33.0%) |
| 3 | 1.277 | 1.133 (-11.3%) | 1.229 (-3.7%) |
| 4 | 1.393 | 1.515 (8.8%) | 1.367 (-1.9%) |
| 5 | 1.444 | 1.299 (-10.0%) | 1.563 (8.2%) |

IV-C Analysis of the Generated Safety-Critical Scenarios

Visualization of Archives. We visualize the final archives in Fig. 6. CaDRE leads to a much higher occupancy and mean objective in the final archives than the baseline SVPG. The main reason is that CaDRE explicitly encourages sustained exploration throughout optimization. CaDRE employs CMA-ME, which is particularly adept at exploring complex landscapes and finding a large number of high-quality scenarios across different measures. Although the repulsive force in SVPG does introduce diversity among the particles to avoid premature convergence to local optima, its primary focus remains on optimizing a solution rather than explicitly seeking out diverse solutions across a range of measures.

Note that some cells are still unoccupied even for CaDRE. We hypothesize that this is due to the infeasibility of finding scenarios, which is induced by specific combinations of measure values, the vehicle states in the original scenarios, and the kinematic constraints. For example, it is extremely difficult to find a solution with a short impact time and a small impact angle (hitting from the front) in the unprotected cross-turn scenario, since there is no background vehicle starting near the front of the ego vehicle.

Distribution of Measure Values. Figure 4 visualizes the distribution of the measure function values. Compared to SVPG, CaDRE generates a denser and wider range of measure function values. However, both methods struggle to find safety-critical scenarios with very little steering perturbation, which is reasonable as the original scenarios only contain safe and regular traffic, and the perturbation is bounded.

Visualization of Generated Scenarios. In Fig. 5, we visualize five generated scenarios for the unprotected cross-turn, high-speed lane-change, and U-turn, respectively. The visualization shows that we are able to retrieve diverse critical scenarios in a controllable manner. For example, in the unprotected cross-turn, we can control the perturbed vehicle hitting the right side of the ego vehicle by steering a little bit from the original trajectory or hitting the left side by overtaking from the left, simply by asking for different combinations of measure function values $[m_1, m_2, m_3]$ in the archive.
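Retrieval by measure values can be sketched as a lookup of the occupied archive cell closest to the requested $[m_1, m_2, m_3]$. The sketch below is illustrative only: the function name is hypothetical, the measure bounds `lows` and `highs` are placeholders to be supplied by the user, and empty cells are assumed to be marked with NaN.

```python
import numpy as np


def retrieve_scenario(objective_grid, lows, highs, target_measures):
    """Return the index of the occupied cell nearest to the requested measures.

    objective_grid: array of elite objective values, NaN for empty cells.
    lows, highs: per-measure bounds of the archive (placeholders in this sketch).
    target_measures: requested measure values, e.g. [m1, m2, m3].
    Returns None if the archive is empty.
    """
    objective_grid = np.asarray(objective_grid, dtype=float)
    dims = np.array(objective_grid.shape)
    lows = np.asarray(lows, dtype=float)
    highs = np.asarray(highs, dtype=float)
    occupied = np.argwhere(~np.isnan(objective_grid))
    if len(occupied) == 0:
        return None
    # Compare in normalized measure space so every measure has the same weight.
    centers = (occupied + 0.5) / dims
    target = (np.asarray(target_measures, dtype=float) - lows) / (highs - lows)
    nearest = int(np.argmin(np.linalg.norm(centers - target, axis=1)))
    return tuple(int(i) for i in occupied[nearest])
```

The returned cell index identifies the stored elite, whose perturbation sequence can then be replayed in simulation. When the requested cell is empty, the sketch falls back to the nearest occupied cell instead of failing.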

IV-D Limitations and Future Directions

Although CaDRE has demonstrated the ability to generate diverse and controllable scenarios with superior sample efficiency, it is not without its limitations. First, CaDRE does not consider lane information or road conditions such as barriers. It could generate scenarios that are kinematically feasible but unlikely in real life, such as going through the median of the highway. Second, CaDRE only perturbs one of the background vehicles. However, in the real world, there exist some critical scenarios induced by more than one vehicle or not directly caused by a collision with the perturbed vehicle. For example, a background vehicle makes a lane change to avoid hitting another vehicle braking in front, thus hitting the ego vehicle from the side. It is a promising direction to extend CaDRE to consider road information and perturb more than one vehicle or one vehicle that indirectly causes the collision.

V CONCLUSIONS

In this work, we develop CaDRE, a framework for generating safety-critical scenarios. As a variant of the QD algorithm, CaDRE enhances the diversity and controllability of the scenario generation process, thus providing an effective instrument for the simulation-based assessment of autonomous vehicles. We conduct extensive experiments on three representative scenarios: unprotected cross-turn, high-speed lane-change, and U-turn. The experimental results show that CaDRE can generate and retrieve diverse and high-quality scenarios with better sample efficiency compared with both RL- and sampling-based methods.

ACKNOWLEDGMENT

This work was partially performed during PH’s internship at the Bosch Center for Artificial Intelligence; we thank Bosch for the use of computing resources. The authors additionally thank Uksang Yoo and Benjamin Stoler for valuable conversations. The authors gratefully acknowledge the support from the National Science Foundation under grant CNS-2047454.

References

  • [1] C. Gulino, J. Fu, W. Luo, G. Tucker, E. Bronstein, Y. Lu, J. Harb, X. Pan, Y. Wang, X. Chen et al., “Waymax: An accelerated, data-driven simulator for large-scale autonomous driving research,” Advances in Neural Information Processing Systems, vol. 36, 2024.
  • [2] J. Herman, J. Francis, S. Ganju, B. Chen, A. Koul, A. Gupta, A. Skabelkin, I. Zhukov, M. Kumskoy, and E. Nyberg, “Learn-to-race: A multimodal control environment for autonomous racing,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 9793–9802.
  • [3] S. H. Park, G. Lee, J. Seo, M. Bhat, M. Kang, J. Francis, A. Jadhav, P. P. Liang, and L.-P. Morency, “Diverse and admissible trajectory forecasting through multimodal context understanding,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XI 16.   Springer, 2020, pp. 282–298.
  • [4] M. Xu, Z. Liu, P. Huang, W. Ding, Z. Cen, B. Li, and D. Zhao, “Trustworthy reinforcement learning against intrinsic vulnerabilities: Robustness, safety, and generalizability,” arXiv preprint arXiv:2209.08025, 2022.
  • [5] P. Huang, X. Zhang, Z. Cao, S. Liu, M. Xu, W. Ding, J. Francis, B. Chen, and D. Zhao, “What went wrong? closing the sim-to-real gap via differentiable causal discovery,” in Conference on Robot Learning.   PMLR, 2023, pp. 734–760.
  • [6] W. Ding, C. Xu, M. Arief, H. Lin, B. Li, and D. Zhao, “A survey on safety-critical driving scenario generation—a methodological perspective,” IEEE Transactions on Intelligent Transportation Systems, 2023.
  • [7] B. Stoler, I. Navarro, M. Jana, S. Hwang, J. Francis, and J. Oh, “Safeshift: Safety-informed distribution shifts for robust trajectory prediction in autonomous driving,” arXiv preprint arXiv:2309.08889, 2023.
  • [8] E. Bronstein, S. Srinivasan, S. Paul, A. Sinha, M. O’Kelly, P. Nikdel, and S. Whiteson, “Embedding synthetic off-policy experience for autonomous driving via zero-shot curricula,” in Conference on Robot Learning.   PMLR, 2023, pp. 188–198.
  • [9] P. Huang, M. Xu, F. Fang, and D. Zhao, “Robust reinforcement learning as a stackelberg game via adaptively-regularized adversarial training,” arXiv preprint arXiv:2202.09514, 2022.
  • [10] M. Xu, P. Huang, Y. Niu, V. Kumar, J. Qiu, C. Fang, K.-H. Lee, X. Qi, H. Lam, B. Li et al., “Group distributionally robust reinforcement learning with hierarchical latent variables,” in International Conference on Artificial Intelligence and Statistics.   PMLR, 2023, pp. 2677–2703.
  • [11] L. Feng, Q. Li, Z. Peng, S. Tan, and B. Zhou, “Trafficgen: Learning to generate diverse and realistic traffic scenarios,” in 2023 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2023, pp. 3567–3575.
  • [12] S. Tan, K. Wong, S. Wang, S. Manivasagam, M. Ren, and R. Urtasun, “Scenegen: Learning to generate realistic traffic scenes,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 892–901.
  • [13] P. Huang, M. Xu, J. Zhu, L. Shi, F. Fang, and D. Zhao, “Curriculum reinforcement learning using optimal transport via gradual domain adaptation,” Advances in Neural Information Processing Systems, vol. 35, pp. 10 656–10 670, 2022.
  • [14] J. Wang, A. Pun, J. Tu, S. Manivasagam, A. Sadat, S. Casas, M. Ren, and R. Urtasun, “Advsim: Generating safety-critical scenarios for self-driving vehicles,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 9909–9918.
  • [15] N. Hanselmann, K. Renz, K. Chitta, A. Bhattacharyya, and A. Geiger, “King: Generating safety-critical driving scenarios for robust imitation via kinematics gradients,” in European Conference on Computer Vision.   Springer, 2022, pp. 335–352.
  • [16] Z. Zhong, D. Rempe, D. Xu, Y. Chen, S. Veer, T. Che, B. Ray, and M. Pavone, “Guided conditional diffusion for controllable traffic simulation,” in 2023 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2023, pp. 3560–3566.
  • [17] S. Tan, B. Ivanovic, X. Weng, M. Pavone, and P. Kraehenbuehl, “Language conditioned traffic generation,” arXiv preprint arXiv:2307.07947, 2023.
  • [18] M. Xu, P. Huang, F. Li, J. Zhu, X. Qi, K. Oguchi, Z. Huang, H. Lam, and D. Zhao, “Scalable safety-critical policy evaluation with accelerated rare event sampling,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2022, pp. 12 919–12 926.
  • [19] W. Ding, B. Chen, M. Xu, and D. Zhao, “Learning to collide: An adaptive safety-critical scenarios generating method,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2020, pp. 2243–2250.
  • [20] W. Ding, H. Lin, B. Li, K. J. Eun, and D. Zhao, “Semantically adversarial driving scenario generation with explicit knowledge integration,” arXiv preprint arXiv:2106.04066, vol. 1, 2021.
  • [21] M. Xu, P. Huang, W. Yu, S. Liu, X. Zhang, Y. Niu, T. Zhang, F. Xia, J. Tan, and D. Zhao, “Creative robot tool use with large language models,” arXiv preprint arXiv:2310.13065, 2023.
  • [22] S. Suo, S. Regalado, S. Casas, and R. Urtasun, “Trafficsim: Learning to simulate realistic multi-agent behaviors,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 10 400–10 409.
  • [23] Q. Li, Z. Peng, L. Feng, C. Duan, W. Mo, B. Zhou et al., “Scenarionet: Open-source platform for large-scale traffic scenario simulation and modeling,” arXiv preprint arXiv:2306.12241, 2023.
  • [24] W. Ding, M. Xu, and D. Zhao, “Cmts: Conditional multiple trajectory synthesizer for generating safety-critical driving scenarios,” in International Conference on Robotics and Automation (ICRA).   IEEE, 2020.
  • [25] W. Ding, W. Wang, and D. Zhao, “Multi-vehicle trajectories generation for vehicle-to-vehicle encounters,” in 2019 IEEE International Conference on Robotics and Automation (ICRA), 2019.
  • [26] C. Xu, W. Ding, W. Lyu, Z. Liu, S. Wang, Y. He, H. Hu, D. Zhao, and B. Li, “Safebench: A benchmarking platform for safety evaluation of autonomous vehicles,” Advances in Neural Information Processing Systems, vol. 35, pp. 25 667–25 682, 2022.
  • [27] W. Ding, B. Chen, B. Li, K. J. Eun, and D. Zhao, “Multimodal safety-critical scenarios generation for decision-making algorithms evaluation,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 1551–1558, 2021.
  • [28] W. Ding, H. Lin, B. Li, and D. Zhao, “Causalaf: Causal autoregressive flow for goal-directed safety-critical scenes generation,” arXiv preprint arXiv:2110.13939, 2021.
  • [29] M. Klischat and M. Althoff, “Generating critical test scenarios for automated vehicles with evolutionary algorithms,” in 2019 IEEE Intelligent Vehicles Symposium (IV).   IEEE, 2019, pp. 2352–2358.
  • [30] M. Arief, Z. Huang, G. K. S. Kumar, Y. Bai, S. He, W. Ding, H. Lam, and D. Zhao, “Deep probabilistic accelerated evaluation: A robust certifiable rare-event simulation methodology for black-box safety-critical systems,” in International Conference on Artificial Intelligence and Statistics.   PMLR, 2021, pp. 595–603.
  • [31] Y. Cao, C. Xiao, A. Anandkumar, D. Xu, and M. Pavone, “Advdo: Realistic adversarial attacks for trajectory prediction,” in European Conference on Computer Vision.   Springer, 2022, pp. 36–52.
  • [32] C. Zhang, J. Tu, L. Zhang, K. Wong, S. Suo, and R. Urtasun, “Learning realistic traffic agents in closed-loop,” in 7th Annual Conference on Robot Learning, 2023.
  • [33] W. Ding, Y. Cao, D. Zhao, C. Xiao, and M. Pavone, “Realgen: Retrieval augmented generation for controllable traffic scenarios,” arXiv preprint arXiv:2312.13303, 2023.
  • [34] W. Ding, H. Lin, B. Li, and D. Zhao, “Generalizing goal-conditioned reinforcement learning with variational causal reasoning,” Advances in Neural Information Processing Systems, vol. 35, pp. 26 532–26 548, 2022.
  • [35] A. Li, S. Chen, L. Sun, N. Zheng, M. Tomizuka, and W. Zhan, “Scegene: Bio-inspired traffic scenario generation for autonomous driving testing,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 9, pp. 14 859–14 874, 2021.
  • [36] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat et al., “Gpt-4 technical report,” arXiv preprint arXiv:2303.08774, 2023.
  • [37] Z. Zhong, D. Rempe, Y. Chen, B. Ivanovic, Y. Cao, D. Xu, M. Pavone, and B. Ray, “Language-guided traffic simulation via scene-level diffusion,” arXiv preprint arXiv:2306.06344, 2023.
  • [38] J.-B. Mouret and J. Clune, “Illuminating search spaces by mapping elites,” arXiv preprint arXiv:1504.04909, 2015.
  • [39] B. Tjanaka, M. C. Fontaine, D. H. Lee, Y. Zhang, N. R. Balam, N. Dennler, S. S. Garlanka, N. D. Klapsis, and S. Nikolaidis, “Pyribs: A bare-bones python library for quality diversity optimization,” in Proceedings of the Genetic and Evolutionary Computation Conference, ser. GECCO ’23.   New York, NY, USA: Association for Computing Machinery, 2023, pp. 220–229. [Online]. Available: https://doi.org/10.1145/3583131.3590374
  • [40] A. Cully, J. Clune, D. Tarapore, and J.-B. Mouret, “Robots that can adapt like animals,” Nature, vol. 521, no. 7553, pp. 503–507, 2015.
  • [41] V. Bhatt, H. Nemlekar, M. Fontaine, B. Tjanaka, H. Zhang, Y.-C. Hsu, and S. Nikolaidis, “Surrogate assisted generation of human-robot interaction scenarios,” arXiv preprint arXiv:2304.13787, 2023.
  • [42] Y.-S. Tung, M. B. Luebbers, A. Roncone, and B. Hayes, “Workspace optimization techniques to improve prediction of human motion during human-robot collaboration,” arXiv preprint arXiv:2401.12965, 2024.
  • [43] A. Morel, Y. Kunimoto, A. Coninx, and S. Doncieux, “Automatic acquisition of a repertoire of diverse grasping trajectories through behavior shaping and novelty search,” in 2022 International Conference on Robotics and Automation (ICRA).   IEEE, 2022, pp. 755–761.
  • [44] J. Huber, F. Hélénon, M. Kappel, E. Chelly, M. Khoramshahi, F. B. Amar, and S. Doncieux, “Speeding up 6-dof grasp sampling with quality-diversity,” arXiv preprint arXiv:2403.06173, 2024.
  • [45] S. Surana, B. Lim, and A. Cully, “Efficient learning of locomotion skills through the discovery of diverse environmental trajectory generator priors,” in 2023 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2023, pp. 12 134–12 141.
  • [46] J. Nordmoen, F. Veenstra, K. O. Ellefsen, and K. Glette, “Map-elites enables powerful stepping stones and diversity for modular robotics,” Frontiers in Robotics and AI, vol. 8, p. 639173, 2021.
  • [47] E. Zardini, D. Zappetti, D. Zambrano, G. Iacca, and D. Floreano, “Seeking quality diversity in evolutionary co-design of morphology and control of soft tensegrity modular robots,” in Genetic and Evolutionary Computation Conference, 2021.
  • [48] P. Liu, Z. Guo, H. Yu, H. Linghu, Y. Li, Y. Hou, H. Ge, and Q. Zhang, “A preliminary study of multi-task MAP-elites with knowledge transfer for robotic arm design,” in 2022 IEEE Congress on Evolutionary Computation (CEC).   IEEE, jul 2022. [Online]. Available: https://doi.org/10.1109%2Fcec55065.2022.9870374
  • [49] M. C. Fontaine, J. Togelius, S. Nikolaidis, and A. K. Hoover, “Covariance matrix adaptation for the rapid illumination of behavior space,” in Proceedings of the 2020 Genetic and Evolutionary Computation Conference, 2020, pp. 94–102.
  • [50] H. Caesar, J. Kabzan, K. S. Tan, W. K. Fong, E. Wolff, A. Lang, L. Fletcher, O. Beijbom, and S. Omari, “nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles,” arXiv preprint arXiv:2106.11810, 2021.
  • [51] M. Fontaine and S. Nikolaidis, “Differentiable quality diversity,” Advances in Neural Information Processing Systems, vol. 34, pp. 10 040–10 052, 2021.
  • [52] ——, “A quality diversity approach to automatically generating human-robot interaction scenarios in shared autonomy,” arXiv preprint arXiv:2012.04283, 2020.
  • [53] N. Hansen, “The cma evolution strategy: A tutorial,” arXiv preprint arXiv:1604.00772, 2016.
  • [54] R. J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Machine learning, vol. 8, pp. 229–256, 1992.
  • [55] Y. Liu, P. Ramachandran, Q. Liu, and J. Peng, “Stein variational policy gradient,” arXiv preprint arXiv:1704.02399, 2017.