
Testing, Validation, and Verification of Robotic and Autonomous Systems: A Systematic Review

Published: 30 March 2023

Abstract

We perform a systematic literature review on testing, validation, and verification of robotic and autonomous systems (RAS). The scope of this review covers peer-reviewed research papers proposing, improving, or evaluating testing techniques, processes, or tools that address the system-level qualities of RAS.
Our survey is performed based on a rigorous methodology structured in three phases. First, we made use of a set of 26 seed papers (selected by domain experts) and the SERP-TEST taxonomy to design our search query and (domain-specific) taxonomy. Second, we conducted a search in three academic search engines and applied our inclusion and exclusion criteria to the results. Third, we made use of related work and domain specialists (50 academics and 15 industry experts) to validate and refine the search query. As a result, we encountered 10,735 studies, out of which 195 were included, reviewed, and coded.
Our objective is to answer four research questions, pertaining to (1) the type of models, (2) measures for system performance and testing adequacy, (3) tools and their availability, and (4) evidence of applicability, particularly in industrial contexts. We analyse the results of our coding to identify strengths and gaps in the domain and present recommendations to researchers and practitioners.
Our findings show that variants of temporal logics are most widely used for modelling requirements and properties, while variants of state-machines and transition systems are used widely for modelling system behaviour. Other common models concern epistemic logics for specifying requirements and belief-desire-intention models for specifying system behaviour. Apart from time and epistemics, other aspects captured in models concern probabilities (e.g., for modelling uncertainty) and continuous trajectories (e.g., for modelling vehicle dynamics and kinematics).
Many papers lack any rigorous measure of efficiency, effectiveness, or adequacy for their proposed techniques, processes, or tools. Among those that provide such a measure, the majority use domain-agnostic generic measures such as number of failures, size of state-space, or verification time. There is a trend towards addressing this research gap by developing domain-specific notions of performance and adequacy. Defining widely accepted rigorous measures of performance and adequacy for each domain remains an identified research gap.
In terms of tools, the most widely used tools are well-established model-checkers such as Prism and Uppaal, as well as simulation tools such as Gazebo; Matlab/Simulink is another widely used toolset in this domain.
Overall, there is very limited evidence of industrial applicability in the papers published in this domain. There is also a gap concerning consolidated benchmarks for various types of autonomous systems.

1 Introduction

1.1 Motivation

Robotic and Autonomous Systems (RAS) involve a rich integration of several disciplines such as control engineering and robotics, mechanical engineering, electronics, and software engineering. Validation and verification of RAS entails a non-trivial extension of traditional testing techniques to deal with their multi-disciplinary nature. In particular, for researchers and practitioners from the software testing community, extending the existing software testing techniques to RAS is a challenge that has led to a sizeable literature on proposing and evaluating different techniques and processes. This rich literature calls for a secondary study that brings structure to this landscape and identifies the relative strengths and weaknesses of available results. The present article addresses this gap by performing a structured literature survey of RAS testing.
There are a number of earlier surveys on related topics; we provide an in-depth comparison of related work with our survey in Section 2. Briefly, some of these surveys have a different or more confined scope, e.g., considering machine learning components [79], formal specification and verification techniques [138], or driving datasets [108], or do not aim to provide a structured overview of the field to answer concrete questions for a given audience [20]. To our knowledge, this is the first systematic secondary study that covers the breadth of results in testing RAS (see the Related Work section for other studies with different foci) and, moreover, provides an analysis of such results with the aim of characterising the types of techniques and processes and analysing their evidence of applicability (in terms of tools and types of case study).

1.2 Scope and Audience

Our scope covers novel results (including techniques, processes, tools, and applications thereof) that deal with testing robotic and autonomous systems. We call such novel results “interventions,” following the tradition in medical secondary studies, as well as recent systematic reviews in testing [3]. In our terminology, an intervention is “an act performed (e.g., use of a technique or a process change) to adapt testing to a specific context, to solve a test issue, to diagnose testing, or to improve testing” [68]. The scope of our survey includes several validation and verification techniques, including physical testing, model-based testing, runtime monitoring, formal verification, and model checking.
Our audience are both researchers and practitioners in software and systems engineering. Hence, we perform our analysis from two perspectives:
(1)
researchers: to identify strengths and gaps in the research landscape of testing RAS, particularly with respect to traditional software testing taxonomies (are there new challenges not covered by these taxonomies?); and
(2)
practitioners: to identify interventions that have evidence of applicability given the environment and available resources.
We provide a precise definition of RAS in the remainder of this article to derive rigorous inclusion and exclusion criteria. But in a nutshell, for our interventions to be useful for the intended audience, we confine our scope to those interventions that
(1)
address testing the computer systems integrated in RAS (as opposed to only physical, mechanical, or control parts) in their methodology; this is justified by the fact that our intended audience are researchers and practitioners in software and systems engineering,
(2)
have some evidence of applicability, efficiency, or effectiveness on RAS; this is motivated by our scope (testing, validation, and verification of RAS) as well as our goal to provide evidence of strength (or weakness) for researchers and practitioners, and
(3)
take the system-level validation and verification into account and do not focus on a specific unit or component of such systems (e.g., a specific type of learning or planning algorithm, or testing of physical or mechanical parts of such systems); this is motivated by the inherent multi-disciplinary nature of RAS and the need to accommodate it in any system-level testing of RAS.
Next, we define a number of research questions that help us structure and analyse the existing interventions for the two groups of audience.

1.3 Research Questions

As specified above, we would like to review and analyse those interventions that are applicable to testing, validation, and verification of RAS; in particular, we place an emphasis on those interventions that take into account the computer systems in RAS and their interactions with their physical environment and human users. In the remainder of this section and throughout the rest of the article, we use the term testing to refer to various testing, validation, and verification techniques.
A structured method of testing, validation, or verification is often steered by models, describing the structure or the behaviour of the system under test. The type of models often determines the type of analysis that can be applied and hence, has a far-reaching effect on the applicability and effectiveness of the technique. However, not all included interventions are model-based (or even related to test cases), as we also consider other forms of verification such as runtime monitoring. Moreover, the metrics of effectiveness, efficiency, and coverage used to evaluate the system under test and the intervention itself are both a major factor in determining the intervention’s applicability and, hence, form a major part of our research questions. Finally, the case studies performed to evaluate the technique are a major source of evidence for applicability. Based on these observations, our research questions are specified below:
(1)
What are the types of models used for testing RAS?
We interpret the word “model” liberally as any information source or domain abstraction that is used to structure or steer the testing process or evaluate the outcome of testing. This helps us understand and decide about the types of abstractions that are commonly used or needed for testing RAS. It helps both researchers and practitioners identify the types/aspects of RAS that can be addressed using the current testing interventions and also the types of information that need to be made available for these interventions to be applicable. It also points to aspects of RAS that are not covered by the current interventions. In line with the goals specified above, we analyse two types of models: those that address the system under test or its environment, versus models that describe its quality attributes.
(2)
Which efficiency, effectiveness, and coverage measures were introduced or used to evaluate RAS testing interventions?
Efficiency refers to the amount of time and resources needed for an intervention to achieve its goal. Effectiveness refers to the type and the number of faults uncovered by a testing intervention, and coverage refers to any measure that is used to decide the adequacy and the stopping criteria for a testing intervention. Answering this question also provides researchers and practitioners with the available evidence for the strength and applicability of the existing techniques, processes, and tools.
(3)
What are the interventions supported by (publicly available) tools in this domain?
Tool support is a key enabler of the application of testing interventions in practice and of their integration with other interventions in research contexts. We analyse the literature regarding this research question by providing information about the tooling available for and needed by each intervention. We call the first class of tools, i.e., those developed to support a particular intervention, effect tools; the second class comprises the tools used and needed for the effect tools to function. This second category, called context tools, provides further information about what is needed for a particular intervention to be automated in its context. We also report the licence information, when available, to facilitate decision-making.
(4)
Which interventions have evidence of applicability to large-scale and industrial systems?
We gather evidence from the reviewed interventions in terms of case studies and classify them into small-scale, benchmarks, and industrial case studies.

1.4 Structure of the Article

The remainder of this article is structured as follows: In Section 2, we review related work, with a focus on secondary studies (literature surveys and reviews) on related subject matters. In Section 3, we define the scope of the article and explain the background to this structured review; there, we report on the core set of results we started with as the seed for our search. In Section 4, we review the methodology we used for our systematic review; this includes the description of our search and selection strategy, the development of the taxonomy used for coding the results, and our data extraction and synthesis methods. In this section, we also reflect on the threats to the validity of our study. In Section 5, we present the results of our coding and analyse them to answer our research questions. In Section 6, we reflect on our analysis and provide concrete suggestions for our target audience, i.e., both researchers and practitioners. In Section 7, we conclude the article and present some directions for future research.

2 Related Work

A number of literature reviews, surveys, and mapping studies cover different aspects of robotic and autonomous systems. In what follows, we give an overview of those most closely related to our study (in chronological order).
Cortesi, Ferrara, and Chaki [55] discuss the features of a number of analysis techniques, namely, data-flow analysis, control-flow analysis, model-checking, and abstract interpretation. The survey covers features such as automation, precision, scalability, and soundness for these techniques. The stated goal of the study is to give robotics software developers hints to help them choose appropriate analysis approaches, depending on the kind of properties of interest and the software system. However, the interventions studied in this article are not necessarily applied in the robotics domain already. Furthermore, the work is not a systematic review and does not claim to provide any coverage of existing work on analysis techniques applied in its target application domain.
Helle, Schamai, and Strobel [99] as well as Redfield and Seto [181] provide an overview of challenges in and available techniques and results for testing and verification of autonomous systems. Both studies only sample a small subset of available results and techniques and use them to identify the areas requiring future research. Our findings, based on a much larger set, provide a much more refined view about the available interventions and the landscape for future research.
Koopman and Wagner [118] give an overview of challenges in the V model adapted to deal with problems in the context of autonomous vehicles. The paper identifies five major challenge areas in testing according to the V model for autonomous vehicles, namely, driver out of the loop, complex requirements, non-deterministic algorithms, inductive learning algorithms, and fail-operational systems. The paper covers solution approaches that seem promising across these different challenges, including phased deployment using successively relaxed operational scenarios, a monitor/actuator pair architecture to separate complex automated and autonomous functions from simpler safety functions, and fault injection. Similar to the previous two papers, the work of Koopman and Wagner has a more restrictive scope than the present article; moreover, the above-mentioned work is not a (structured) review of the literature.
Gao and Tan [79] provide an overview of the state-of-the-art in V&V for safety-critical systems that rely on machine learning techniques (based on deep learning) for autonomous driving. In this work, the researchers first extract a set of studies by conducting a search and identify a set of challenges by reviewing these studies. The validity of the identified challenges is then checked through an industrial questionnaire survey. Furthermore, a set of research recommendations is provided for future work on automated driving based on deep learning. The search query used in this study is more limited in scope than ours, because it focuses on testing for automated driving and deep learning, while we cover robotic and autonomous systems in a much broader sense. The articles covered in this study were published before 2017.
Knauss et al. [114] present an empirical study for investigating software-related challenges of testing automated vehicles. In the work, two different kinds of data collection, namely, focus groups (including 11 practitioners from Sweden) and interviews (including 15 practitioners and researchers from a number of countries), are used. The work provides insights about challenges such as virtual testing and simulation, standards and certifications, increased need to test nonfunctional aspects, and automation. This work is not a systematic mapping.
Rao and Frtunikj [180] identify three concrete issues regarding assessment of functional safety of neural networks used in automotive industry to initiate the discussion with industrial peers to find practical solutions. The issues include: dataset completeness, neural network implementation, and transfer learning.
Kang, Yin, and Berger [108] provide a survey of publicly available driving datasets as well as virtual testing for autonomous driving algorithms. A detailed overview of 37 datasets for open-loop testing and 22 virtual testing environments for closed-loop testing is provided. A remarkable aspect of this survey is the involvement of an industrial domain expert. The scope and results of the paper are significantly different from ours: They focus on autonomous driving algorithms, while we include the whole domain of RAS; they focus on datasets and tools, while we focus on interventions and their effects, as well as their tools.
Beglerovic, Metzner, and Horn [20] provide a brief overview of methodologies used for testing in automated driving. The work provides recommendations about promising methodologies and research areas aimed at reducing the testing effort. The authors mention challenges such as the complexity of automated driving functions, variation of scenarios and parameters, scenario selection, and test generation. Furthermore, the work briefly touches upon validation, supporting tools in the validation task, and standardisation. This paper is significantly different in methodology from ours: It is not a mapping study and does not provide any detail about the coverage of existing work.
Luckcuck et al. [138] provide a survey of formal specification and verification methods and tools used for autonomous robotic systems. The work covers a range of studies from 2007–2018. Their work identifies a number of challenges for formally modelling and verifying both the internals of such systems and the environments in which they operate. Their work differs from ours in that it only covers formal specification and verification tools for such systems; hence, techniques such as (non-exhaustive) testing and simulation are not covered. Also, our work has a different methodological approach in that we pose and answer research questions as the result of our secondary study, while they focus on the literature review itself. We did use the studies reviewed by Luckcuck et al. to validate and refine our search query in the third phase of our research.
Gleirscher, Foster, and Woodcock [89] provide an overview of the strengths, weaknesses, opportunities, and threats in the application of integrated Formal Methods to robotic and autonomous systems. Some of their findings, such as the gaps concerning evidence of effectiveness and tool support, reinforce our findings, and some, such as the challenges in training, are complementary to those of the present article. We believe some of the complementary findings arise from the general experience and findings about the application of formal methods, which is broader than the scope of a survey in the domain of robotic and autonomous systems.
Tahir and Alexander [207] perform a systematic literature review on coverage-based validation, verification, and safety assurance techniques for autonomous vehicles. The scope of their survey is much more confined than ours. They do code different coverage criteria in answer to one of their research questions, which overlaps with our goal of identifying coverage criteria. We have used their included papers to validate our search query as part of the third phase of our methodology.
Rajabali et al. [178] perform an extensive and systematic literature review on software validation and verification for autonomous vehicles. Their scope is more restricted than the scope of the present study, but some of their research questions (such as identifying gaps in the literature) are common to ours. However, their methodology does not involve a detailed taxonomy as in the present study and, hence, their conclusions are more abstract and at a higher level. We have also used this recent paper to validate the query and the final set of considered papers in the third phase of our research.

3 Background and Rationale

In this section, we provide an overview of the motivation behind this literature survey and define its domain. Subsequently, we introduce the basic taxonomy that we have extended and adapted for coding the literature. We also review the pilot study that was used to shape our taxonomy (and later validate our search query, presented in the next section).

3.1 Motivation

Based on our study of the existing literature reviews and surveys, we identified the gap for a secondary study that (1) presents a structured review of the existing results on validation and verification of robotic and autonomous systems and (2) targets specific research questions regarding (a) the types of models, (b) measures of efficiency and effectiveness, (c) available tools, and (d) evidence of applicability to large-scale and industrial systems.

3.2 Robotic and Autonomous System

There is a variety of definitions for our domain, RAS; these definitions encompass aspects such as autonomy (including high-level decision-making and planning) and adaptation (including artificial intelligence and machine learning) and interaction with human users and the physical environment (including perception, actuation, and mobility). In our view, the following definition provides a concise synthesis of these aspects:
An autonomous system is an intelligent system that is designed to deal with the physical environment on its own and work for extended periods of time without explicit human intervention. Such systems are built to analyse, learn from, adapt to, and act on the surrounding environment.
This definition is inspired by and merges some complementary aspects in the earlier definitions given by the Royal Academy of Engineering [164] and the National Science Foundation [76]. We emphasise two important aspects of this definition: one is the system-level perspective; hence, modules or units of software and hardware that are not autonomous systems themselves will not be included in our studies; the second important aspect is the interaction with the environment; hence, autonomous systems that work on offline data and do not feature an interaction with their environment are excluded as well.

3.3 Testing and the SERP-Test Taxonomy

In this work, we consider a testing intervention as any structured approach to validate or verify the quality of a robotic and autonomous system. Validation concerns checking the system specification, design, or implementation against user requirements. Verification concerns checking the system specification, design, or implementation against another piece of specification, design, or implementation. In other words, validation checks whether we have built the right system (for its users), while verification checks whether we have built it correctly (with respect to other specifications and artefacts) [173].
Our classification of testing research is based on the SERP-Test taxonomy [68]. This taxonomy provides a very general framework for classifying and communicating software testing research and has been used and adapted for this purpose across different domains [3, 183]. It serves as a useful tool for researchers and practitioners to select a testing process or technique based on the available resources or the expected evidence of applicability, effectiveness, and efficiency. In SERP-Test, testing research is classified in terms of four facets: intervention, effect, scope, and context. Intervention pertains to test techniques and their adaptation and adoption in different contexts. The effect facet is used to identify the improvement or adaptation in a given practice as well as any insights gained through assessment. The scope specifies whether the effect has been materialised in the planning, design, execution, or analysis of tests. Context, as its name suggests, specifies the environment where the intervention takes place, in terms of people and their knowledge, the system under test, and the required models and other types of information.
In the next section, we report on the methodology of this study; namely, in Section 4.1, we discuss the seed papers that formed the basis of our search; in Section 4.2, we report on the final inclusion and exclusion criteria; and in Section 4.3, we report on the adapted taxonomy. In Section 4.4, we report on the search query and its validation with respect to the seed papers; finally, in Section 4.5, we detail our strategy to extract data from the set of included papers.

4 Methodology

In this section, we present the methodology used throughout our study that encompasses three phases. In the first phase, a pilot study was conducted in which we gathered a set of seed papers (Section 4.1), developed a set of inclusion/exclusion criteria (Section 4.2), and refined our taxonomy (Section 4.3). In the second phase (Section 4.4.1), we performed the search, applied the exclusion criteria, and coded the selected papers. In the final phase (Section 4.4.2), the search query was validated and refined via an analysis of the secondary studies on the subject and, also, in consultation with domain experts; a new search was performed and additional studies were included for review and coding. Finally, in Section 4.5, we present our strategy for further filtering papers by their content and an overview of the outcomes.
A repository containing artefacts of this study (namely, the seed papers, the result of the searches, and the coding) is publicly available.1

4.1 Seed Papers

The set of seed papers contains 26 manually selected studies gathered in consultation with domain experts: three experts from academia with 32, 23, and 19 years of experience and one expert from industry with 26 years of experience in computer systems testing and verification domains. We reviewed this set as a pilot study with the following objectives:
(1)
gathering keywords for the initial search query,
(2)
sharpening the inclusion and exclusion criteria, and
(3)
evaluating and adapting the SERP-Test taxonomy.

4.2 Selection Strategy

To set the boundaries for the scope of our study, based on our research questions, we defined and used a set of inclusion/exclusion criteria as follows:

4.2.1 Inclusion Criteria.

The criteria considered for inclusion of studies are as follows:
The topic of the study is on Testing RAS (Robotics and Autonomous Systems),
The context must consider the cyber and physical aspects of a system (as opposed to only physical, mechanical, or control parts), and
Evidence for applicability is provided.
In the scope of our study, Testing is interpreted in a broad sense, which includes formal verification techniques, static and dynamic testing, validation and non-exhaustive techniques.

4.2.2 Exclusion Criteria.

The studies matching the following criteria are excluded:
Not available online,
Not in English,
Short papers,
Not peer-reviewed,
Patents,
Published before 2008 (in the second phase), published before 2014 (in the third phase),
Not addressing robotics and autonomous systems,
No research contributions to testing (including validation or verification),
Only testing units in isolation, not considering the robotic and autonomous system as a whole (if the contribution to testing units is not specific to the system considered in the paper and can have applications in the bigger context, then we included the study),
The study only considers the physical aspects of the system and not software components,
Concerning human-controlled systems, e.g., UAVs and robots that are remotely controlled by a human, and
Papers on the topic of simulation: a large number of studies among the search results do not offer new contributions to the process or technique of testing interventions; we exclude such papers unless they provide clear contributions in the context of testing, validation, and verification, have an available tool, or provide evidence of applicability in an industrial context.
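As a purely illustrative aid, the following Python sketch shows how exclusion criteria of this kind could be encoded and applied mechanically during screening. The record fields and thresholds are hypothetical simplifications; in our study, the screening described above was performed manually.

```python
from dataclasses import dataclass

@dataclass
class PaperRecord:
    # Hypothetical metadata for a single search result.
    title: str
    year: int
    language: str
    peer_reviewed: bool
    pages: int
    addresses_ras: bool           # addresses robotic and autonomous systems
    contributes_to_testing: bool  # research contribution to testing/V&V
    system_level: bool            # considers the system as a whole, not a unit

def is_excluded(p: PaperRecord, min_year: int = 2014, min_pages: int = 6) -> bool:
    """Return True if the paper matches any exclusion criterion.
    The year and page thresholds are illustrative assumptions."""
    return (
        p.language != "English"
        or not p.peer_reviewed
        or p.pages < min_pages            # short papers
        or p.year < min_year              # cut-off used in the third phase
        or not p.addresses_ras
        or not p.contributes_to_testing
        or not p.system_level             # only unit-level contributions
    )
```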

4.3 Taxonomy

To consistently classify the set of included studies and extract the information required for answering the research questions described in Section 1.3, we follow a modified version of the SERP-Test taxonomy (see Section 3.3). We started with the high-level facets proposed in the SERP-Test taxonomy and, through a number of iterations, defined and re-defined a number of categories based on the information obtained from coding the included studies. The data extracted for each facet has been used to answer the research questions and to identify strengths and gaps (provided to researchers and practitioners) as part of our analysis in Section 6. An overview of the final taxonomy, based on which the studies are classified, is depicted in Figure 1.
Fig. 1. Illustration of taxonomy.
In what follows, we provide a brief description of our taxonomy:
Context. For context, we consider two main categories, namely, system under test and the technique.
System Under Test. System under test describes the type of systems on which the testing technique is applied. In our study, we consider two main categories of RAS, namely, Robotics and Autonomous Systems. These two categories of systems are selected, as they dominate the case studies and a broad range of systems that are considered in the studies concerning testing RAS.
Technique. This is the second category that is considered under Context, which represents the testing technique that is improved or affected as a part of the contributions of the work to testing RAS.
Models. Different types of models can be used for describing the behaviour of a system under test. We consider this category to extract the information about the variety of models that are used in the work on testing RAS.
Tools and languages. This category consists of details on the tools and languages with which the subject systems are described.
Effect. We refine the Effect facet (see Section 3.3) further to four categories as follows:
Metrics. This category encompasses the metrics used as a way of evaluating test adequacy or correctness of the subject, based on performance (i.e., efficiency and effectiveness) or coverage measures.
Performance. This category describes the effect of an intervention on the performance of the testing technique or the subject system. The performance covers a variety of measures concerning safety, quality, and resources observed during testing.
Coverage. This category concerns the measures that indicate how comprehensive the testing technique is when performed in the context of RAS.
Process. This category describes the kinds of effects that impact the testing process.
Technique. This category concerns methods presented as new testing techniques or as improvements to existing ones for testing RAS.
Tooling. In this category, we extract information about the type of tools that have been used throughout each work. We further classify the tools according to their availability: (1) open source, i.e., tools for which the source artefacts are available; (2) publicly available, i.e., tools that are accessible for use but whose source code has not been provided; and (3) private, i.e., tools that have not been made available for download or purchase.
Scope. This facet in SERP-Test taxonomy is further refined to two main categories as follows:
Model testing. This category represents techniques that use a model of the system for testing. We define two sub-categories for such techniques:
Simulation. This category comprises different types of simulation techniques used for testing RAS.
Formal verification. This category describes formal verification techniques that use a model of the system to rigorously verify the behaviour.
System testing. This category describes techniques that are applied on actual implementation artifacts of systems.
Static testing. This category describes techniques that test the system without executing code.
Dynamic testing. This category describes techniques that check the functional behaviour by executing the implemented code for the system.
Evaluation. For this facet of the SERP-Test taxonomy (see Section 3.3), we define the case study category, which has three main subcategories.
Case Study. This category specifies the type of systems that has been used in evaluations of the selected papers. We categorise the case studies into three subcategories, namely, small scale, benchmark, and industrial.
Small. We consider examples that are developed solely for the purpose of evaluating the method in a specific study and are not applicable for evaluating other, similar interventions (due to lack of available details, lack of genericity, or insufficient scale/number of subject systems) as small scale.
Benchmark. We consider a case study as a benchmark if it represents a set of systems with a sufficient level of detail such that they are or can be used as a point of reference in evaluations performed in the context of testing autonomous systems.
Industrial. We categorise a case study as industrial if the subject system is of industrial scale and the evaluation has been performed in an industrial context.
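To summarise how these facets come together when coding a single study, the following Python sketch shows one possible shape of a coding record. The field names and the example values are illustrative choices of ours, not the literal columns of our coding sheet.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CodingRecord:
    # Context facet
    system_under_test: str                  # "Robotics" or "Autonomous Systems"
    technique: str                          # testing technique improved or affected
    models: List[str] = field(default_factory=list)
    tools_and_languages: List[str] = field(default_factory=list)
    # Effect facet
    metrics: List[str] = field(default_factory=list)   # performance/coverage measures
    process_effects: List[str] = field(default_factory=list)
    tooling: str = "private"                # "open source" | "publicly available" | "private"
    # Scope facet
    scope: str = "model testing"            # "model testing" (simulation, formal verification)
                                            # or "system testing" (static, dynamic)
    # Evaluation facet
    case_study: str = "small"               # "small" | "benchmark" | "industrial"

# A hypothetical coded study:
example = CodingRecord(
    system_under_test="Autonomous Systems",
    technique="model checking",
    models=["probabilistic timed automata"],
    tools_and_languages=["Uppaal"],
    metrics=["verification time", "state-space size"],
    scope="model testing",
    case_study="benchmark",
)
```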

4.4 Search Strategy

A total of four searches have been conducted. Following the initial search, three additional searches were conducted to account for our own internal validation and, also, external validation from domain experts. In addition to Google Scholar, two digital libraries, namely, ACM and IEEE, which broadly cover publications in computer science and engineering, have been selected as search venues.

4.4.1 Initial Query.

From the seed papers, an initial set of keywords was extracted to form a search query; additional terms with close meanings and relation to the initial keywords were used to broaden the search. Our query is a conjunction of two main sub-queries: one that comprises terms relevant to our application domain, robotic and autonomous systems, and the other contains the terms related to testing and verification. The initial query was as follows:
(“Robots” OR “Robotics” OR “Deep learning” OR “Machine Learning” OR “Artificial Intelligence” OR “Robot Simulator” OR “Autonomous Vehicle” OR “Autonomous Vehicles” OR “Autonomous Cars” OR “Image Classification Systems” OR “Neural Networks” OR “Unmanned Vehicles” OR “Unmanned Aerial Vehicles” OR “UAV” OR “Connected and Autonomous Vehicle” OR “CAV” OR “Automated Functions” OR “Drive Assist” OR “Multi-Agent Systems” OR “Autonomous Agents”)
AND
(“Testing” OR “Validation” OR “Verification” OR “Safety Case Analysis” OR “Runtime Monitoring” OR “Robustness” OR “Simulation” OR “Coverage” OR “Metaheuristics” OR “Search-Based” OR “Combinatorial” OR “SMT Solving” OR “SAT Solving” OR “Constraint Solving” OR “Model Checking”)
For this first search, we limited the scope of our search to papers published between 2008 and 2018. Its outcome was a set of 3,030 studies.
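For illustration, a conjunctive query of this shape can be assembled programmatically before being adapted to the syntax of each search engine. The sketch below (with both keyword lists abridged) is our own illustration, not the exact script used in the study.

```python
# Illustrative sketch: build the conjunctive search query from two keyword groups.
domain_terms = [
    "Robots", "Robotics", "Autonomous Vehicle", "Autonomous Vehicles",
    "Unmanned Aerial Vehicles", "UAV", "Multi-Agent Systems",  # ...abridged
]
testing_terms = [
    "Testing", "Validation", "Verification", "Runtime Monitoring",
    "Model Checking", "SMT Solving",  # ...abridged
]

def or_group(terms):
    """Join a keyword list into a quoted, OR-separated group."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

query = or_group(domain_terms) + " AND " + or_group(testing_terms)
print(query)
```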

4.4.2 Validated Query.

During our validation process, we made use of the seed papers, secondary studies (by checking papers that were referenced among included papers but were not an outcome of the search, i.e., snowballing), and domain specialists. We approached 50 academics and 15 industry experts in the domains of testing and verification to validate the outcome of the above search. They provide expertise in several areas, including verification and validation (31 experts with a median of 18 years of experience), artificial intelligence (8 experts with a median of 12 years of experience), human factors (5 experts with a median of 11 years of experience), and robotics and control systems (9 experts with a median of 14 years of experience). Of that group, we received detailed comments from 8 experts (7 academics and 1 from industry), with an average of 18.25 and a median of 26 years of experience in the field. This resulted in three revisions of our search query.
In the first revision, we included additional keywords (“Robot,” “Robotic,” “Swarm,” “Swarms,” “UAVs,” “Automated Driving,” “ADAS,” “Verifying,” “Verifiably,” “Assurance,” and “Assuring”), removed keywords that did not result in coded papers (“Machine Learning,” “Deep Learning,” “Artificial Intelligence,” “Image Classification System,” “Neural Networks,” “Robustness,” “Coverage,” and “Combinatorial”), and swapped terms for more generic ones (“Autonomous Vehicles” and “Autonomous Cars” were swapped for “Autonomous”). Furthermore, we observed that from years 2008 to 2014, only a handful of papers were included; this led us to further focus the search to papers published between 2014 and 2018.
In the second revision, we added the terms “Driverless” and “Self-driving.” Finally, in the third revision, to increase the relevancy of our results, we also included papers from 2019. The consolidated search query is as follows:
(“Robots” OR “Robot” OR “Robotics” OR “Robotic” OR “Swarm” OR “Swarms” OR “Autonomous” OR “Unmanned” OR “UAV” OR “UAVs” OR “CAV” OR “Automated Functions” OR “Automated Driving” OR “Drive Assist” OR “Multi-Agent Systems” OR “Multi-Agent System” OR “Driverless” OR “Self-Driving” OR “ADAS”)
AND
(“Testing” OR “Validation” OR “Verification” OR “Verifying” OR “Verifiably” OR “Assurance” OR “Assuring” OR “Safety Case Analysis” OR “Runtime Monitoring” OR “Metaheuristics” OR “Simulation” OR “SMT Solving” OR “SAT Solving” OR “Constraint Solving” OR “Model Checking” OR “Search-Based”)
The validation process resulted in a total of 7,679 additional and unique papers (i.e., the duplicates from the first search were automatically excluded).

4.5 Overview of the Results

As discussed in Section 4.4, 3,030 papers were obtained as a result of the initial query. As a result of the validation process, we obtained a further 7,679 papers. This led to a total of 10,709 search results. Our data extraction methodology was as follows:
First, we went through the results and filtered papers based on their title; we obtained a total of 1,247 potentially relevant papers. Second, the remaining studies were reviewed by abstract and we applied the exclusion criteria (see Section 4.2), which led to a final set of 195 studies. Third, this final set was coded according to our taxonomy and reviewed in detail as a part of this survey. Figure 2 shows a summary of the number of published articles clustered by year of release. We notice a steady yearly increase of studies included in our review.
Fig. 2. Relevant and included papers by year.

5 Results

In this section, we present the results of coding the literature in our taxonomy. We structure our results in terms of the four research questions. Regarding RQ1, we present the results concerning the different property specification languages and modelling languages and frameworks used for testing RAS. Regarding RQ2, we review the metrics used to measure the effectiveness, efficiency, and adequacy of testing interventions as well as the quality of systems under test. Regarding RQ3, we code the tools used to implement different interventions as well as any tools implementing the interventions themselves. Regarding RQ4, we present the evidence provided for the applicability of the interventions in terms of the case studies and benchmarks used to evaluate them.

5.1 RQ1: Models

In this section, we review the type of models and formalisms that are used for describing the behaviour of robotics and autonomous systems and their properties in testing interventions. Tables 1 and 2 show an overview of results of coding for models used in the studies included in this survey. We classify models according to their semantics (i.e., formal or informal), the domain in which they are employed (i.e., agnostic or domain-specific), and type (i.e., qualitative or quantitative).
Table 1. Models for System Properties
Table 2. Models for System Behaviour or Structure
We consider a model to be quantitative if it can represent measurable quantities such as probabilities or real-valued entities. Otherwise, the model is considered qualitative. This classification applies regardless of whether the results of the evaluation or the testing technique applied on the model is qualitative or quantitative.

5.1.1 Modelling Properties.

Table 1 presents the models that have been used to represent properties and the studies that employ them. Among all studies included in our survey, fewer than one-third use a model or logic to describe the properties of the subject systems. For this set of studies, all models are classified as formal. Among those, we notice that over two-thirds employ logics to describe qualitative properties of systems [8, 9, 17, 19, 22, 23, 41, 64, 69, 71, 75, 77, 81, 103, 107, 110, 119, 120, 121, 122, 134, 136, 140, 158, 170, 175, 197, 203, 216, 220, 221, 222]. Linear temporal logic, first-order logic, and epistemic logic are examples of such logics used in this set of studies. The remaining studies employ logics that can describe quantitative properties, e.g., describing stochastic or temporal aspects of systems [7, 8, 16, 37, 63, 87, 94, 110, 135, 137, 165, 171, 201, 234, 235]. We note a lack of languages that cater for specific domains; all property languages found in our survey are domain-agnostic.
A review of the results presented in Table 1 shows there is a limited number of studies that consider analysis of properties of systems formulated using formal logics. Furthermore, quantitative properties are considerably less represented in the selected studies. Properties to verify stochastic, continuous, and temporal aspects of the systems should play an important role when testing complex and real-time systems, such as RAS. This gap emphasises the need for quantitative logics that are tailored for the domain.
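To make the qualitative/quantitative distinction concrete, the following two property specifications are illustrative examples of our own (not taken from a particular included study): a qualitative LTL requirement and a quantitative, probabilistic (PCTL-style) requirement of the kind supported by tools such as Prism.
\[
\mathbf{G}\,(\mathit{obstacleDetected} \rightarrow \mathbf{F}\,\mathit{stopped}) \qquad \text{(qualitative, LTL)}
\]
\[
\mathrm{P}_{\geq 0.99}\,[\,\mathbf{F}^{\leq 60}\ \mathit{goalReached}\,] \qquad \text{(quantitative, PCTL)}
\]
The first requires that whenever an obstacle is detected the robot eventually stops; the second requires that the goal is reached within 60 time units with probability at least 0.99.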

5.1.2 Modelling System Behaviour or Structure.

In Table 2, an overview of models used for describing the behaviour or structure of robotics and autonomous systems is provided. Close to half of all of the included studies in this survey employ system models in their testing strategy; mathematical and rigorously defined models, i.e., formal models, are used in most of such interventions.
For instance, Petri nets and a variety of their extensions [10, 19, 75, 191, 230], labelled transition systems and some of their extended versions [12, 37, 94, 137, 189], finite state machines and their extensions [93, 140], and Markov chains [18, 157, 171, 199, 234, 235] are examples of such models. One observation is that, among studies using informal descriptions of systems, models built for Gazebo and ROS are the most commonly used [13, 14, 42, 51, 102, 117, 127].
Some studies employ a combination of models throughout their testing intervention; in particular, for some higher-level models, lower-level models can be used to specify their semantics [47, 156].
Of the studies that consider a behavioural model of their subject systems, around one-third utilise qualitative models. Most of such models are employed in formal verification strategies, where correctness is evaluated via mathematical proofs or model checking. The remaining studies use models that describe different quantitative aspects of systems such as temporal and stochastic behaviour, e.g., using variations of Petri nets (e.g., stochastic and coloured) [10, 19, 75, 132, 190, 191, 230], probabilistic timed automata [12, 137], and Markov chains [18, 157, 171, 199, 234, 235]; system dynamics, using differential equations [6, 54, 70, 129, 139, 148, 165, 202], hybrid automata and their extensions [39, 40, 82, 225, 226, 227], functional mockup units [1] and various informal simulation models for dynamical systems [13, 14, 42, 51, 102, 117, 127, 172, 200, 232].
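As an illustration of the continuous models mentioned above (ours, not drawn from a specific included study), a simple kinematic unicycle model of a mobile robot or vehicle can be written as
\[
\dot{x} = v\cos\theta, \qquad \dot{y} = v\sin\theta, \qquad \dot{\theta} = \omega,
\]
where \((x, y)\) is the planar position, \(\theta\) the heading, and \(v\), \(\omega\) the commanded linear and angular velocities; hybrid automata combine such continuous flows with discrete mode switches (e.g., between nominal driving and emergency braking).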
Compared to studies before 2019, we notice that there has been an increase in the use of stochastic models (from 4% to 14%). However, this number is still relatively small given the innate probabilistic aspects observed in RAS; hence, this might indicate the need for further stochastic models that are tailored for the domain. Furthermore, we observe a prevalence of qualitative models, despite the importance of quantitative aspects in the behaviour of RAS.

5.2 RQ2: Effect

In this section, we review two different types of measures. The first type comprises measures used for evaluating the efficiency, effectiveness, and coverage of the various testing interventions. The second type comprises measures of quality for the subject system used during testing; reusing the same terminology, we classify them under efficiency (i.e., concerning timing and resources) and effectiveness (i.e., concerning safety and quality) of the subject system.

5.2.1 Measures for Interventions.

Table 3 provides an overview of our coding of these measures, classified into efficiency (testing time or resources), effectiveness (testing quality), and coverage (testing adequacy). It is remarkable that only about one-third of the papers included in this survey used a measure of efficiency, effectiveness, or coverage to evaluate their results. This shows a significant gap in using well-defined measures to evaluate and compare various interventions.
Table 3. Classification of Measures Considered in Testing Interventions into Efficiency (Testing Time or Resources), Effectiveness (Testing Quality), and Coverage (Testing Adequacy)
Effectiveness:
Accuracy of the image recognition (failure rates) [199]
Hypervolume in fixed time (search-space coverage in time) [25]
Feature interaction failure [2]
Distance-based surprise adequacy [112]
Number and probability of faulty scenarios generated [155]
Reachability [7, 18]
Number of test cases [113, 208]
Number of failures [175]
Number of counter-examples [205]
Accuracy of the simulation [198]
Efficiency:
Precision [159]
Generational distance in time (distance to Pareto-optimal solutions in time) [25]
Testing (test generation and execution, model checking) time [72, 77, 91, 102, 110, 121, 136, 141, 142, 147, 165, 167, 168, 182, 189, 205]
Test case generation time [13]
Test execution and simulation time [16, 32, 67, 70, 87, 117, 160, 171, 192]
Reduced test case execution time [27]
Testing cost (€/km) [35]
State-space size [5, 10, 15, 72, 77, 110, 122, 135, 182, 189, 201, 221, 222, 234]
Search time [53]
Coverage:
Hypervolume [25]
Structural coverage metrics (state, code, function, transition, path coverage) [10, 13, 14, 58, 93, 132, 190, 191]
Feature interaction (e.g., pairwise and n-wise coverage) [2, 33]
Neuron coverage [211]
Surprise adequacy coverage [112]
Situation (graph) coverage [143]
Requirement coverage [197, 209]
Diversity [159]
It is also noteworthy that the interventions were measured against a vastly different range of measures. Apart from some very basic notions of efficiency (testing time or state-space size) [13, 27, 53, 102, 110, 121, 147, 165, 171, 189, 192] and coverage (such as state and transition coverage) [10, 13, 14, 58, 93, 132, 191], most other notions are only used for a single intervention. This emphasises the need for domain-specific and more sophisticated notions of efficiency, effectiveness, and coverage that can be used for benchmarking and comparing various interventions. Some exceptions that concern domain-specific measures are hypervolume (as a domain-specific measure of the searched space) and generational distance (as a measure of distance from optimal solutions) [25], cost of testing for autonomous vehicles in Euros per kilometre [35], feature interaction coverage [2, 33], situation coverage [143], and neuron coverage [211] and surprise adequacy coverage [112].
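For orientation, two of the coverage notions above admit simple formulations as ratios. The following are generic, textbook-style definitions given for illustration only (individual papers may use variants), where \(S\) is the set of model states, \(N\) the set of neurons, \(T\) the test suite, and \(t\) an activation threshold:
\[
\mathit{StateCov}(T) = \frac{|\{\, s \in S : s \text{ is visited by some test in } T \,\}|}{|S|},
\qquad
\mathit{NeuronCov}(T) = \frac{|\{\, n \in N : \exists\, x \in T .\ \mathit{act}(n,x) > t \,\}|}{|N|}.
\]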

5.2.2 Measures for Subject Systems.

In this section, we review the measures of quality for the system under test that are used in various interventions, presented in Table 4. Unlike the previous section, domain-specific measures are more prevalent here; two commonly used measures are spatial deviation from the intended trajectory (and variants thereof) [30, 37, 126, 127] and collisions and obstacle avoidance [19, 25, 63, 130, 137, 140, 163, 213]. The remaining measures are sparsely used across many different interventions.
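Both of these commonly used measures admit simple formulations; the following are illustrative, textbook-style definitions (individual studies use variants), where \(p(t)\) is the executed trajectory, \(p_{\mathrm{ref}}(t)\) the intended one, \(d(t)\) the distance to the obstacle ahead, and \(v_{\mathrm{rel}}(t)\) the closing speed:
\[
\mathit{dev}_{\max} = \max_{0 \le t \le T}\, \big\| p(t) - p_{\mathrm{ref}}(t) \big\|,
\qquad
\mathit{TTC}(t) = \frac{d(t)}{v_{\mathrm{rel}}(t)} \quad (\text{for } v_{\mathrm{rel}}(t) > 0).
\]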
Table 4. Classification of Measures of Quality for the Subject Systems Used in Testing Interventions
Effectiveness:
Probability of time to collision [128]
Performance and safety properties [70, 176, 220]
Safety for human operators [127]
Satisfied performance properties w.r.t. number of robots [12]
Number of failures [176]
Requirements satisfaction [37, 214]
Spatial deviation of intended behaviour [30, 32, 37, 50, 78, 126, 127]
Endurance distance and stairs traversal of robots [105]
Accuracy of the image recognition [199]
Collisions & obstacle avoidance [11, 16, 19, 25, 49, 54, 63, 130, 137, 140, 163, 213, 238]
Stability [213]
Search depth [71]
Throughput [12]
Schedulability [73]
Positive and supportive interactions towards humans [150, 158]
Anthropomorphism measure [188]
Number of hazards and risk reduction measures [216]
Probability of mission success and failure [15, 141, 142, 158, 168, 226, 227, 234, 235]
Formal assertions (deadlock freedom, liveness) [7, 226, 227]
Criticality (complexity of scenario and dynamics) [57]
Vehicle performance (acceleration, speed, position) [75, 87, 127, 208]
Regret (difference between rewards earned and achievable rewards) [153]
Severity of failure [209]
Probability of rare events [166]
Efficiency:
Number of collisions over time [163]
Time to collision [113]
Resource utilisation (e.g., CPU) [71, 172]
Network usage [172]
Fuel consumption [70, 137]
Constraint violation rate [131]
Device utilisation [12]
Response time [12]
Training time [32]
Latency [160]
Idle time [6]
Task completion time [83, 158]
Time for hazard identification and risk reduction [216]
Median miles to next disengagement [236]
Battery life [234]

5.3 RQ3: Tooling

We gather and describe the tools that have been employed and introduced in the included studies. We categorise tools as context and effect tools; a context tool is one that has been employed by the intervention but is not a byproduct of the respective work. Effect tools, in contrast, are the tools that have been developed by the academic community in our list of selected papers.

5.3.1 Context Tools.

As shown in Table 5, tools for simulation are among the most utilised; their usefulness comes from providing a less costly way of checking whether the design and process are satisfactory. The middleware ROS [177] combined with the 3D simulator Gazebo [115] forms the most popular toolchain for robotics simulation. Furthermore, Simulink [65], a graphical extension of MATLAB [151], is the most used tool for modelling and simulation of dynamic systems.
Table 5. Tools Used in the Context of Testing Interventions for RAS and Description of Tools
In the context of autonomous vehicles, traffic simulators such as SUMO [21] and SYNCHRO [104] have also been employed by included interventions, along with vehicle simulators such as CarMaker [44] and Autoware framework [109].
Moreover, tools for formal verification are also extensively used, with model checkers being the most prominent type. The probabilistic model checker Prism [123] supports modelling and analysis of systems with stochastic behaviour, modelled as Markov chains or probabilistic automata. As for qualitative models, UPPAAL [125] offers formal verification of timed automata models, which can, however, be extended with data types.

5.3.2 Effect Tools.

A total of 37 tools, publicly available or otherwise, have been introduced by the academic community as an effect of their interventions. Seven of them were not accessible at the time of writing this survey and were classified as private, including SSIM [215], a tool for simulating flight software employed in Mars Rover projects. The remaining tools, a total of 30, are available to the general public; 27 of those also have their source artefacts made public and have been classified as open-source. In Table 6, we also include the specific licence, if any, under which each tool is released. We note that the source code of some tools was made available without a licence being specified; in this case, certain repositories, such as GitHub, consider that default (US) copyright laws apply.
Table 6. Tools Introduced by Studies Included in This Survey for Testing RAS
References | Name | Description | Availability
[211] | DeepTest | Testing of DNN-driven autonomous cars | Open-source (GPLv3)
[165] | APEX | Formal verification of autonomous vehicle trajectory planning | Private
[75] | Translation tool | Translation tool from GenoM to Fiacre | Private
[232] | Roadview | Traffic scene simulator for autonomous vehicles | Private
[46, 47, 48, 156] | RoboTool | Formal verification and simulation of robots | Public (no licence found)
[139] | MAV3DSim | Simulation platform for UAV controllers | Public (no licence found)
[4] | Florida Poly AV Verification Framework (FLPolyVF) | Verification of the decision-making of autonomous vehicles | Open-source (MIT licence)
[117] | Simulator in Julia | Robot simulation | Public (no licence found)
[52] | Stonefish | Simulation tool for marine robots | Open-source (GPLv3)
[67] | GzUAV | Framework to run multiple-UAV simulations in Gazebo | Public (no licence found)
[54] | Move | Suite of tools to test autonomous vehicles | Open-source (GPLv3)
[215] | SSIM | Simulation of flight software | Private
[6] | IMPROV | Tool for self-verification of robots | Public (no licence found)
[16] | VerifCar | Framework for validation of decision policies of communicating autonomous vehicles | Public (no licence found)
[18] | MCpMC | Statistical model checking of pMC | Open-source
[160] | Asynchronous Multi-Body Framework | Simulation of multi-body systems | Public (no licence found)
[53] | RobTest | Tool for stress testing of single-arm robots | Private
[78] | AsFault | Test case generation for self-driving cars | Public (no licence found)
[233] | CyberEarth | Simulation of robots and cyber-physical systems | Public
[37] | Argos | Multi-physics robot simulator | Open-source (MIT licence)
[63] | Drona | Programming framework for robotic systems | Open-source
[102] | ROSRV | Runtime verification framework for ROS | Public (no licence found)
[80] | Hybrid Simulation | 3D simulation tool | Public (no licence found)
[91] | Spot | Prediction of traffic participants | Open-source (GPLv3)
[100] | FROST* | Modelling and simulation of dynamical systems | Open-source (BSD 3-Clause)
[135] | PSV-CA | Probabilistic swarms verifier | Open-source
[175] | RoVer | Model checker | Open-source (BSD 3-Clause)
[98] | Formal | Modelling and symbolic execution of CPS | Private
[148] | UUV | Gazebo extension for underwater scenarios | Open-source (Apache-2.0)
[184] | V-REP | Robot simulator | Open-source (commercial or GPLv3)
[212] | MARS | Simulation environment for marine swarm robotics | Open-source (BSD 3-Clause)
[77] | Cruton | Translation from robotics DSL into NuSMV | Open-source (GPLv3)
[159] | Range Adversarial Planning Tool (RAPT) | Test scenario generation | Public (no licence found)
[195] | Pegasus | Autonomous vehicle simulation | Private
[198] | AirSim | Drone simulation environment | Public (commercial licence)
[224] | Cosina | Simulation of real-time robotics systems | Public (no licence found)
[136] | MCMAS | Multi-agent systems model checker | Public (no licence found)
Analogously to context tools, we notice a focus on the development of tools for simulation and model checking. Tools for testing vehicles are well represented, including for road [4, 54, 78, 165, 232], aerial [67, 139], and maritime [52] environments. As for robots, RoboTool [48] and IMPROV [6] offer formal verification alternatives for testing robots, while ROSRV [102] provides a ROS extension for verification at runtime.

5.4 RQ4: Applicability

Table 7 provides an overview of the case studies conducted in the included papers. We classify them as small, benchmark, and industrial. Case studies designed specifically to evaluate a particular intervention, which lack sufficient details or generality to be employed for a general class of interventions, were classified as small. Case studies that are sufficiently general and contain enough detail to evaluate a range of interventions, provided that they are not used in an industrial context, are categorised as benchmark. Industrial case studies are those real-world (and hence, typically detailed and complex) cases conducted in an industrial setting.
Table 7.
Case studies | References
Small:
Pedestrian detection[81]
Humanised robots[230]
UAV[17, 18, 30, 46, 56, 60, 93, 133, 233]
Cleaner agent[163]
Self-driving vehicle[147]
Sensor system[35]
Software functions[217]
Family of surgical robots[149]
Path planning and decision-making[51]
Surveillance drone[63]
Lane-changing scenarios[80, 165]
Small robot[62, 64, 218]
Simple controllers[19]
Unmanned Surface Vehicles[137]
USAR robots[10]
Cooperative forklifts[132]
Agricultural robot[189]
Cruise control[110, 130]
Traffic environment[131]
Multi-agent manufacturing controller[228]
AR.Drone[42]
Cooperative UAVs[103]
(Industrial scale) transport robot[12]
Platoon[107]
Robot swarm[7, 47, 119]
iCub robot[171]
Collision avoidance scenarios[4, 49, 128]
Trained gate controller[122]
Autonomous vehicles scenarios[129, 155, 226, 227, 231]
Footbot[48]
Path following autonomous vehicle[11]
Autonomous parking[50]
Car following[50]
Single arm robot[53]
Ultimatum game[188]
LEGO EV3 robot[204]
Simple robot with LiDAR[205]
Border control system[9]
Military overwatch missions[141]
“AMiResot” robot platform[146]
Service robot[167]
Re-configurable autonomous trolley[190]
Surgical robot[39, 40]
Cruise control agent[72]
Paint spray robot[82]
Group of robots[86]
Communicating robots[158]
Search mission[168]
Industrial:
ADAS System[25]
Self-driving system[2]
Emergency response robot[105]
Test drive in a test track[70]
Mars Rover[215]
Automated braking system[1]
Lateral State Manager[197]
ADAS scenarios[87]
Automated Emergency Braking[113, 208]
NASA benchmark and user case studies[153]
RexROV and Desistek SAGA mini-ROV[148]
Cartesian impedance Control System in torque mode[214]
Care-O-bot[77]
Farming[185]
Quadrotor with Pixhawk controller[198]
Adaptive cruise control[237]
Autonomous CoPilot agent[31]
Benchmark:
Autonomous off-road robot RAVON[176]
Swarm of robots[37]
Two-wheel differential drive robots[127]
UAV/Land vehicle cooperation[33]
Smores[213]
Udacity[211]
BERT 2[13, 14]
MIT and NIRA datasets[179]
Traffic sign database[199]
Benchmark[172]
Carina I[58]
Kobuki robot[94, 192]
LEGO EV3 robot[140]
RMP400 Robot MANA[75]
Landshark[102]
Alice autonomous vehicle[225]
Parallel delta robot[36]
Jack ROV[126]
UAV[161]
Quadcopter controller[74]
Videos of pedestrians and vehicles[32, 169]
Traffic wave observations[54]
Leader and follower UAVs[67]
ROBNAV mobile robot[73]
Udacity, MNIST, and CIFAR-10 datasets[112]
ATLAS robot[117]
Human-robot interactions[6, 150]
DaVinci research kits[160]
Turtlebot 2[202]
ZalaZone Smart City Zone[206]
Flexible Manufacturing System (FMS)[216]
Drone with Pixhawk flight controller[219]
WAYMO public road testing dataset[236]
Unmanned underwater vehicle (UUV)[235]
Windfarm drone[234]
Traffic Scenarios[91]
ATLAS and DRC-HUBO robots[100]
NAO robot[175]
NASA’s Unmanned Ground Vehicle[98]
Hanse UAV[212]
iRobot vaccum cleaner[15]
KUKA LWR4+ and the Universal Robots UR5[34]
Underwater vehicle[159]
Chemical detector[182]
COUR-1 robot[220]
Care-O-bot[221, 222]
COMAN[224]
CoCar parking[238]
Table 7. Classification of Case Studies Considered in Testing Interventions as Small, Industrial, and Benchmark
Our observation identifies a significant gap in the industrial evaluation of interventions; only 19 interventions [1, 2, 25, 31, 70, 77, 87, 105, 113, 148, 153, 185, 195, 197, 198, 208, 214, 215, 237] have been evaluated in an industrial context. Understandably, the majority of case studies have been conducted entirely in academic settings. Of those, approximately half made use of small-scale models, which are often not representative of real systems. The other half employed their proposed interventions on large-scale subjects and datasets, including physical systems.

6 Suggestions and Recommendations to the Study Audience

In this section, we analyse the results of the previous sections to identify relative strengths and weaknesses regarding our research questions and for our two target audience groups: researchers and practitioners. We conclude this section by drawing recommendations from our analysis both for researchers and practitioners.

6.1 Analysis

6.1.1 Domain.

Table 8 provides a concise summary of the domains covered by the reviewed interventions. The bulk of the reviewed interventions do not pertain to any specific sub-domain of RAS. This indicates a clear gap for subdomain-specific research that considers the characteristics of each of these subdomains and takes them into account in the testing interventions. Most importantly, the subdomains of testing marine and submarine RAS as well as space RAS are under-explored (the only included interventions regarding marine and submarine robots [52, 148, 174, 212] and space robots [153, 215] are not represented in the table for the sake of brevity). We note that there is a recently funded European project, REMARO, that aims to fill this substantial gap.2
Table 8.
Table 8. Testing, Validation, and Verification Interventions for Specific Subdomains of RAS
Below, we analyse the results gathered in Table 8 on a row-by-row basis:
Qualitative
Despite the intrinsically quantitative nature of RAS, qualitative models also play an important role in the verification of such systems, particularly when quantitative details are abstracted away so that the models become amenable to rigorous and exhaustive formal verification techniques. In the case of vehicles, qualitative models abstract away from physical dynamics and instead focus on observable behaviour that can be modelled as discrete events. Overall, we noticed a gap in qualitative analysis focusing on aerial vehicles and mobile robots (excluding road vehicles); this is likely due to the challenge of modelling movement without using continuous dynamics.
Road
In the domain of autonomous cars, qualitative models have been used to reason about Human-Machine Interactions [162, 231] and high-level decision-making, particularly regarding ethical concerns [60] and safety [81, 199]. Most of these interventions propose the use of formal models in their methodology [58, 60, 71, 72, 147, 162, 197, 205, 229, 231]. For instance, Yun et al. [231] propose a strategy that formalises Human-Machine Interaction in the SysML language to help steer the testing process. In the same vein, Naujoks et al. [162] provide a DSL based on a taxonomy of use cases to cover the transitions and modes of Human-Machine Interaction interfaces used in the verification process. Sun et al. [205] employ a Satisfiability-Modulo-Convex encoding to build finite-state abstractions of the learning component and formally verify it. Dennis et al. [60] focus on formalising and verifying ethical concerns in BDI agents. More informal approaches include the use of 3D simulation with scenarios described in UML notation [208] and the use of graphical notations for safety assurance analysis [81, 199]. A mix of formal and informal models is employed by Heitmeyer et al. [98], who provide two new tools to be included in their toolset (FORMAL [97]). The first tool synthesises formal models from scenarios written in Event Sequence Charts, and the second tool incorporates a 3D simulation tool (eBotworks [92]) into the toolset.
Aerial
In the aerial domain, there are only a handful of interventions that use qualitative models for testing aerial vehicles, mostly regarding safety and security concerns. For instance, linear temporal logic is used more than once to formalise safety assurance cases [17, 41]. Hagerman et al. [93] make use of finite state machines to extract security test suites, and Bhattacharyya et al. [31] focus on formally verifying the boundaries beyond which the agents are designed to operate, by translating models from a cognitive architecture (Soar [124]) into UPPAAL.
Mobile
Only two studies have been found in this category. Andrews et al. [10] model autonomous systems and their environment using Petri nets to generate test cases and apply their technique to a case study in the human-robot interaction domain. Furthermore, in the context of software product lines, Mansoor et al. [149] conduct a case study on a family of surgical robots by employing formal analysis, feature modelling, and testing; they discuss the key challenges and lessons learned from the case study.
Generic
Most of the included interventions in this category concerned abstract representations of multi-agent autonomous systems and provided efficient algorithms for parametric (formal) verification or state-space reduction techniques [8, 22, 23, 24, 119, 120, 121, 122, 134, 136, 167, 187]. Similar to the previous item, most of the interventions used Linear Temporal Logic or variants thereof to model safety properties [88, 222]. Formal modelling and verification of human-machine interaction is also a common theme in this category [69, 221, 222].
Quantitative
We see that quantities such as time (representing real-time behaviour), probabilities (representing abstractions of communication networks or choices made by human actors), and physical dynamics (such as velocity and acceleration) are used for testing in various domains. Most of the developed techniques, such as compositional verification techniques or intervention evaluation methods, do not pertain to domain-specific instances of these quantities and instead consider general multi-agent and robotic systems. Below, we review domain-specific interventions as well as interventions developed for the general context of RAS, structured by the columns in Table 8.
Road
In this domain, we see a strength in integrating code-level abstractions (e.g., for individual components or functions) with system-level specifications (for vehicles and fleets of vehicles); such integrations are then tested using simulation and formal verification frameworks. The quantitative models used for such testing interventions often pertain to vehicle dynamics and to probabilities arising from communication frameworks. AbdElSalam et al. [1] present a framework for verification of ADAS and autonomous vehicles that uses SystemC TLM models for virtual ECUs. Transaction-Level Models provide a high-level abstraction of the SystemC components that are used in virtual ECUs. These models are then integrated with the vehicle and traffic models for simulation. Parametric modelling of CAVs as a network of timed automata is used by Arcile et al. [16]. In this work, the VerifCar tool is applied to assess the impact of communication delays on the decision algorithms of CAVs and to check the robustness and efficiency of such algorithms. Similarly, variations and extensions of timed automata, probabilistic timed automata, and stochastic timed automata are used [12, 137, 226] for modelling the behaviour of autonomous vehicles to verify properties of different decision-making and collision avoidance algorithms. Barbot et al. [19] use statistical model checking to verify an autonomous vehicle controller specified in C++; a set of safety properties specified in HASL, a quantitative variant of linear temporal logic, are verified for the controller. Betts et al. [30] compare the effectiveness of two search-based testing methods, genetic algorithms and surrogate-based optimisation, for test case generation for UAV flight control software. There are several works in this category that provide and use simulation platforms [51, 70, 87, 128, 131, 133, 172, 206].
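To convey the flavour of the quantitative properties checked in such works, consider the following purely illustrative examples (the thresholds, time bounds, and predicate names are hypothetical and not taken from any of the cited studies): a probabilistic, time-bounded safety requirement and a bounded-response requirement, respectively:

    P_{\leq 0.01}\left[\, \mathbf{F}^{\leq 10\,\mathrm{s}}\ \mathit{collision} \,\right]
    \qquad\qquad
    \mathbf{G}\left(\mathit{obstacle\_detected} \rightarrow \mathbf{F}^{\leq 2\,\mathrm{s}}\ \mathit{braking}\right)

The first formula states that the probability of reaching a collision state within 10 seconds is at most 0.01; the second states that whenever an obstacle is detected, braking must start within 2 seconds.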
Aerial
The quantitative interventions reported in this domain are [18, 33, 42, 63, 103, 139, 161, 201, 234]. Timing information features prominently: for instance, human-machine interaction is formalised (through a formalisation of a cognitive architecture) in terms of networks of timed automata, and UPPAAL is subsequently used for verification.
Mobile
There are a few studies in this domain that consider quantitative models. Among these studies, the majority employ formal models [12, 37, 137]. Statistical model checking is used [12] to verify the performance of transport robots based on behavioural models (stochastic timed automata) using UPPAAL SMC. Furthermore, model checking of Markov models is used [37] to verify PCTL properties of swarm robotics behaviour in the design phase; the models are then used as a blueprint for implementation and simulation. Probabilistic model checking of unmanned surface vehicles is another technique used [137]: the PRISM model checker verifies PCTL properties of USVs on probabilistic timed automata as the behavioural model. Other work in this category [176] uses a purpose-built DSL (graph-based models) to describe the system behaviour, which is then used as a test model for generating test cases.
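To illustrate what probabilistic model checking of a simple reachability property (e.g., PCTL's P=? [F goal]) computes, the following minimal Python sketch iterates the standard fixed-point equations on a made-up three-state Markov chain; the chain, state names, and numbers are invented for illustration and do not come from any cited study.

    # Minimal sketch: probability of eventually reaching a "goal" state in a
    # discrete-time Markov chain, computed by fixed-point iteration.
    # The chain below is an invented example, not a model from any cited study.

    P = {  # transition probabilities: state -> {successor: probability}
        "search": {"search": 0.6, "found": 0.3, "lost": 0.1},
        "found":  {"found": 1.0},   # absorbing goal state
        "lost":   {"lost": 1.0},    # absorbing failure state
    }
    goal = {"found"}

    # x[s] approximates Pr(eventually reach goal | start in s)
    x = {s: (1.0 if s in goal else 0.0) for s in P}
    for _ in range(1000):  # iterate until (near) convergence
        x = {s: (1.0 if s in goal else sum(p * x[t] for t, p in P[s].items()))
             for s in P}

    print(round(x["search"], 4))  # -> 0.75 for this example

A dedicated model checker such as PRISM solves the same equations (exactly, via linear algebra) on much larger models and for richer PCTL operators; the sketch only shows the underlying idea.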
Generic
As a general observation, a considerable portion of all interventions in this category (23% of papers) have been out-of-the-box applications of model checking tools (mostly PRISM [123], in some cases UPPAAL [125] and FDR [85]) to small-scale robotic case studies; another prevalent category (30% of papers) concerns theoretical papers on various logics and model-checking algorithms for multi-agent robotic systems. Notable exceptions to this general theme include languages and toolsets for rigorous simulation [47, 48, 156, 182], the use of formal verification as part of the design of robotic interaction protocols [64, 228], the use of formal verification to analyse human-robot interaction [69, 158, 175, 216, 221, 222], and the generation of test cases from formal models [14, 46, 132, 163, 192]. Another interesting intervention concerned the comparison of different hybrid-systems solvers [39, 40] for formal verification of robotic applications. Furthermore, the theorem prover Isabelle/HOL has been used to formalise safety assurance claims [88], model checking has been used to train policies in reinforcement learning [171], and compositional techniques have been used to reduce the complexity of the model checking problem [170, 189]. Another noteworthy attempt is in formalising and verifying ethical concerns [60, 62].
Formal
In summary, there is a relative strength in the theoretical foundations of testing, validation, and verification, comprising various logical and specification formalisms and small-scale proof-of-concept exercises in model checking abstractions of RAS. There seems to be a recent trend, identified below, towards analysing human-machine interactions. There is a relative weakness in non-exhaustive testing, validation, and verification techniques, and in studying and improving their application to large-scale and industrial systems. To give a more nuanced picture, we break down our analysis further by domain:
Road
A number of techniques have been proposed in this domain for non-exhaustive testing from formal models [133, 208] and for scenario generation [157]. Model synthesis from scenarios has also been studied [98]. The application of different verification techniques, such as formal verification based on supervisory control, model checking, and deductive verification, has been studied in an industrial context [197]. Also in this domain, a framework for validating ethical policies has been developed [229], and human-machine interaction for user interfaces has been validated [231].
Aerial
Apart from applying traditional model checking [103, 201] and simulation [56] techniques to this specific domain, we observe notable attempts to combat the huge state-space of domain-specific models by employing statistical model checking [18] and runtime monitoring [63]. Most papers in this domain have focused on constructing safety models/properties or even coming up with safety specification frameworks [17, 41]; however, energy-efficiency [234] and to a limited extent, security [93], have been addressed as well.
Mobile
The landscape in this domain is much more sparse and scattered. As usual for this category, there are a number of applications of model-checking tools. There is a single paper on test description and test-case generation [176]; also, variability is an under-studied aspect in robotics that has been handled in this context [149] and, finally, there is an industrial case study on the application of model checking [12].
Generic
Formal verification is prevalent in this category; besides parametric verification of multi-agent systems using variations of epistemic logic [8, 22, 24, 121, 135, 187], formal verification using timed automata has also been used in a strategy that decomposes verification problems into smaller ones [189] and in the verification of path planning [7]. Furthermore, Araiza-Illan, Pipe, and Eder [14] use BDI models and model checking of probabilistic timed automata (in UPPAAL) to generate test sequences for human-robot collaboration tasks. Another use of UPPAAL in this category is for model checking of ROS applications, making use of an ad hoc translation from ROS to UPPAAL [94].
Verification of probabilistic aspects can be found in a few studies. Zhao et al. [235] employ Bayesian inference to estimate the distribution of the parameters of Markov chains; they then combine formal verification, synthesis, and runtime monitoring to check that the requirements are not violated under the estimated parameters. Pathak et al. [171] make use of probabilistic properties of Markov chains for self-repair capabilities in robots and tie those into a formal verification process (of PCTL formulae). Araujo, Mota, and Nogueira [15] apply probabilistic model checking to verify whether a robot trajectory (described in terms of an algorithm) satisfies specific behaviours or properties (stated as temporal formulae).
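To illustrate the general idea behind Bayesian estimation of Markov-chain parameters (a generic, textbook-style sketch with invented counts and thresholds, not the actual pipeline of Zhao et al. [235]), a single unknown transition probability can be estimated from observed transitions using a conjugate Beta prior and then compared against a requirement threshold:

    # Generic sketch: Bayesian estimation of one Markov-chain transition
    # probability with a Beta prior; the counts and threshold are invented.
    from scipy import stats

    successes, failures = 48, 2      # observed transitions into the "safe" successor
    prior_a, prior_b = 1, 1          # uniform Beta(1,1) prior
    posterior = stats.beta(prior_a + successes, prior_b + failures)

    # Probability (under the posterior) that the true transition probability
    # meets a hypothetical requirement of at least 0.9.
    requirement = 0.9
    confidence = 1 - posterior.cdf(requirement)
    print(f"posterior mean = {posterior.mean():.3f}, "
          f"P(p >= {requirement}) = {confidence:.3f}")

The estimated parameters (or their posterior bounds) can then be fed back into a probabilistic model checker, which is the kind of combination of estimation and verification that the studies above pursue.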
The combination of CSP and FDR can be seen in a few studies. Cavalcanti et al. [46] generate mutations of RoboChart models and feed the mutated CSP [101] (obtained from the RoboChart models) into FDR, which yields a counter-example that is used for testing. Yueng et al. [228] detail a process to support the design of simulation experiments by analysing variations of timing parameters in CSP; they show that the simulation experiment can yield different results both in performance and in behavioural terms. Sumida et al. [204] demonstrate a case study in which they model a LEGO robot (EV3) and verify freedom from deadlock and livelock in FDR.
Several other studies employ different strategies to achieve different goals. Gainer et al. [77] synthesise formal models from control rules of robots written in a DSL and input those models into NuSMV for model checking. Doan et al. [64] employ Maude to formally verify the gathering of robots in a ring network. Santos et al. [192] work on generating unit tests for ROS components and property-based testing for ROS using the Hypothesis tool [154]. Bresolin et al. [39, 40] apply reachability analysis of hybrid automata in ARIADNE [26] to analyse the dynamics of surgical robots.
Informal
In this category, there is a clear strength in generic simulation tools and architectures, followed by a strength in simulating road vehicles with domain-specific kinematic models. Some interventions focus on test-case generation and prioritisation as well as runtime monitoring, both for generic RAS and for road vehicles. There is a clear weakness in domain-specific interventions for aerial vehicles, for which only simulation tools for individual and connected vehicles are reported in the literature, and for mobile robots (excluding road vehicles), for which no intervention is included in our review. The simulation tools in various domains are often based on a combination of ROS [177] and Gazebo [115], Unity [210], and/or USARSim [45]. We refer to Section 5.3 for further explanation of these tools.
Road
A majority of papers in this category introduce a simulation tool [1, 51, 54, 70, 129, 155, 206, 232] combining vehicle kinematics with other aspects of vehicle modelling such as communications [51], vision-based algorithms [129], and fuel consumption [70]. Two interventions use search-based testing [25, 30]; surrogate modelling, where a higher-level model is used to steer the search, is used in both approaches. Another approach uses past data to identify challenging situations and embed them into test cases (using an XML structure) [200].
There are also a number of process interventions describing a process for safety assurance [81, 236] and testing Human-Machine Interfaces [162]. Some papers do use a well-defined syntax or a mathematical notation, but are classified as informal; in our classification, if a model does not have a rigorous formal syntax, semantics, and reasoning method, then it is classified as informal. These include using XML as a formal model [200], mathematical descriptions of vehicle kinematics (see simulation tools above), and probabilistic descriptions for risks [236].
Aerial
All interventions reported here concern simulation tools for modelling dynamics and control [42, 139, 161] and communication of aerial vehicles [67]. It is remarkable that these simulation tools rely on entirely different context tools, which will be analysed further in RQ3.
Generic
In this category, the majority of interventions again propose simulation tools [117, 148, 184, 202] or a simulation architecture [224]. (Note that two simulation tools address marine robots [148, 212], but since we did not have a separate class for such robots, we classified them here.) There are, however, a few interventions concerning test-case prioritisation [127], automated unit-test execution [34], and runtime monitoring [102]. The runtime monitoring environment [102] provides an integration with ROS implementations.
Effectiveness
Here, we summarise the notions of effectiveness used in different domains; these notions comprise two sub-categories: the effectiveness of the RAS under test, which is the oracle or the property against which the RAS is tested, and the effectiveness of the testing techniques, which provides an evaluation of the techniques, rather than the system under test:
Road
Concerning the effectiveness measures for road vehicles, collision analysis is the most prominent metric, including analysis on the number of collisions [11, 16, 49, 54, 113, 130, 227], probability of collision [19, 128], and severity [209]. Furthermore, a few studies focus on the analysis of deviation from the intended path in terms of spatial and rotational deviation [30, 32, 50, 78]. As for measures for test adequacy, probability of faulty and rare events [155, 166], and the number of tests generated [208] have been studied; however, these latter measures are not domain-specific and similar measures have been used in other sub-domains.
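As a concrete illustration of the path-deviation measures mentioned above, one common way to quantify spatial deviation is the maximum cross-track error between the driven trajectory and the intended (piecewise-linear) path. The following minimal Python sketch uses invented waypoints and positions; it is not the metric implementation of any cited study.

    # Minimal sketch of a spatial-deviation metric: maximum distance from each
    # driven position to the intended path (piecewise-linear). Data are invented.
    import numpy as np

    def point_to_segment(p, a, b):
        """Distance from point p to the line segment a-b."""
        ab, ap = b - a, p - a
        t = np.clip(np.dot(ap, ab) / np.dot(ab, ab), 0.0, 1.0)
        return np.linalg.norm(p - (a + t * ab))

    def max_cross_track_error(driven, path):
        """Largest deviation of any driven point from the intended path."""
        return max(
            min(point_to_segment(p, path[i], path[i + 1]) for i in range(len(path) - 1))
            for p in driven
        )

    intended = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 5.0]])
    driven   = np.array([[0.0, 0.2], [5.0, 0.4], [10.0, 0.1], [15.0, 3.0]])
    print(f"max cross-track error = {max_cross_track_error(driven, intended):.2f} m")

Rotational deviation can be defined analogously over heading angles; the cited studies differ in which variant they report.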
Aerial
Only four studies have been included in this category. As measures of SUT effectiveness, Desai et al. [63] measure obstacle avoidance and plan execution. Similar to road vehicles, studies on the probability of completing a task for aerial vehicles are found in the literature [18, 234]. Zhao et al. [234] take a further step by analysing the expected mission time and the expected number of battery recharges during a mission. In contrast, as a test adequacy metric, Reference [198] measures the accuracy of its simulation.
Mobile
With respect to mobile robots, only a handful of papers collect metrics related to effectiveness. As in other sub-domains, collision avoidance [137] and the probability of satisfying requirements [37] can also be found here. Brambilla et al. [37] also measure the improvement in the behaviour of a robot by analysing the number of objects retrieved. Arai et al. [12] consider a similar measure of improvement in behaviour but in terms of device utilisation. The only metric regarding testing adequacy is the number of failures detected, studied by Proetzsch et al. [176].
Generic
As a unique measure of effectiveness, Ruijten [188] detects anthropomorphism in their subjects, which is a measure of human-likeness in robots. Several studies focus on analysing the probability of completing a mission successfully [15, 141, 158, 235]. Studies on safety [7, 163, 216] are also commonly found in this category. For instance, Vicentini et al. [216] focus their efforts on the analysis of hazards, such as the number of hazards identified, the number of types of hazard, and the number of risk reduction measures taken.
Efficiency
Similar to effectiveness, measures of efficiency can pertain to the system under test (measuring resource usage as a property to be checked or as an oracle for pass and fail) and the testing techniques (to measure the resources used in testing):
Road
Mullins et al. [159] use precision, convergence, and resolution as efficiency measures in testing. A number of studies on the verification of autonomous systems [71, 72, 110] measure the size of the state space, as well as the total memory footprint [71], to evaluate efficiency. Sun et al. [205], in the verification of finite-state models, use abstraction and verification time to estimate efficiency. Verification time is used in a number of studies [16, 71, 72, 91, 110, 147, 165] to measure efficiency. Gladisch et al. [87] use simulation time to measure efficiency, and similarly, Bi et al. [32] use simulation time to measure the efficiency of their work. Fayazi et al. [70] measure test duration in their evaluation. To evaluate efficiency, Reference [172] measures CPU usage and network bandwidth. Bode et al. [35] measure the cost (in euros) of applying their approach as a notion of efficiency. Li et al. [131] measure computational time in testing and comparing various autonomous vehicle decision and control systems.
Aerial
Sirigineedi et al. [201] use verification time and the number of states in Kripke structures (their system models) for measuring efficiency in their work. D’Urso et al. [67] use simulation time in Gazebo as a notion of efficiency. Zhao et al. [234], in their work using the PRISM model checker, consider the expected mission time as a notion of efficiency.
Mobile
Andrews et al. [10] consider the number of states and transitions as a measure of efficiency for the testing method. Arai et al. [12] present results on the response time, throughput, and device utilisation as measures of efficiency for the system under test.
Generic
Verification time is commonly used in a number of studies [77, 136, 141, 171, 182, 189] to measure efficiency. The number of states is another common notion measured in evaluations [15, 77, 122, 135, 182, 228]. Furthermore, testing and simulation time is used to measure efficiency in a set of studies [83, 117, 192]. Althoff et al. in Reference [6] measure the reduced idle time in self-verifying robots. Munawar et al. [160] measure latency and number of steps in simulation of surgical robots. Search time and optimisation time are used by Collet et al. [53] in evaluation of testing single arm robots. Gerstenberg et al. in Reference [83] measure task completion time in simulation. Vicentini et al. in Reference [216] present the total time for risk reduction and hazard identification in their evaluation of formal verification techniques. Probability of task completion, number of executed instructions, and time for completing the task are other measures used in evaluation of model checking autonomous systems [15]. Muhammad et al. [158] measure time and probability of task completion in probabilistic model checking of cooperative robot interaction.
Coverage
Compared to the measures of efficiency and effectiveness, coverage measures are not as widely adopted.
Road
Variations of structural coverage can be found in studies within this sub-domain. Neves et al. [58] developed a tool that conducts post-analysis (based on meta-models) on outputs collected from field testing of autonomous vehicles. Their tool aims to expand test coverage by exploring functionalities in the meta-model that were not covered during the field testing. Majzif et al. [143] devise a process that guarantees coverage of safety standards. They abstract results from component testing and make use of meta-models and situation graphs to compute a system-wide degree of test coverage and derive new scenarios to cover unexplored situations. Tatar [209] presents a method (implemented in TestWeaver [106]) for testing and validation of ADAS systems. The tool generates scenarios to cover relevant system states and feeds back previous executions to guide the next round of testing.
Aerial
In the only paper in this category, Bicevskis, Gaujen, and Kalnins [33] developed new methods for testing and validation of autonomous processes collaboration. They build a collaboration model using an extended finite state machine and employ symbolic execution and feasibility tree analysis to check that all relevant states can be reached in the model. They have evaluated their strategy using a UAV case study.
Generic
Tian et al. [211] propose DeepTest, a tool for testing deep neural networks in autonomous vehicles. It generates tests that explore different parts of the DNN logic with the goal of maximising neuron coverage. Araiza-Illan, Pipe, and Eder [14] propose a methodology for generating test cases that achieve high code coverage in human-robot collaborative tasks; they developed a testbench for ROS that makes use of belief-desire-intention (BDI) agents to generate valid and human-like tests. Structural coverage of Petri net models has been utilised by Saglietti in different contexts, such as the generation of test cases for autonomous agents [132] and the verification of reconfiguration behaviour of autonomous agents [190, 191].
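The notion of neuron coverage can be illustrated schematically as follows (a simplified sketch of the general idea with invented activation values, not DeepTest's implementation): a neuron counts as covered if its activation exceeds a threshold for at least one test input, and coverage is the fraction of covered neurons.

    # Schematic sketch of neuron coverage: the fraction of neurons whose
    # activation exceeds a threshold for at least one input in the test suite.
    # The activations and threshold below are illustrative placeholders.
    import numpy as np

    def neuron_coverage(activations_per_input, threshold=0.5):
        """activations_per_input: list of 1-D arrays, one per test input,
        each holding the (scaled) activation of every neuron in the network."""
        covered = None
        for acts in activations_per_input:
            hit = np.asarray(acts) > threshold
            covered = hit if covered is None else (covered | hit)
        return covered.mean() if covered is not None else 0.0

    # Invented activations for a 6-neuron network over three test inputs.
    suite = [np.array([0.9, 0.1, 0.0, 0.7, 0.2, 0.3]),
             np.array([0.2, 0.6, 0.1, 0.1, 0.1, 0.4]),
             np.array([0.1, 0.2, 0.1, 0.8, 0.9, 0.2])]
    print(neuron_coverage(suite))   # -> 0.666... (4 of 6 neurons covered)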
Open-source
Road
There are a handful of open-source tools for formal verification [16], testing [54], and simulation [4, 80, 211] of autonomous vehicles. Testing and simulation seem to be gaining some strength with respect to open-source tools, and we see tools for scenario generation, testing, and simulation of autonomous vehicles within various traffic scenarios. For connected vehicles, VerifCar [16] is a framework based on timed automata dedicated to modelling, verifying, and validating the policies of connected autonomous vehicles.
Garzón and Spalanzani [80] present a tool that combines 3D simulation (for ego-vehicle control) with a traffic simulator (which controls the behaviour of other vehicles). The goal is to test the ego-vehicle in realistic high-traffic situations. The FLPolyVF tool [4] connects functional verification, sensor verification, diagnostics, and industry/regulatory communication of autonomous vehicles while checking the effects of using different scenario abstraction levels. The MoVE tool [54] provides the possibility of modelling pedestrian behaviour. The framework focuses on testing autonomous system algorithms, vehicles, and their interactions with real and simulated vehicles and pedestrians.
Gambi, Mueller, and Fraser present the AsFault prototype tool [78]. The tool combines procedural content generation and search-based testing to automatically create challenging virtual scenarios for testing self-driving car software. Tian et al. [211] propose DeepTest, a tool for testing deep neural networks in autonomous vehicles. It generates tests that explore different parts of the DNN logic with the goal of maximising neuron coverage.
Aerial
All new tools reported here concern simulation tools for modelling dynamics and control [18, 139] and communication of aerial vehicles [67].
Lugo-Cárdenas, Lozano, and Flores [139] introduce a 3D simulation tool for UAVs whose focus is on assisting the development of flight controllers. Analogously, D’Urso, Santoro, and Santoro [67] also present a simulator for UAVs, called GzUAVChannel. The framework combines Gazebo, ArduPilot, and the NS-3 network simulator to provide a 3D visualisation engine, a physics simulator, a flight control stack, and a network simulator to handle communications among unmanned aerial vehicles. On the stochastic side of software verification, Bao et al. [18] present a prototype tool for parametric statistical model checking that can cope with complex parametric Markov chains on which state-of-the-art tools (such as PRISM) have timed out. They provide evidence of their tool’s efficiency by conducting an industrial case study.
Generic
Several open-source tools have been proposed in this category, with a majority of them being simulators. The only exceptions are a formal verification tool [175] for human-robot interactions and two runtime verification tools [63, 102].
Rohmer, Singh, and Freese introduce V-REP [184], a popular robotics physics simulator that is now known as CoppeliaSim. The tool uses a kinematics engine and several physics libraries to provide rigid-body simulations (including meshes, joints, and multiple types of sensors). Brambilla et al. have developed ARGOS [37], a multi-physics robot simulator that can simulate large-scale swarms and can be customised via plug-ins. In the Matlab environment, FROST [100] is an open-source toolkit for modelling, trajectory optimisation, and simulation of robots, with a particular focus on dynamic locomotion. Munawar and Fischer [160] present the Asynchronous Multi-Body Framework, which incorporates real-time dynamic simulation and interfaces with learning agents to train them and potentially allow for the execution of shared sub-tasks.
For underwater robots, three new tools have been introduced: Manhaes and Rauschenbach present the UUV simulator [148], which is an extension of Gazebo accommodating the domain-specific aspects of underwater vehicles; Cieslak et al. introduce Stonefish [52], a geometry-based simulator that can be integrated with ROS; and the MARS tool [212] provides simulation environments for marine swarm robots. As for tools in the human-robot interaction (HRI) domain, RoVer [175] provides visual authoring of HRI, formalisation of properties in temporal logic, and verification that the interactions abide by a set of social norms and task expectations.
Huang et al. present ROSRV [102], which is a runtime verification framework that can be used with ROS. Desai et al. [63] present a runtime verification framework based on Signal Temporal Logic [144], where an online monitor checks robustness on partial trajectories from low-level controllers (in the context of surgical robots).
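The flavour of such robustness monitoring can be illustrated with a minimal example (a generic sketch of quantitative STL semantics with invented data, not the monitor of Desai et al. [63] or ROSRV [102]): for a requirement of the form "always keep at least d_min clearance", the robustness of a sampled trace is the worst-case margin, which is positive when the property holds and negative when it is violated.

    # Generic sketch of quantitative (robustness) semantics for the STL formula
    # G (clearance >= d_min) over a sampled trajectory; the data are invented.
    def robustness_always_at_least(samples, d_min):
        """Worst-case margin of 'clearance >= d_min' over the trace:
        positive => satisfied with that margin, negative => violated."""
        return min(value - d_min for value in samples)

    clearance_trace = [1.8, 1.2, 0.9, 1.1, 1.5]   # metres, one value per sample
    print(robustness_always_at_least(clearance_trace, d_min=0.5))   # -> 0.4

An online monitor evaluates such margins incrementally on partial trajectories, which is what allows violations to be flagged before a mission completes.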
Public
The tools included in this category are diverse; we report below on a test-scenario generation tool for road vehicles [159], two simulation tools for aerial [198] and generic [233] robots, and a formal verification tool for generic robots (applied to a UAV case study) [46, 47, 48, 156].
Road
Mullins et al. [159] developed a tool (RAPT - Range Adversarial Planning Tool) for generating test scenarios. The tool employs an adaptive search method that generates new scenarios based on the performance and results of the previous one. A clustering algorithm ranks the scenarios based on the performance type and how close they are to the boundaries of each cluster. The boundaries are based on notions of efficiency, diversity, and scaling.
Aerial
Shah et al. [198] introduce the AirSim simulator, which generates training data for building machine learning models used in autonomous aircraft. It offers physical and visual simulation, including a physics engine and models of the vehicle, environment, and sensors.
Generic
There are two tools included in this category: a formal specification and verification tool [46, 47, 48, 156] and a simulation tool [233]. Cavalcanti et al. [46, 47, 48, 156] introduce RoboTool, supporting graphical modelling, validation, and model checking (via FDR [85]) of robotic models written in RoboChart [156] and RoboSim [48]. Zhang et al. [233] introduce CyberEarth, a framework for program-driven simulation, visualisation, and monitoring of robots. The tool integrates modules from several other open-source tools such as ROS [177] and OpenSceneGraph (OSG) [43].
Proprietary
Road
The three tools included in this category comprise two tools for formal analysis [98, 165] and a simulation tool [232], each of which is explained further below.
Heitmeyer and Leonard [98] introduce two tools integrated into the FORMAL framework; the tools synthesise and validate formal models. The first tool synthesises a formal Software Cost Reduction (SCR) requirements model from scenarios, and the second tool combines the existing SCR simulator [96] with the eBotworks 3D simulator to allow for the simulation of continuous components.
O’Kelly introduces APEX [165], a tool for formally verifying the trajectory planning and tracking stacks of ADAS in vehicles. Zhang et al. present RoadView [232], a photo-realistic simulator that tests the performance of autonomous vehicles and evaluates their self-driving tasks.
Generic
The three tools included in this category are diverse and range from simulation [215] to formal verification [75] to model-based testing [53].
Verma et al. [215] present a Flight Software simulator that is used to simulate MARS Rover missions. The simulator assists in predicting the behaviour of semi-autonomous systems by providing the capability for human operators to check if their intent is correctly captured by the robot prior to execution in different scenarios and environments. Foughali et al. [75] implement an automatic translation from GenoM [145], a robotics model-based software engineering framework, to the formal specification language Fiacre [28], which can be fed into TINA [29] for formal verification. Collet et al. [53] introduce RobTest, a tool for generating collision-free trajectories for stress testing of single-arm robots. It employs constraint programming techniques to solve continuous domain constraints in its trajectory generation process.
Small
A considerable number of studies, among those included in this survey, consider small case studies in their experiments. Here, we review the most prominent ones.
Road
A few papers employ case studies with a focus on collision avoidance for road vehicles [4, 49, 71, 128]; they all focus on detecting imminent collisions using built-in sensors. Similarly, Gauerhof, Munk, and Burton [81] conduct a case study where a machine learning function detects pedestrians using video analysis.
Several case studies concentrate on driving scenarios and manoeuvres, such as lane changing [19, 165], lane and path following [11, 232], merging [80], roundabouts [227], traffic scenarios [131, 155], parking [50], and overall cruise control [72, 110, 130, 147, 217]. A focus on the actual decision-making and path planning can be seen in a handful of the case studies [50, 51, 226] as well. Hardware-in-the-loop simulations [95] and human-machine interaction for driving assistance [231] can also be seen among the included case studies.
Aerial
The small-scale case studies in this domain only present theoretical or very limited models of UAVs, such as a surveillance drone [63] or a model for UAV launch [93]. Moreover, Brunel et al. [41] conduct a safety case analysis, while Bu et al. [42] explore simulation and realistic testing of vision-based object tracking for UAVs. Aerial drones in co-operative scenarios are also the subject of two case studies [103, 201].
Mobile
Only two small-scale case studies were included here: Lu et al. [137] make use of PRISM model checker to investigate three collision avoidance algorithms in an unmanned surface vehicle model with a dynamic intruder, and Arai and Schlingloff [12] employ a model checking technique on a transport robot model.
Generic
Several case studies can be found within this category, with the vast majority being small models that are applied to demonstrate the respective intervention. Here, we briefly present and discuss some of them.
Walter, Täubig, and Lüth [218] provide an algorithm that increases safety through formal verification using the theorem prover Isabelle; the case study is a small robot. Nguyen et al. [163] provide a multi-step process to verify the correctness of autonomous agents and apply it to a cleaner robot. Fu and Drabo [230] model a humanoid robot in an extension of Petri nets (called Predicate Transition Reconfigurable Nets, PrTR Nets) and formally verify it. Lill et al. [132] also make use of Petri nets; however, they develop models of cooperative forklifts and simulate scenarios where the robots decide which one has priority when passing through narrow pathways.
Farulla and Lamprecht [69] conduct a case study on human-robot interaction processes that have been modelled in DIME and show how they can be verified with the GEAR model checker. Zhang et al. [233] have built a virtual simulation platform, CyberEarth, for robotics and cyber-physical systems. A visual coverage task for UAVs is also introduced to demonstrate the platform. Dennis and Fisher [62] apply an agent verification approach to verify the correctness of an agent’s ethical decision-making. Doan, Bonnet, and Ogata [64] specify and formally verify, using the model checker Maude, a robotic gathering model.
Industrial
Road
The industrial case studies involving road vehicles included in our survey typically involve verifying specific components of such systems.
In the context of advanced driver assistance systems (ADAS), Abdessalem et al. [25] generate test cases for such a system that can visually detect pedestrians. Zhou et al. [237] introduce a framework for virtual testing of advanced driver assistance systems that uses real-world measurements. Kluck et al. [113] consider virtual driving scenarios for testing automated emergency braking. AbdElSalam et al. [1] use Hardware Emulation-in-the-loop to verify Electronic Control Units (ECUs) for ADAS systems.
Fayazi, Vahidi, and Luckow [70] implement a vehicle-in-the-loop verification environment and conduct field testing in the International Transportation Innovation Center (ITIC).
Gladisch et al. [87] select case studies that use industrial automated driving (adaptive cruise control, lane keeping, and steering control scenarios) to evaluate their search-based testing strategy. Abdessalem et al. [2] generate test cases for the SafeDrive system, which contains the following four self-driving features: autonomous cruise control, traffic sign recognition, pedestrian protection, and automated emergency braking.
Aerial
Shah et al. [198] build a model of a quadrotor with a Pixhawk controller in their newly developed simulator, AirSim, which includes a physics engine and supports real-time hardware-in-the-loop simulation. Rooker et al. [185] demonstrate their validation framework for autonomous systems in a farming context with simulations and field testing; they employ both UAVs and ground mobile systems. Bhattacharyya et al. [31] apply formal verification methods to an autonomous CoPilot agent.
Mobile
The only study in this category is the study conducted by Rooker et al. [185], which is also mentioned above. In summary, they demonstrate their validation framework for autonomous systems in a farming context with simulations and field testing.
Generic
Jacoff et al. [105] conduct field testing for the performance evaluation of robots used in disaster scenarios. Verma et al. [215] present a Flight Software simulator that is used to simulate MARS Rover missions. They demonstrate their approach with a case study. Satoh [194] conducts a case study using a physical transport robot to demonstrate their framework that can emulate the robot’s physical mobility.
Manhaes and Rauschenbach [148] model the Sperre SF 30k ROV underwater robot (RexROV) in the demonstration of the simulator for unmanned underwater vehicles. Uriagereka et al. [214] conduct simulation-assisted fault injection to assess safety and reliability of robotic systems. The feasibility of their method is demonstrated by applying it to the design of a real-time cartesian impedance control system. Gainer et al. [77] conduct a case study in the context of verification of human-robot interaction using the Care-O-Bot robotic assistant.
Benchmark
Road
Many different case studies have been included in this category; we briefly discuss the most distinguished ones. For instance, Neves et al. [58] developed a tool that conducts post-analysis (based on meta-models) on outputs collected from field testing of autonomous vehicles. Five field tests involving a program to control the navigation of an autonomous vehicle, CaRINA I, were performed. Zofka et al. [238] present the Sleepwalker framework for verifying and validating autonomous vehicles and demonstrate its benefits using different instances stimulating an autonomous vehicle.
Mullins et al. [159] have developed a tool (RAPT - Range Adversarial Planning Tool) for generating test scenarios to be employed on the System Under Test. Their tool is applied to realistic underwater missions. Heitmeyer et al. [98] synthesise software cost-reduction models of multiple autonomous systems to be used in a simulator integrated with the eBotworks simulation tool. Gruber and Althoff [91] present a reachability analysis tool (Spot) that finds counter-example to property violations. Their tool is evaluated using the CommonRoad benchmark PM1:MW1:DEU_Muc-3_1_T-1.
Pereira et al. [172] employ several small case studies in their attempt to couple two simulators, namely, SUMO and USARSim. Pasareanu, Gopinath, and Yu [170] present a compositional approach for the verification of autonomous systems and apply the technique to a neural network implementation of a controller for the ACAS Xu collision avoidance system for unmanned aircraft. Bi et al. [32] present a deep-learning-based framework for traffic simulation and execute several scenarios of intersections with and without pedestrians.
Aerial
Not many studies have applied their intervention to benchmarks of aerial systems. Bicevskis, Gaujens, and Kalnins [33] develop models for the testing of UAV and UGV collaboration in the Simulink environment. Mutter et al. [161] also explore the simulation of UAV models in Simulink and discuss the results when combining the platform and environment models. D’Urso, Santoro, and Santoro [67] simulate leader-follower UAV scenarios in their framework. Their goal is to combine four simulation environments: a 3D visualisation engine, a physics simulator, a flight control stack, and a network simulator.
Wang and Cheng [219] present a hardware-in-the-loop simulator for drones that can generate synthetic images from the scene as datasets, detect and verify objects with a trained neural network, and generate point cloud data for model validation. They simulate and conduct field testing on a physical UAV. Zhao et al. [234] model an unmanned aerial vehicle (UAV) inspection mission on a wind farm and, via probabilistic model checking in PRISM, show how the battery features may affect verification results.
Mobile
Two studies fit this category. Proetzsch et al. [176] use a purpose-built DSL (graph-based models) to describe the system behaviour of the autonomous off-road robot RAVON; the model is used as a test model for generating test cases. Brambilla et al. [37] model a probabilistic swarm that is checked in PRISM to evaluate their property-driven design method.
Generic
Several studies have applied their intervention to benchmark case studies of generic/immobile robots. We briefly discuss the most distinguished ones.
Tosun et al. [213] present a design framework that facilitates the rapid creation of configurations and behaviours for modular robots; they demonstrate their framework on the SMORES robot. Halder et al. [94] use the physical Kobuki robot as a case study, over which properties are automatically verified using the UPPAAL model checker; the focus of their approach is to model and verify ROS systems with real-time properties. Laval, Fabresse, and Bouraqadi [127] introduce a methodology to support the definition of repeatable, reusable, semi-automated tests and apply it to a two-wheel differential drive robot.
Bohlmann, Klinger, and Szczerbicka [36] automatically generate a model of a parallel delta robot on the fly; their method for model generation is based on machine learning and symbiotic simulation techniques. Mariager et al. [150] design and field-test a robot that interacts with adolescents with cerebral palsy. Althoff et al. [6] propose a framework (IMPROV) for self-programming and self-verification of robots, which is demonstrated on a physical robotic arm. Wigand et al. [224] have developed CoSiMA, an architecture for the simulation, execution, and analysis of robotic systems; they conduct experiments on the humanoid robot COMAN.
In Table 8, we map the identified subdomains to the different aspects of our research questions as follows:
RQ1
Across all subdomains, a majority of models have been formal and quantitative, and substantial gaps can be detected (most notably in the aerial vehicles and mobile robots subdomains) regarding the use of qualitative and informal models for testing.
RQ2
Across all studied subdomains, there is a clear gap in using precise notions of effectiveness, efficiency, and coverage. Among these, some generic notions of effectiveness and efficiency (such as testing time and state-space size) and of coverage (such as node and transition coverage) are the most-used measures for quantifying the effect. Common, more sophisticated measures of effectiveness, efficiency, and adequacy, such as the Average Percentage of Faults Detected (APFD) [186], do not seem to have been adopted in or extended to the domain of RAS. We do see a recent trend towards domain-specific notions of effectiveness and coverage [2, 25, 33, 35, 112, 143, 211]; almost all of these notions have been applied to the autonomous vehicles domain, but most of them can be adapted to other domains as well.
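For reference, APFD is commonly defined for a prioritised suite of n test cases detecting m faults, where TF_i denotes the position of the first test case that reveals fault i:

    \mathrm{APFD} = 1 - \frac{TF_1 + TF_2 + \cdots + TF_m}{n\,m} + \frac{1}{2n}

Higher values indicate that faults are revealed earlier in the test order; adapting such a measure to RAS would require, for example, a domain-specific notion of what constitutes a revealed fault in a simulated or physical mission.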
RQ3
There is a considerable gap concerning tool support for testing RAS. There are very few open-source tools, mostly in the autonomous vehicles [4, 16, 54, 78, 80, 91, 211] and aerial vehicles [18, 67, 139] subdomains. No open-source tools support the domain-specific aspects of mobile robotic systems. The same pattern, with a more severe gap, is present for proprietary tools. Very few public (but not open-source) tools are developed or used in the reviewed literature.
RQ4
There is also a very severe gap across all subdomains in using industrial case studies for evaluating RAS testing interventions. The most notable exceptions are a handful of case studies, mostly in the autonomous vehicles [1, 2, 25, 70, 87, 113, 195, 197, 208, 237] and the aerial vehicles [77, 105, 148, 153, 194, 214, 215] sub-domains, performed in an industrial context. Many interventions used small case studies, mostly without any specific application subdomain (e.g., using generic models of mobile robots); in these cases, the models did not contain enough details to be part of a general benchmark. There have also been some evaluations performed on small case studies based on drones and UAVs.
Analysis for Researchers.
Gaps: In our analysis of the studied subdomains, there is a clear gap in treating marine and submarine RAS. Also, there is a relative weakness in treating aerial vehicles and mobile robots. Moreover, there is a relative weakness across subdomains concerning the treatment of informal and qualitative models. Developing a common set of notions of effectiveness and efficiency to compare different interventions is a worthwhile research challenge, and there is a gap in the literature in tailoring them to specific domains; the same observation holds for the notion of test adequacy. Tooling, particularly tooling tailored to specific subdomains, is a general weakness across interventions. Moreover, applying the interventions in an industrial context is an outstanding challenge.
Strengths: The road vehicle subdomain has considerable strengths across all research questions. Also, far more interventions have been developed for generic RAS without treating the specific concerns of sub-domains. Formal and quantitative models are by far the strongest interventions, both in terms of the number of techniques studied and the evaluations performed, even in industrial domains.
Analysis for Practitioners.
Gaps: Since most of the proposed interventions have not been evaluated in industrial context, evaluating their applicability, including studying factors such as the learning curve and training, remains a substantial gap.
Strengths: Due to the available strength in formal and quantitative models, developing such models provides a starting point to benefit from the developed and studied interventions. There is certainly more maturity in the area of road vehicles to benefit from in practice, but we can envisage that, by tuning the domain-specific aspects, other sub-domains may also benefit from these strengths.

6.1.2 Cooperation and Connectivity.

Verification methods are pivotal for the widespread deployment and public acceptance of autonomous systems. The need for such methods is intensified in the functions enabled by network services, due to the close interaction among the communication protocols, control software (e.g., for cooperation rules), and system dynamics. Existing (manual) analysis techniques typically do not scale to the huge design-space and input-space of these functions and, hence, in this work, we survey automated verification techniques found in the literature.
Table 9 provides an overview of the interventions used to test cooperation and connectivity in RAS. The interventions can be broadly categorised into swarm RAS, where an emerging behaviour is to be observed through cooperation of a large number of RAS, versus cooperative RAS, where few RAS units engage in a well-defined interaction (possibly with their environment) to achieve a goal.
Table 9.
Table 9. Testing Cooperation and Connectivity in RAS
In general, this turns out to be an understudied area of testing RAS, and little focus has been put on testing cooperative and connected scenarios in the literature. For the very few interventions reported in the literature, there is scarcely any evidence of efficiency or effectiveness available. The handful of reported evaluations are only performed on small-scale case studies and are not accompanied by open-source tools. In our analysis, we focused on cooperation among robots; however, only in 2019 did we encounter some papers that study cooperation from a human-robot interaction viewpoint [6, 150, 188].
Overall, there is very little work on the stochastic details of communication protocols. The studies in this category mostly focus on the verification of robot movement (i.e., gathering and merging).
Qualitative
Swarm
With respect to swarms, a number of theoretical studies [119, 120, 121] focus on scaling up the parametrised model checking problem to large swarm sizes. They employ various types of epistemic extensions of CTL as property specification languages. Their models include case studies on clustering of swarms, which synchronise to gather in a certain area. Cybulski et al.’s contribution [56] to the field is a simulation framework for the behaviour of UAV swarms. The framework also allows for performing simulations with a user-defined map of the environment.
Cooperative
Regarding the use of qualitative models of cooperative systems, our search only resulted in three studies, two of which employ variations of Petri nets as their models. Lill et al. [132] make use of Petri nets to develop models of cooperative forklifts; the forklifts communicate to decide which has priority when passing through narrow pathways. Saglietti et al. [191] employ Coloured Petri nets and classify cooperation into three distinct levels: perception-based, reasoning-based, and action-based cooperation. To demonstrate their strategy, they model platooning-like scenarios where different robots follow each other. The third study, by Doan et al. [64], is a more theoretical work on parametric model checking applied to multiple small robots gathering in circular configurations using a ring-based topology; it focuses on model checking the underlying distributed system against properties written in LTL.
Quantitative
Swarm
Three out of four papers in this category deal with probabilistic behaviour: Lomuscio et al. [135] perform model checking of probabilistic LTL, while Amin et al. [7] and Brambilla et al. [37] both describe properties in PCTL. Cavalcanti et al. [47], however, model timed dynamics in CSP.
Cooperative
The studies that have been classified in this category employ variations of temporal logic as their properties, such as LTL [103, 107, 158] and CTL [16]. As an exception to that list, Bicevskis et al. [33] provide a simulation environment in Simulink.
Formal
Swarm
Kouvaros et al. [119, 120, 121] provide several theoretical studies in the field of formal verification and model checking of autonomous systems; they have demonstrated the applicability of their strategy on a case study of the gathering of UAV swarms. With respect to probabilistic systems, three studies have been found: Lomuscio et al. [135] offer a strategy for parameterised model checking of probabilistic LTL properties. Furthermore, Brambilla et al. [37] provide a property-driven design method for probabilistic swarms that is checked using Prism. Last, Amin et al. [7] verify probabilistic behaviour expressed via PCTL properties using UPPAAL; they check deadlock freedom, safety and liveness properties, and reachability.
Cooperative
The majority of studies employing formal methods in analysing RAS use model checking [16, 33, 64, 103, 107, 158]; after model checking, the most frequently used technique is model-based testing [132, 190, 191]. Arcile et al. [16] and Kamali et al. [107] investigate car platooning manoeuvres. In the former, the vehicles are modelled as timed automata and UPPAAL is used as the model-checking tool. In the latter, joining and exiting operations are modelled as Belief-Desire-Intention models and model-checked using AJPF; the focus is on abstracting a formal (untimed) model from an agent (timed) model and checking the correspondence between the agent model and the code. With respect to probabilistic model checking, Muhammad et al. [158] model robots that synchronise to position themselves in an attempt to guarantee coverage of a certain area; the models are Markov Decision Processes that are checked using PRISM. Humphrey et al. [103] make use of the model checker Spin to investigate cooperation between UAVs and sensors, as well as collaboration among the sensors themselves.
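As an illustration (our own example formula, not taken from References [16] or [107]), a typical requirement for a platoon-joining manoeuvre could be phrased in LTL as \( \mathrm{G}\,(\mathit{request\_join} \rightarrow \mathrm{F}\,\mathit{joined}) \wedge \mathrm{G}\,\neg\mathit{collision} \), stating that every join request is eventually granted and that no collision ever occurs along any execution.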
Informal
Cooperative
The only study in this category [67] provides a simulation environment that integrates existing solutions for simulation of multi-UAV applications, such as Gazebo (for robotic simulation), ArduPilot (for UAV control algorithms), and NS-3 (for network simulation). Their case study is a model of a leader-follower application for large convoys of UAVs.
Effectiveness
As noted before, there are very few measures of effectiveness used for evaluating the system or the testing technique:
Swarm
Amin et al. [7] use very generic notions of effectiveness for the system under test, namely, deadlock freedom, safety, liveness, and reachability. Brambilla et al. [37] go beyond that: in addition to domain-agnostic properties, such as the probability of satisfying the (safety) requirements, they measure the aggregation time of the swarm and the improvement of the behaviour (in terms of objects retrieved).
Cooperative
Muhammad et al. [158] measure the probability of task completion and of human interaction as measures of effectiveness in wireless sensor networks. Arcile et al. [16] measure the number of collisions in their vehicle verification approach.
Efficiency
Swarm
Lomuscio et al. [135] count the number of states and transitions as a measure of efficiency of their formal verification methodology.
Cooperative
Muhammad et al. [158] also measure the time of task completion and of human interaction as efficiency measures for the system under test. Arcile et al. [16] measure the travel time for the system under test and the verification time as efficiency metrics for the respective testing technique. D’Urso et al. [67] measure simulation time as a measure of efficiency of their integrated simulator.
Coverage
Cooperative
Saglietti, Winzinger, and Lill [191] consider state coverage in the analysis of their model-based reconfiguration testing strategy, while Bicevskis et al. [33] test collaborative UAVs and UGVs and consider “the complete test set” as a measure of test coverage. Lill and Saglietti [132] model Petri net entities and address the maximisation of interaction coverage while minimising the number of test cases.
Open-source
Swarm
With respect to swarms, two open-source tools have been reported: the PSV-CA tool [135] can model-check probabilistic LTL properties for swarm systems, and ARGOS [37] is a multi-physics robot simulator that can simulate large-scale swarms and can be customised via plug-ins.
Cooperative
Only two open-source tools have been found for cooperative robots: VerifCar [16] allows for fault injection in models for UPPAAL model checking. GzUAV [67], meanwhile, is a simulation tool for connected UAVs.
Public
Swarm
The only public, non-open-source tool that has been employed in the testing of swarms is RoboTool by Cavalcanti et al. [48]; RoboTool supports modelling and model checking (through the FDR tool [85]). The tool has been applied to a UAV swarm case study.
Small
Swarm
Kouvaros and Lomuscio [119, 120] study parameterised verification of robot swarms against temporal-epistemic specifications and model a small, theoretical robot swarm. Cavalcanti et al. [47] introduce RoboChart, which allows for modelling and verification of interacting robots. Cybulski [56] provides mathematical models of a UAV swarm that can be simulated in their proposed framework. Amin et al. [7] present a formal verification approach using timed automata for verifying the path planning of robot swarms.
Cooperative
Poncela and Aguayo-Torres [174] conduct a case study where they test underwater robots’ wireless communication. Lill et al. [132] make use of Petri nets to develop models of cooperative forklifts and simulate scenarios where the robots decide which one has the priority when passing through narrow pathways. Humphrey et al. [103] make use of the model checker Spin to investigate cooperation between UAVs and sensors, as well as collaboration among the sensors themselves. Saglietti, Winzinger, and Lill [191] use coloured Petri nets to model interacting autonomous agents and generate test cases for reconfiguration scenarios.
Benchmarks
Swarm
The only reported benchmark for this category is by Brambilla et al. [37], where they investigate aggregation and foraging manoeuvres on large-scale swarms of multiple sizes.
Cooperative
D’Urso et al. [67] evaluate their methodology on a number of test programs using different UAV fleet sizes. They aim their evaluation at testing (i) the scalability of the solution and (ii) its performance, by comparing the simulation time with the physical execution time.
Industrial
Cooperative
The only reported industrial case study is by Rooker et al. [185]. They make use of a simulation tool in the smart farming domain. Land and air robots are modelled using real dynamics and cooperate to complete farming tasks.
RQ1
Regarding the models used for analysing cooperative scenarios in RAS, we notice that formal probabilistic models (based on variations of temporal logic [119, 120, 121], process algebra [48], and timed automata [7, 16]) are the most-used types of models. Often these models are used for model checking abstract models of cooperative scenarios. Qualitative and informal models are used far less often in this context; informal models appear only as input to simulation tools [67].
RQ2
Most notions of effectiveness and efficiency are generic ones, such as state-space size, verification time, and test coverage [33, 132] for the technique, and deadlock freedom and the probability of satisfying a temporal logic formula for the system under test. The only exceptions where domain-specific notions of efficiency and effectiveness were used concern the aggregation time of the swarm [37] and the effectiveness of human-robot interactions [158].
RQ3
There is clearly a lack of tools for testing cooperative and swarm scenarios in RAS. The only exceptions are public model checking tools [16, 37, 48, 135] and a simulation tool for connected UAVs [67].
RQ4
Very few studies have evaluated their interventions on industrial-scale case studies [185] and benchmarks [33, 37, 67].
Analysis for Researchers.
Gaps: An analysis of the included studies reveals that in cooperative scenarios for RAS, the role of communication networks and protocols and their effect on functionality, safety, and reliability of the RAS system is severely understudied. Integrating the body of knowledge available in communications with the testing and verification of RAS is clearly an area for future research. The very few available studies do not provide domain-specific measures of efficiency and effectiveness that pertain to the cooperative aspects and the emerging cooperative behaviour. Moreover, there is a lack of sufficient evidence of strategies being applied to industrial-scale case studies and benchmarks.
Strengths: There is certainly a strength in abstract theories for parameterised model checking of swarms. Apart from that, there is no other concentrated area of strength.
Analysis for Practitioners.
Gaps: As noted above, we do not think we have reached sufficient maturity in the research results for cooperative and swarm robots to be able to apply them in practice. Even the existing techniques have not been applied to many industrial case studies yet, and no stable tool-sets are available at the moment. Working with researchers to define meaningful notions of efficiency and effectiveness as well as providing benchmarks and industrial case studies could lead to an impactful future research agenda.
Strengths: There are no practical areas of strength in testing cooperative and swarm RAS scenarios.

6.1.3 Testing Strategy.

Table 10 provides an overview of the testing strategies used for RAS. By far the most widely used strategy is formal verification, followed by simulation and then runtime monitoring. Model-based testing is the least-researched strategy.
Table 10. Overview of the Testing Strategies Used for RAS
Qualitative
Simulation
Heitmeyer et al. [98] synthesise state-based formal models (Software Cost Reduction tabular models) from scenarios specified in Mode diagrams (extensions of Message Sequence Charts). The models are used in a simulator integrated with the eBotworks simulation tool. Cybulski [56] developed a simulation tool for UAVs based on class and activity diagrams; further, the framework allows for user-defined maps of the environment.
MBT
Two search-based testing approaches are employed in this category: Lill and Saglietti [132] employ a genetic algorithm to maximise coverage of their Petri net models when generating test cases. Analogously, Nguyen et al. [163] provide a multi-step process to verify the correctness of autonomous agents; they make use of multi-objective evolutionary algorithms to cover stakeholder soft goals. Araiza et al. [14] and Andrews et al. [10] focus on human-robot interaction: the former generate test cases from BDI models, while the latter focus on coverage to generate test cases from Petri nets. Another model-based testing contribution that uses Petri nets is Reference [191], which uses Coloured Petri nets and structural coverage metrics to generate test cases for reconfiguration scenarios. Finally, Hagerman et al. [93] combine a behavioural model with attack and mitigation analyses to generate a security test suite for UAVs.
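To make the search-based flavour of these approaches concrete, the following minimal sketch (our own illustration under simplifying assumptions; names are hypothetical, and this is not the tooling of References [132] or [163]) evolves test cases, represented as sequences of model transitions, towards higher coverage:

import random

def coverage_fitness(test_case, already_covered):
    # Fitness: number of not-yet-covered transitions exercised by this test case.
    return len(set(test_case) - already_covered)

def evolve_tests(population, already_covered, generations=50, mutation_rate=0.1):
    # Assumes the population holds at least two non-empty test cases (lists of transition ids).
    for _ in range(generations):
        population.sort(key=lambda tc: coverage_fitness(tc, already_covered), reverse=True)
        parents = population[: max(2, len(population) // 2)]
        children = []
        while len(parents) + len(children) < len(population):
            a, b = random.sample(parents, 2)
            cut = random.randint(1, max(1, min(len(a), len(b)) - 1))
            child = a[:cut] + b[cut:]  # one-point crossover
            if random.random() < mutation_rate:
                gene_pool = [t for tc in parents for t in tc]
                child[random.randrange(len(child))] = random.choice(gene_pool)
            children.append(child)
        population = parents + children
    return max(population, key=lambda tc: coverage_fitness(tc, already_covered))

The fitness function here rewards test cases that exercise transitions not yet covered by the existing suite, which is the essence of coverage-driven search-based test generation.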
Formal Verification
The vast majority of papers in this category perform formal verification based on properties specified in variations of temporal logic; since autonomous systems specifications typically involve aspects such as beliefs and intentions, several studies are dedicated to studying the theoretical boundaries (e.g., (un)decidability) of verifying epistemic extensions of temporal logics [119, 120, 121, 122, 134, 136]; a notable tool used in this context is the MCMAS model checker [136], which is also evaluated on a small-scale benchmark against the general-purpose model checker NuSMV.
Many theoretical studies address the issue of abstraction for parameterised specifications, where the parameters can be the number of autonomous agents [119, 120, 121, 122, 134, 136] or the size and shape of the arena [8, 9]. Aminof et al. [9] investigate the decidability problem for parameterised grid sizes; they find that restricting the grid size makes the problem solvable in PSPACE. In the same vein, Aminof et al. [8] establish a framework in which to model and automatically verify autonomous agents. The framework contains an algorithm tailored to solving a parameterised verification problem in which the model graphs are the parameter.
Coming up with temporal logic specifications is known to be difficult and requires some level of formal training. A few papers therefore focus on how LTL properties are formulated for concrete scenarios. Webster et al. [221, 222] model scenarios for a robot in the healthcare sector; they use Brahms as the language to describe human-robot interaction scenarios, and the properties are written in LTL. Babiceanu and Seker [17] combine LTL and Event-B to build models of trustworthiness for small unmanned aerial systems (sUAS).
The formalisation of, and application of formal verification to, different cognitive architectures has been the focus of many studies.
Bhattacharyya et al. [31] formalise a rule-based representation of a cognitive architecture built in the Soar framework [124] using UPPAAL and connect the verified agent to a simulation environment. They model an auto-pilot avionics system and analyse contingency situations during takeoff.
The Belief-Desire-Intention (BDI) framework is another natural cognitive architecture for specifying autonomous agents, and it has been used extensively in the literature. Several studies make use of the MCAPL framework [61]. Dennis et al. [60, 62] verify ethical aspects of autonomous agents’ interactions with people by modelling their behaviour using BDI models and capturing ethical priorities; these ethical models are subsequently model-checked against LTL specifications using the MCAPL framework. Furthermore, Ferrandes et al. [71] model autonomous vehicle components and also use MCAPL to formally verify their BDI models. Last, Ferrando et al. [72] go further and provide an approach that combines formal verification and runtime monitoring by specifying trace behaviour in Prolog and connecting it with a Java implementation (using the JPL framework) for runtime monitoring.
Sun et al. [205] study the effect of neural network components on the behaviour of autonomous systems, in particular networks with rectified linear (ReLU) activations, and analyse them using Satisfiability Modulo Convex optimisation (SMC). To mitigate the verification effort, they apply a pre-processing step and evaluate its effect on the verification time as a function of the number of neurons in the network.
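For background, a single ReLU unit \( y = \max(0, x) \) is commonly captured by a Boolean activation variable \( \delta \in \{0,1\} \) together with the linear constraints \( y \ge x \), \( y \ge 0 \), \( y \le x + M(1-\delta) \), and \( y \le M\delta \), for a sufficiently large bound \( M \) on \( |x| \); this is the standard mixed encoding used for illustration here, not necessarily the exact constraints of Reference [205]. Satisfiability Modulo Convex approaches then reason over the Boolean activation pattern and delegate the remaining convex constraints to a convex solver.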
With respect to verifying the safety of human-robot interaction, Vicentini et al. [216] provide safety assessments by formally verifying models written in the TRIO temporal logic [84]. Their strategy aims to identify hazardous situations associated with non-negligible risks. Analogously, Farulla and Lampretc [69] focus on model checking security properties formulated in Computation Tree Logic (CTL).
Selvaraj et al. [197] evaluate the application of different formal techniques to verify control software for an autonomous vehicle. They investigate the application of Supervisory Control Theory, Model Checking, and Deductive Verification and provide insights on how these different approaches can address different industrial challenges.
Quantitative
Simulation
Most interventions in this category focus strictly on introducing a simulation tool for a specific sub-domain, such as underwater robots [52, 148, 212], vehicles [54, 200, 232], robots [100, 126], and UAVs [139, 161].
A number of interventions, however, combine a simulation approach with other testing aspects. Li et al. [131] employ a game-theoretical approach where vehicles have different levels of knowledge about other vehicles: a level-0 car has no knowledge about the other cars, while a level-k car has information about level-(k-1) cars. Counter-intuitively, they show that, in some instances, lower-level cars cause fewer constraint violations.
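The level-k idea can be sketched as follows (a minimal, illustrative rendition of our own, with hypothetical helper functions, rather than the authors' implementation): a level-0 driver follows a fixed baseline policy, and a level-k driver best-responds under the assumption that all other vehicles reason at level k-1.

def level_k_action(k, state, actions, simulate, reward, baseline_policy):
    # Level 0 simply follows the baseline (e.g., keep lane and speed).
    if k == 0:
        return baseline_policy(state)

    def others_policy(other_state):
        # Every other vehicle is assumed to reason one level below.
        return level_k_action(k - 1, other_state, actions, simulate, reward, baseline_policy)

    # Best response: pick the action with the highest predicted reward.
    return max(actions, key=lambda a: reward(simulate(state, a, others_policy)))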
Szalay et al. [206] provide a scenario-in-the-loop simulation using SUMO and the Unity engine. They simulate simplified platooning and valet parking scenarios both in simulation and in a real smart-city environment (ZalaZone).
Verma et al. [215] present a Flight Software simulator that is used to simulate Mars Rover missions. The simulator assists in predicting the behaviour of semi-autonomous systems by allowing human operators to check whether their intent is correctly captured by the robot prior to execution in different scenarios and environments.
Two studies present supporting libraries: Koolen et al. [117] implement a robotic simulation library in the Julia programming language; the library offers support for robot dynamics, visualisation, and control algorithms. Rohmer et al. [184] developed libraries to integrate V-REP with other programming languages (Lua, C++, Java, Python, Matlab, and Ruby), with support for different types of 3D objects and modules for kinematics and dynamics.
Model-based Testing
Multi-objective search is an increasingly popular technique for coping with complex robotic systems. Betts et al. [30] employ a Monte Carlo search heuristic to compare the lateral-distance outcomes of surrogate-based models against a known ground truth in UAV applications. A similar approach is used by Abdessalem et al. [25] on pedestrian detection using a vision-based system; they employ NSGA-II [59], using minimum distance and minimum time to collision as fitness functions, and compare the performance of the heuristic with and without surrogates.
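The core of such multi-objective set-ups is a Pareto-dominance comparison over the fitness vectors; the following simplified sketch (our own illustration with hypothetical field names, not the NSGA-II implementation used in Reference [25]) keeps the scenarios that no other scenario dominates:

def scenario_fitness(trace):
    # Both objectives are minimised: smaller values mean more critical scenarios.
    return (min(step.distance_to_pedestrian for step in trace),
            min(step.time_to_collision for step in trace))

def dominates(f, g):
    # f dominates g if it is at least as critical on both objectives and strictly more critical on one.
    return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))

def pareto_front(scenarios, run_simulation):
    scored = [(s, scenario_fitness(run_simulation(s))) for s in scenarios]
    return [s for s, f in scored if not any(dominates(g, f) for _, g in scored)]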
Saglietti et al. [190, 191], however, propose multiple notions of coverage to help generate test cases from Petri net models of autonomous, cooperative, and reconfigurable robots. Furthermore, they employ statistical testing techniques, which aim to evaluate the degree to which the observed behaviour is acceptable.
Lindvall et al. [133] employ a framework for automated testing that combines metamorphic testing principles with model-based testing: test cases are generated from test models, together with multiple scenario variations that are programmatically derived from metamorphic relations.
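A minimal sketch of the underlying metamorphic idea (our own illustration with hypothetical names, not the framework of Reference [133]): a follow-up scenario is derived from a source scenario by a transformation for which the expected relation between the two outputs is known, so no explicit oracle is needed.

def metamorphic_check(scenario, transform, run_system, relation):
    # Execute both the source and the programmatically derived follow-up scenario.
    source_output = run_system(scenario)
    follow_up_output = run_system(transform(scenario))
    # The test passes when the metamorphic relation holds between the two outputs.
    return relation(source_output, follow_up_output)

# Example relation: mirroring the scene left-to-right should not change
# whether the obstacle is detected.
def detection_invariant(source_output, follow_up_output):
    return source_output.obstacle_detected == follow_up_output.obstacle_detected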
Runtime Monitoring
In this category, runtime verification checks are typically coupled with model-checking strategies, with properties being checked during system execution. For instance, Desai et al. [63] present an STL-based framework in which an online monitor checks, on partial trajectories from the low-level controllers, that the assumptions made during model checking continue to hold at runtime.
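As a simplified illustration (our own sketch, not the implementation of Reference [63]), the quantitative robustness of an invariant such as "always keep at least d_min clearance" over the trajectory observed so far is the worst-case margin; a negative value indicates a violation, and a small positive value warns that a model-checking assumption is close to being broken.

def clearance_robustness(partial_trajectory, d_min):
    # Robustness of G(distance >= d_min): minimum margin over the observed samples.
    return min(sample.distance - d_min for sample in partial_trajectory)

def assumption_holds(partial_trajectory, d_min, margin=0.0):
    # The monitor raises a flag as soon as the robustness drops to or below the margin.
    return clearance_robustness(partial_trajectory, d_min) > margin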
In the context of obstacle avoidance, Luo et al. [140] employ JavaMOP to verify that a robot does not behave against requirements written in the FSM and PTLTL (past-time LTL) languages. Temporal properties are also employed by Wang et al. [220], where the RoboticSpec specification language for robotic applications is translated into a framework for online monitoring that also uses PLTL properties.
Huang et al. [102] present ROSRV, an online monitoring framework that runs on top of ROS. They make use of the publish-subscribe communication architecture and intercept commands and messages passing through the communication channels. This way, they are able to verify safety and security requirements at runtime using a domain-specific language.
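A much-simplified monitor in this spirit can be written against the standard ROS 1 Python API (this sketch uses plain rospy subscription rather than ROSRV's interception mechanism; the topic name and threshold are illustrative):

import rospy
from geometry_msgs.msg import Twist

MAX_SPEED = 1.0  # illustrative safety limit in m/s

def check_command(msg):
    # Flag velocity commands that violate the safety requirement.
    if abs(msg.linear.x) > MAX_SPEED:
        rospy.logwarn("Unsafe velocity command observed: %.2f m/s", msg.linear.x)

if __name__ == "__main__":
    rospy.init_node("safety_monitor")
    rospy.Subscriber("/cmd_vel", Twist, check_command)
    rospy.spin()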
Open-source
Simulation
Manhaes and Rauschenbach present the UUV Simulator [148], an extension of Gazebo accommodating the domain-specific aspects of underwater vehicles. It assists with the modelling of underwater hydrostatic and hydrodynamic effects, thrusters, sensors, and external disturbances; the authors demonstrate the tool on a case using a modified model of the Sperre SF 30k ROV robot (RexROV).
As another tool for underwater robots, MARS [212] provides simulation environments for marine swarm robots and allows for hardware-in-the-loop simulation. The tool has a Java interface and has been applied to the MONSUN and HANSE autonomous underwater robots.
In the Matlab environment, FROST [100] is an open-source Matlab toolkit for modelling, trajectory optimisation, and simulation of robots, with a particular focus on dynamic locomotion. In the study, the authors model the ATLAS and DRC-HUBO robots as examples.
Munawar and Fischer [160] present the Asynchronous Framework, which incorporates real-time dynamic simulation and interfaces with learning agents to train and potentially allow for the execution of shared sub-tasks. Due to the asynchronous nature of the communication, they measure the number of packets against latency. Furthermore, they focus on surgical robots as part of their application domain, and they employ the CHAI3D haptics framework. They connect their tools with ROS, which allows them to connect to learning libraries such as TensorFlow.
D’Urso, Santoro, and Santoro [67] also present a simulator for multi-UAV applications, called GzUAVChannel. It works as a middleware that combines Gazebo, ArduPilot, and the NS-3 network simulator to provide a 3D visualisation engine, a physics simulator, a flight control stack, and a network simulator to handle communications among unmanned aerial vehicles. They model a leader-follower example.
The MoVE tool [54] provides the possibility of modelling pedestrian behaviour. The framework focuses on testing autonomous system algorithms, vehicles, and their interactions with real and simulated vehicles and pedestrians. They conduct three case studies: traffic wave observation, medical evacuation, and virtual vehicles avoiding real pedestrians.
Rohmer, Singh, and Freese introduce V-REP [184], a popular robotics physics simulator that is now known as CoppeliaSim. The tool uses a kinematics engine and several physics libraries to provide rigid-body simulations (including meshes, joints, and multiple types of sensors).
Koolen et al. [117] implement robotic simulation library in the Julia programming language. The library offers support for robot dynamics, visualisation, and control algorithms.
Brambilla et al. have developed ARGOS [37], which is a multi-physics robot simulator that can simulate large-scale swarms and can be customised via plug-ins.
Cieślak introduces Stonefish [52], a geometry-based simulator that can be integrated with ROS. Last, the MARS tool [212] provides simulation environments for marine swarm robots.
Gambi, Mueller, and Fraser present the AsFault prototype tool [78]. The tool combines procedural content generation and search-based testing to automatically create challenging virtual scenarios for testing self-driving car software.
Garzón and Spalanzani [80] present a tool that combines 3D simulation (for ego-vehicle control) with a traffic simulator (which controls the behaviour of other vehicles). The goal is to test the ego-vehicle in realistic high-traffic situations.
Lugo-Cárdenas, Lozano, and Flores [139] introduce a 3D simulation tool for UAVs whose focus is on assisting the development of flight controllers.
Formal Verification
Parametric modelling of CAVs as a network of timed automata is used by Arcile et al. [16]. In this work, the VerifCar tool is applied to assess the impact of communication delays on the decision algorithms of CAVs and to check the robustness and efficiency of such algorithms.
Gruber and Althoff [91] present a reachability analysis tool (Spot) that finds counterexamples witnessing property violations. It starts with a coarse model of the system dynamics but can refine the abstraction level to trade precision against scalability.
Desai et al. [63] present a runtime verification framework (DRONA) based on Signal Temporal Logic [144], where an online monitor checks robustness on partial trajectories from low-level controllers.
RoVer [175] provides visual authoring of human-robot interactions, formalisation of properties in temporal logic, and verification (via model checking with PRISM [123]) that the interactions abide by a set of social norms and task expectations; the goal is to identify violations of social norms.
Althoff introduces IMPROV [6], a tool that is used to formally verify human-robot interaction for modular robots.
Gainer et al. [77] provide a tool (CRutoN) that translates robot control rules written in a DSL into formal models and feeds those models into NuSMV for model checking. Their main emphasis is the verification of human-robot interaction.
Bao et al. [18] present a prototype tool for parametric statistical model checking that can cope with complex parametric Markov chains where state-of-the-art tools (such as PRISM) have timed out. They provide evidence of their tool efficiency by conducting an industrial case study.
The FLPolyVF tool [4] connects functional verification, sensor verification, diagnostics, and industry/regulatory communication of autonomous vehicles while checking the effects of using different (matrix-based) scenario abstraction levels.
Lomuscio et al. have developed the MCMAS model checker [136]. It supports an epistemic extension of alternating-time temporal logic (ATLK) and is one of the few tools encountered that uses a CTL-style branching-time logic. They demonstrate their strategy on a couple of small-scale examples, where they compare it against alternatives (NuSMV and MCTK).
Runtime Monitoring
Huang et al. present ROSRV [102], which is a runtime verification framework that can be used with ROS.
Desai et al. [63] present a runtime verification framework based on Signal Temporal Logic, where an online monitor checks robustness on partial trajectories from low-level controllers (in the context of surgical robots).
Public
Simulation
Cavalcanti et al. [46, 47, 48, 156] introduce RoboTool, supporting graphical modelling, validation, and model checking (via FDR [85]) of robotic models written in RoboChart [156] and RoboSim [48].
Shah et al. [198] introduce the AirSim simulator that generates training data for building machine learning models used in autonomous aircraft. It offers physical and visual simulation, including models of physics engine, vehicle, environment, and sensors. Further, it connects to an API for planning and control.
Zhang et al. [233] introduce CyberEarth, a framework for program-driven simulation, visualisation, and monitoring of robots. The tool integrates modules from several other open-source tools, such as ROS [177] and OpenSceneGraph (OSG) [43].
Model-based Testing
Mullins et al. [159] developed a tool (RAPT - Range Adversarial Planning Tool) for generating test scenarios to be employed on the System Under Test. The tool employs an adaptive search method that generates challenging scenarios based on the performance and results of the previous ones. A clustering algorithm ranks the scenarios based on the performance type and how close they are to the boundaries of each cluster. The boundaries are based on notions of efficiency (precision and convergence), diversity (how many performance boundaries are being covered), and scaling.
Formal Verification
The only tool in this category is RoboTool [48], which has also been described above in the Simulation category. It provides formal verification via translated CSP models fed into the FDR model checker [85].
Private
Simulation
Heitmeyer and Leonard [98] introduce two tools integrated into the FORMAL framework; the tools synthesise and validate formal models. The first tool synthesises a formal Software Cost Reduction (SCR) requirements model from scenarios, and the second tool combines the existing SCR simulator [96] with eBotworks 3D simulator to allow for simulation of continuous components. They focus on the verification of human-machine interaction.
Verma et al. [215] present a Flight Software simulator (SSIM, part of the Rover Sequencing and Visualisation Program (RSVP) suite) that is used to simulate Mars Rover missions. The simulator assists in predicting the behaviour of semi-autonomous systems by allowing human operators to check whether their intent is correctly captured by the robot prior to execution in different scenarios and environments.
Zhang et al. present RoadView [232], a photo-realistic simulator that tests the performance of autonomous vehicles and evaluates their self-driving tasks. They make use of driving scenarios in which they compare an autonomous vehicle to a human-driven scenario to demonstrate their tool.
Schöner presents a simulation tool that is part of the (industrial) Pegasus framework [195]. It integrates sensors, traffic, and road models (in OpenDRIVE format) into the simulation, where different scenarios and situations are executed.
Model-based Testing
Collet et al. [53] introduce RobTest, a tool for generating collision-free trajectories for stress-testing of single-arm robots. It employs constraint programming techniques to solve continuous-domain constraints in its trajectory generation process. The efficiency of this process is evaluated in a controlled experiment measuring the generation time of acceptable near-optimal trajectories.
Formal Verification
O’Kelly introduces APEX [165], which is a formal verification tool for verifying vehicle dynamics, trajectory planning, and tracking stacks of ADAS in vehicles. Property specifications are written in metric interval temporal logic. The tool calls DReach [116] in the background to perform reachability analysis on the vehicle trajectories.
Foughali et al. [75] implement an automatic translation from GenoM [145], a robotics model-based software engineering framework, to the formal specification language Fiacre [28], which can be fed into TINA for model checking (on the Petri net models). They apply their tool to an autonomous ground vehicle (RMP 400 Segway).
Small
Simulation
Regarding small-scale case studies for simulation environments, the vast majority conduct a simple demonstration to illustrate features of their simulation tools [11, 42, 48, 50, 56, 95, 139, 232, 233], such as the MARS [212] (for underwater robots) and V-REP [184] (for generic robots) case studies.
Differently, Li et al. [131] employ a game-theoretical approach where vehicles have different levels of knowledge about other vehicles: a level-0 car has no knowledge about the other cars, while a level-k car has information about level-(k-1) cars. Counter-intuitively, their case study shows that, in some instances, lower-level cars cause fewer constraint violations.
MBT
Two of the case studies in this category consider generating test cases from formal models [132, 163] of autonomous agents. Furthermore, Andrews et al. [10] model autonomous systems and their environment using Petri nets to generate test cases and apply their technique to a case study in the human-robot interaction domain. In Hagerman’s case study [93], finite state machines are used to extract security test suites. Saglietti et al. [190, 191] conduct a case study in which the reconfiguration behaviour of autonomous agents is verified. Betts et al. [30] compare the effectiveness of two search-based testing methods, with a case study involving UAV flight control software.
Formal Verification
Several of the included case studies in this category concern abstract representations of multi-agent autonomous systems and provide efficient algorithms for parametric (formal) verification or state-space reduction techniques [18, 22, 24, 119, 120, 122, 135, 136].
Several other case studies [19, 46, 47, 48, 137, 204] concern systems verified using model-checking tools such as Prism [123] and FDR [85]. Another use of case studies is to demonstrate the usage of newly introduced tools such as APEX [165] and MDE [147]. Differently, Dennis et al. [60, 62] focus on formalising and verifying ethical concerns in BDI agents and provide corresponding small case studies. Aminof et al. [9] investigate the decidability problem for parameterised grid sizes; in their case study, they found that restricting the grid size results in the problem being solvable in PSPACE.
Runtime monitoring
In the only paper in this category, Desai, Tomasso, and Seshia [63] make use of an STL-based (signal temporal logic) online monitoring system to ensure that the assumptions about the low-level controllers (discrete models) used during model checking hold at runtime. They demonstrate the strategy in a surveillance application case study.
Industrial
Simulation
Zhou et al. [237] introduce a framework for virtual testing of advanced driver assistance systems that uses real-world measurements. Shah et al. [198] build a model of a quadrotor with a Pixhawk controller in their newly developed simulator, AirSim, which includes a physics engine and supports real-time hardware-in-the-loop simulation. Schöner presents a simulation tool that is part of the (industrial) Pegasus framework [195]; it integrates sensors, traffic, and road models (in OpenDRIVE format) into the simulation, where different scenarios and situations are executed. Reference [185] demonstrates a validation framework for autonomous systems in a farming context with simulations and field testing.
Uriagereka et al. [214] conduct simulation-assisted fault injection to assess the safety and reliability of robotic systems. The feasibility of their method is demonstrated by applying it to the design of a real-time Cartesian impedance control system. Manhaes and Rauschenbach [148] model the Sperre SF 30k ROV underwater robot (RexROV) in the demonstration of their simulator for unmanned underwater vehicles. Verma et al. [215] present a Flight Software simulator that is used to simulate Mars Rover missions and demonstrate their approach with a corresponding case study. AbdElSalam et al. [1] use hardware emulation-in-the-loop to verify Electronic Control Units (ECUs) for ADAS systems.
MBT
In the only industrial case study in this category, Abdessalem et al. [25] generate test cases for a system that can visually detect pedestrians in the context of advanced driver assistance systems (ADAS).
Formal Verification
Gainer et al. [77] conduct a case study in the context of verification of human-robot interaction using the Care-O-Bot robotic assistant. Bhattacharyya et al. [31] apply formal verification methods to an autonomous CoPilot agent.
Runtime monitoring
Gladisch et al. [87] select case studies that use industrial automated driving (adaptive cruise control, lane keeping, and steering control scenarios) to evaluate their search-based testing strategy.
Benchmarks
Simulation
Several benchmark-scale case studies can be found in this category; in what follows, we briefly discuss some of them. Wigand et al. [224] have developed CoSiMA, an architecture for the simulation, execution, and analysis of robotic systems; they conduct experiments on the humanoid robot COMAN. Tosun et al. [213] present a design framework that facilitates the rapid creation of configurations and behaviours for modular robots, demonstrated on the SMORES robot. Pereira et al. [172] employ several small case studies in their attempt to couple two simulators, namely SUMO and USARSim. Brambilla et al. [37] model a probabilistic swarm that is checked in PRISM to evaluate their property-driven design method. Bohlmann, Klinger, and Szczerbicka [36] automatically generate a model of a parallel delta robot on-the-fly; their method for model generation is based on machine learning and symbiotic simulation techniques. Mutter et al. [161] also explore the simulation of UAV models in Simulink and discuss the results of combining the platform and environment models. Bi et al. [32] present a deep-learning-based framework for traffic simulation and execute several scenarios of intersections with and without pedestrians. D’Urso, Santoro, and Santoro [67] simulate leader-follower UAV scenarios in their framework, whose goal is to combine four simulation environments: a 3D visualisation engine, a physics simulator, a flight control stack, and a network simulator. Wang and Cheng [219] present a hardware-in-the-loop simulator for drones that can generate synthetic images from the scene as datasets, detect and verify objects with a trained neural network, and generate point-cloud data for model validation; they simulate and conduct field testing on a physical UAV. Heitmeyer et al. [98] synthesise Software Cost Reduction models of multiple autonomous systems to be used in a simulator integrated with the eBotworks simulation tool.
MBT
Proetzsch et al. [176] use a purpose-designed DSL (graph-based models) to describe the system behaviour of the autonomous off-road robot RAVON; the model is used as a test model for generating test cases. Mullins et al. [159] have developed a tool (RAPT, the Range Adversarial Planning Tool) for generating test scenarios to be employed on the system under test; their tool is applied to realistic underwater missions. Furthermore, in their case study, Araiza-Illan, Pipe, and Eder [14] use BDI models and model checking of probabilistic timed automata (in UPPAAL) to generate test sequences for human-robot collaboration tasks.
Formal Verification
Here, we briefly discuss some of the benchmarks that involve formal verification. Halder et al. [94] use the physical robot Kobuki as a case study, for which properties are automatically verified using the UPPAAL model checker; the focus of their approach is to model and verify ROS systems with real-time properties. Brambilla et al. [37] model a probabilistic swarm that is checked in PRISM to evaluate their property-driven design method. Bicevskis, Gaujens, and Kalnins [33] develop models for the testing of UAV and UGV collaboration in the Simulink environment. Althoff et al. [6] propose a framework (IMPROV) for self-programming and self-verification for robots, which is demonstrated on a physical robotic arm. Zhao et al. [234] model an unmanned aerial vehicle (UAV) inspection mission on a wind farm and, via probabilistic model checking in PRISM, show how the battery features may affect verification results. Gruber and Althoff [91] present a reachability analysis tool (Spot) that finds counterexamples witnessing property violations; their tool is evaluated using the CommonRoad benchmark PM1:MW1:DEU_Muc-3_1_T-1.
Runtime monitoring
Pasareanu, Gopinath, and Yu [170] present a compositional approach for the verification of autonomous systems and apply the technique to a neural-network implementation of a controller for a collision avoidance system on the ACAS Xu unmanned aircraft. Temporal properties are employed in Wang’s case study [220], where the RoboticSpec specification language for robotic applications is translated into a framework for online monitoring that also uses PLTL properties. Huang et al. conduct a case study using a model of the LandShark UGV to demonstrate their tool, ROSRV [102], a runtime verification framework that can be used with ROS. In the context of obstacle avoidance, Luo et al. [140] employ JavaMOP in their case study to verify that the robot does not behave against requirements written in the FSM and PTLTL languages.
RQ1
By far, quantitative testing techniques are the most widely researched strategies (this was also a common observation for the domain and connectivity aspects).
RQ2
Among the measures used for evaluating interventions, efficiency is most often used, with effectiveness being a close second. Few interventions, however, were evaluated using a notion of coverage [14, 33, 93, 132, 143, 190, 191]. It is notable that, for runtime monitoring, only two publications [87, 102] employ an efficiency metric.
RQ3
There is a considerable lack of tools for model-based testing and runtime monitoring. For simulation and formal verification, in contrast, there is notable strength in terms of tool support.
RQ4
Approximately 54% of the interventions used small-scale case studies for their evaluations, while only 10% evaluated their strategy in an industrial context, indicating a clear gap.

6.2 For Researchers

Throughout the various categories we have coded in this study, the most prominent gap is in the use of agreed-upon rigorous measures to evaluate the efficiency and effectiveness of the interventions, as well as real-world benchmarks that can be used to evaluate such measures. As observed in the earlier sections, most of the efficiency and effectiveness measures used are very generic, and there is a relative gap in domain-specific measures suitable for the RAS sub-domains. A lack of domain-specific modelling languages and the limited number of runtime verification approaches indicate that there is room for improvement in RAS testing strategies.
Another considerable gap is in the use of quantitative specification languages to specify the desired properties of the system; due to the inherent heterogeneity of RAS, we need property languages that cover aspects such as the combination of discrete and continuous dynamics, as well as the stochastic and epistemic aspects that may be used to model the behaviour of the environment and the users. Connected to this point is the relative gap in interventions that perform a quantitative analysis of the system and provide quantitative metrics of quality as the outcome of the test. Some starting points in this direction are the use of quantitative properties that incorporate probabilistic and stochastic [37, 171], timed [94, 165, 201], and continuous dynamical [63] aspects of RAS. We have also noted the use of a specification language that caters for a combination of stochastic and continuous aspects of RAS [137]. In contrast, there is a relative strength in using qualitative models, including property specification languages such as predicate [8] and temporal logics [19, 41, 75, 103, 107, 140, 170, 203], as well as epistemic extensions thereof [121, 134]. Also, there is a wealth of studies on the use of discrete relational [149], state-based [37, 47, 93, 140, 156, 228, 230], and belief-based [13, 14, 107, 163] abstract models in the testing and verification of RAS. Several studies also use informal simulation models for simulation tools such as Gazebo and USARSim [13, 14, 42, 51, 102, 127, 172]. A suitable middle-ground may be semi-formal and domain-specific models such as those built in Matlab/Simulink [25, 30, 161].
Regarding techniques, most of the techniques used so far in the literature are formal verification techniques applied on (relatively high-level) qualitative [8, 41, 71, 121, 134, 147, 187, 228, 230] or quantitative [12, 19, 33, 37, 47, 63, 74, 75, 94, 103, 107, 110, 137, 156, 165, 170, 171, 189, 201, 218, 225] models of RAS. There is also some strength in the use of informal simulation techniques [37, 42, 126, 130, 131, 139, 156, 161, 172, 192, 232]. We have seen relatively few model-based testing [10, 14, 25, 30, 93, 132, 163, 176, 191] and runtime verification [63, 102, 140, 170] techniques applied to (models of) complex and detailed RAS. We hence see a gap, and a trend towards closing this gap, in dynamic and non-exhaustive testing techniques for RAS.
Finally, the lack of public tooling is a major gap observed in the literature. Very few techniques are accompanied by a tool, and there are very few publicly available tools for testing RAS [47, 71, 75, 102, 121, 139, 156, 218, 232].

6.3 For Practitioners

The most significant gap is the lack of industrial evaluation of existing interventions. Very few interventions have been applied in an industrial context and to systems of industrial complexity [1, 2, 25, 31, 70, 77, 87, 105, 113, 148, 153, 185, 194, 195, 197, 198, 208, 214, 215].
Unfortunately, the number of interventions is too small to conclude any meaningful trend and indication of strong evidence for applicability in the industrial setting. Among the proposed interventions, most either concerned simulation-based testing [148, 215] or connected the results of their verification to some simulation tool (mostly based on ROS-Gazebo integration) [1, 70, 153]. Search-based testing [2, 25, 87] and interaction testing [2, 208] are two notable techniques that have been used in industrial contexts. Among the models employed in the industrial context, variants of state machines [215] and fault trees [214] can be mentioned. A notable study in this regard [197] is a comparison of supervisory-control, deductive- and inductive (model-checking) verification techniques in the industrial context.
The human- and information-source aspect of testing interventions is another severely understudied area. We note a recent trend in combining user studies (in the sense of human-computer and human-robot interaction) and traditional testing, validation, and verification techniques [6, 150, 188].
Also, there is a gap in defining and evaluating testing processes, particularly in industrial contexts.
The lack of industrial- and domain-expert input into the models and techniques is evident and has led to generic and relatively simple modelling techniques and property languages being used for most interventions. Co-production with industrial partners can enrich these aspects and lead to models that can deal with the heterogeneity and complexity of industrial RAS.

7 Conclusion

We performed a systematic review of the interventions for testing robotics and autonomous systems to answer the following research questions:
(1)
What are the types of models used for testing RAS?
(2)
Which efficiency and effectiveness measures were introduced or used to evaluate RAS testing interventions?
(3)
What are the interventions supported by (publicly available) tools in this domain?
(4)
Which interventions have evidence of applicability to large-scale and industrial systems?
To this end, we started off by performing a pilot study on a seed of 26 papers. Using this pilot study, we designed and validated a search query, designed rigorous inclusion and exclusion criteria, and developed an adaptation of the SERP-Test taxonomy. Subsequently, we went through two phases of search, validation, and coding, in total going through 10,534 papers. We finally coded the set of 192 included papers and analysed them to answer our research questions.
A summary of the findings of the review with regards to our research questions is provided below:
(1)
There is a wealth of formal and informal models used for testing RAS. In particular, there is a sizeable literature on using generic property specification languages (such as linear temporal logic) and qualitative modelling languages, such as variants of state machines, UML diagrams, Petri nets, and process algebras. There is a clear gap in quantitative modelling languages that can capture the complex and heterogeneous nature of RAS. There is also a lack of domain-specific languages that can capture domain knowledge for various sub-domains of RAS.
(2)
We observed a gap in rigorous and widely accepted metrics to measure the effectiveness, efficiency, and adequacy of testing interventions. Similar to the previous item, the measures used in the literature are very generic and do not pertain to the domain-specific aspects of RAS. Hence, there is a gap and a research opportunity for defining and evaluating rigorous (domain-specific) measures of efficiency, effectiveness, and adequacy for RAS testing interventions.
(3)
A considerable number of interventions rely on public tools for their implementation or evaluation. However, very few make the proposed or evaluated interventions available for public use as publicly available tools. There is hence a considerable gap in providing datasets and public tools for further development of the field.
(4)
There are fewer than a handful of testing interventions that have been evaluated in an industrial context. Some other interventions have used real robots or autonomous systems, but in an academic context. This signifies the importance of future co-production between academia and industry in the industrial evaluation of testing interventions for RAS.

Acknowledgments

We would like to thank Jan Tretmans and Wojciech Mostowski for comments and discussions at the early stage of this research. Moreover, we would like to thank Thomas Arts, Michael Fisher, Mario Gleirscher, Robert Hierons, Fabio Palomba, and Kristin Rozier for their comments at the validation stage of this study.


References

[1]
Mohamed AbdElSalam, Keroles Khalil, John Stickley, Ashraf Salem, and Bruno Loye. 2019. Verification of advanced driver assistance systems (ADAS) and autonomous vehicles with hardware emulation-in-the-loop. Int. J. Automot. Eng. 10, 2 (2019), 197–204.
[2]
Raja Ben Abdessalem, Annibale Panichella, Shiva Nejati, Lionel C. Briand, and Thomas Stifter. 2018. Testing autonomous cars for feature interaction failures using many-objective search. In 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 143–154.
[3]
Nauman Bin Ali, Emelie Engström, Masoumeh Taromirad, Mohammad Reza Mousavi, Nasir Mehmood Minhas, Daniel Helgesson, Sebastian Kunze, and Mahsa Varshosaz. 2019. On the search for industry-relevant regression testing research. Empir. Softw. Eng. 24, 4 (2019), 2020–2055.
[4]
Ala Jamil Alnaser, Mustafa Ilhan Akbas, Arman Sargolzaei, and Rahul Razdan. 2019. Autonomous vehicles scenario testing framework and model of computation. SAE Int. J. Connect. Automat. Vehic. 2, 4 (2019), 60617–60628.
[5]
Matthias Althoff and John M. Dolan. 2011. Set-based computation of vehicle behaviors for the online verification of autonomous vehicles. In 14th International IEEE Conference on Intelligent Transportation Systems (ITSC). IEEE, Washington, DC, 1162–1167.
[6]
Matthias Althoff, Andrea Giusti, Stefan B. Liu, and Aaron Pereira. 2019. Effortless creation of safe robots from modules through self-programming and self-verification. Sci. Robot. 4, 31 (2019), 56–89.
[7]
Saifullah Amin, Adnan Elahi, Kashif Saghar, and Faran Mehmood. 2017. Formal modelling and verification approach for improving probabilistic behaviour of robot swarms. In 14th International Bhurban Conference on Applied Sciences and Technology (IBCAST). IEEE, New York, NY, 392–400.
[8]
Benjamin Aminof, Aniello Murano, Sasha Rubin, and Florian Zuleger. 2015. Verification of asynchronous mobile-robots in partially-known environments. In International Conference on Principles and Practice of Multi-Agent Systems. Springer, 185–200.
[9]
Benjamin Aminof, Aniello Murano, Sasha Rubin, and Florian Zuleger. 2016. Automatic verification of multi-agent systems in parameterised grid-environments. In International Conference on Autonomous Agents & Multiagent Systems. ACM, 1190–1199.
[10]
Anneliese Andrews, Mahmoud Abdelgawad, and Ahmed Gario. 2016. World model for testing urban search and rescue (USAR) robots using petri nets. In 4th International Conference on Model-Driven Engineering and Software Development (MODELSWARD). IEEE, 663–670.
[11]
Vimal Rau Aparow, Apratim Choudary, Giridharan Kulandaivelu, Thomas Webster, Justin Dauwels, and Niels de Boer. 2019. A comprehensive simulation platform for testing autonomous vehicles in 3D virtual environment. In IEEE 5th International Conference on Mechatronics System and Robots (ICMSR). IEEE, 115–119.
[12]
Ryota Arai and H. Schlingloff. 2017. Model-based performance prediction by statistical model checking: An industrial case study of autonomous transport robots. In Concurrency, Specification and Programming Conference. CEUR.
[13]
Dejanira Araiza-Illan, Anthony G. Pipe, and Kerstin Eder. 2016. Intelligent agent-based stimulation for testing robotic software in human-robot interactions. In 3rd Workshop on Model-driven Robot Software Engineering. ACM, 9–16.
[14]
Dejanira Araiza-Illan, Tony Pipe, and Kerstin Eder. 2016. Model-based testing, using belief-desire-intentions agents, of control code for robots in collaborative human-robot interactions. arXiv preprint arXiv:1603.00656 1 (2016).
[15]
Rafael Araújo, Alexandre Mota, and Sidney Nogueira. 2017. Analyzing cleaning robots using probabilistic model checking. In International Conference on Information Reuse and Integration. Springer, 23–51.
[16]
Johan Arcile, Raymond Devillers, and Hanna Klaudel. 2019. VerifCar: A framework for modeling and model checking communicating autonomous vehicles. Auton. Agents Multi-agent syst. 33, 3 (2019), 353–381.
[17]
Radu F. Babiceanu and Remzi Seker. 2017. Formal verification of trustworthiness requirements for small unmanned aerial systems. In Integrated Communications, Navigation and Surveillance Conference (ICNS). IEEE, 6A3–1.
[18]
Ran Bao, Christian Attiogbe, Benoit Delahaye, Paulin Fournier, and Didier Lime. 2019. Parametric statistical model checking of UAV flight plan. In International Conference on Formal Techniques for Distributed Objects, Components, and Systems. Springer, 57–74.
[19]
Benoît Barbot, Béatrice Bérard, Yann Duplouy, and Serge Haddad. 2017. Statistical model-checking for autonomous vehicle safety validation. In Conference SIA Simulation Numérique. HAL-Inria.
[20]
Halil Beglerovic, Steffen Metzner, and Martin Horn. 2018. Challenges for the Validation and Testing of Automated Driving Functions. (Jan. 2018).
[21]
Michael Behrisch, Laura Bieker, Jakob Erdmann, and Daniel Krajzewicz. 2011. SUMO-Simulation of urban mobility: An overview. In 3rd International Conference on Advances in System Simulation. ThinkMind.
[22]
Francesco Belardinelli, Panagiotis Kouvaros, and Alessio Lomuscio. 2017. Parameterised verification of data-aware multi-agent systems. In International Joint Conference on Artificial Intelligence. ACM, 98–104.
[23]
Francesco Belardinelli, Alessio Lomuscio, Aniello Murano, and Sasha Rubin. 2017. Verification of broadcasting multi-agent systems against an epistemic strategy logic. In International Joint Conference on Artificial Intelligence. ACM, Melbourne, Australia, 91–97.
[24]
Francesco Belardinelli, Alessio Lomuscio, Aniello Murano, and Sasha Rubin. 2017. Verification of multi-agent systems with imperfect information and public actions. In International Conference on Autonomous Agents and Multiagent Systems. ACM, 1268–1276.
[25]
Raja Ben Abdessalem, Shiva Nejati, Lionel C. Briand, and Thomas Stifter. 2016. Testing advanced driver assistance systems using multi-objective search and neural networks. In 31st IEEE/ACM International Conference on Automated Software Engineering. IEEE/ACM, New York, NY, 63–74.
[26]
Luca Benvenuti, Davide Bresolin, Pieter Collins, Alberto Ferrari, Luca Geretti, and Tiziano Villa. 2014. Assume–guarantee verification of nonlinear hybrid systems with Ariadne. Int. J. Robust Nonlin. Contr. 24, 4 (2014), 699–724.
[27]
Christian Berger. 2015. Accelerating regression testing for scaled self-driving cars with lightweight virtualization-A case study. In IEEE/ACM 1st International Workshop on Software Engineering for Smart Cyber-physical Systems. IEEE, 2–7.
[28]
Bernard Berthomieu, Jean-Paul Bodeveix, Patrick Farail, Mamoun Filali, Hubert Garavel, Pierre Gaufillet, Frederic Lang, and François Vernadat. 2008. Fiacre: An intermediate language for model verification in the topcased environment. In 4th European Congress ERTS Embedded Real Time Software (ERTS’08). SEE.
[29]
Bernard Berthomieu, P.-O. Ribet, and François Vernadat. 2004. The tool TINA–construction of abstract state spaces for Petri nets and time Petri nets. Int. J. Product. Res. 42, 14 (2004), 2741–2756.
[30]
Kevin M. Betts and Mikel D. Petty. 2016. Automated search-based robustness testing for autonomous vehicle software. Model. Simul. Eng. 2016 (2016).
[31]
Siddhartha Bhattacharyya, Thomas C. Eskridge, Natasha A. Neogi, Marco Carvalho, and Milton Stafford. 2018. Formal assurance for cooperative intelligent autonomous agents. In NASA Formal Methods Symposium. Springer, 20–36.
[32]
Huikun Bi, Tianlu Mao, Zhaoqi Wang, and Zhigang Deng. 2019. A deep learning-based framework for intersectional traffic simulation and editing. IEEE Trans. Visualiz. Comput. Graph. 1 (2019).
[33]
Janis Bicevskis, Artis Gaujens, and Janis Kalnins. 2013. Testing of RUAV and UGV robots’ collaboration in the Simulink environment. Balt. J. Mod. Comput. 1 (2013).
[34]
Andreas Bihlmaier and Heinz Wörn. 2014. Robot unit testing. In International Conference on Simulation, Modeling, and Programming for Autonomous Robots. Springer, 255–266.
[35]
Eckard Böde, Matthias Büker, Ulrich Eberle, Martin Fränzle, Sebastian Gerwinn, and Birte Kramer. 2018. Efficient splitting of test and simulation cases for the verification of highly automated driving functions. In International Conference on Computer Safety, Reliability, and Security. Springer, 139–153.
[36]
Sebastian Bohlmann, Volkhard Klinger, and Helena Szczerbicka. 2017. Integration of a physical system, machine learning, simulation, validation and control systems towards symbiotic model engineering. In Symposium on Modeling and Simulation of Complexity in Intelligent, Adaptive and Autonomous Systems. ACM, 1–12.
[37]
Manuele Brambilla, Arne Brutschy, Marco Dorigo, and Mauro Birattari. 2014. Property-driven design for robot swarms: A design method based on prescriptive modeling and model checking. ACM Trans. Auton. Adapt. Syst. 9, 4 (2014), 1–28.
[38]
Paul Bremner, Louise A. Dennis, Michael Fisher, and Alan F. Winfield. 2019. On proactive, transparent, and verifiable ethical reasoning for robots. Proc. IEEE 107, 3 (2019), 541–561.
[39]
Davide Bresolin, Luca Geretti, Riccardo Muradore, Paolo Fiorini, and Tiziano Villa. 2015. Formal verification applied to robotic surgery. In Coordination Control of Distributed Systems. Springer, 347–355.
[40]
Davide Bresolin, Luca Geretti, Riccardo Muradore, Paolo Fiorini, and Tiziano Villa. 2015. Formal verification of robotic surgery tasks by reachability analysis. Microproc. Microsyst. 39, 8 (2015), 836–842.
[41]
Julien Brunel and Jacques Cazin. 2012. Formal verification of a safety argumentation and application to a complex UAV system. In International Conference on Computer Safety, Reliability, and Security. Springer, 307–318.
[42]
Qing Bu, Fuhua Wan, Zhen Xie, Qinhu Ren, Jianhua Zhang, and Sheng Liu. 2015. General simulation platform for vision based UAV testing. In IEEE International Conference on Information and Automation. IEEE, 2512–2516.
[43]
Don Burns and Robert Osfield. 2004. Tutorial: Open scene graph A: Introduction tutorial: Open scene graph B: Examples and applications. In IEEE Virtual Reality Conference. IEEE, 265–265.
[44]
IPG CarMaker. 2014. User's Guide Version 4.5.2. IPG Automotive, Karlsruhe, Germany.
[45]
Stefano Carpin, Mike Lewis, Jijun Wang, Stephen Balakirsky, and Chris Scrapper. 2007. USARSim: A robot simulator for research and education. In Proceedings IEEE International Conference on Robotics and Automation. IEEE, 1400–1405.
[46]
Ana Cavalcanti, James Baxter, Robert M. Hierons, and Raluca Lefticaru. 2019. Testing robots using CSP. In International Conference on Tests and Proofs. Springer, 21–38.
[47]
Ana Cavalcanti, Alvaro Miyazawa, Augusto Sampaio, Wei Li, Pedro Ribeiro, and Jon Timmis. 2018. Modelling and verification for swarm robotics. In International Conference on Integrated Formal Methods. Springer, 1–19.
[48]
Ana Cavalcanti, Augusto Sampaio, Alvaro Miyazawa, Pedro Ribeiro, Madiel Conserva Filho, André Didier, Wei Li, and Jon Timmis. 2019. Verified simulation for robotics. Sci. Comput. Program. 174 (2019), 1–37.
[49]
Qianwen Chao, Xiaogang Jin, Hen-Wei Huang, Shaohui Foong, Lap-Fai Yu, and Sai-Kit Yeung. 2019. Force-based heterogeneous traffic simulation for autonomous vehicle testing. In International Conference on Robotics and Automation (ICRA). IEEE, 8298–8304.
[50]
Shitao Chen, Yu Chen, Songyi Zhang, and Nanning Zheng. 2019. A novel integrated simulation and testing platform for self-driving cars with hardware in the loop. IEEE Trans. Intell. Vehic. 4, 3 (2019), 425–436.
[51]
Yu Chen, Shitao Chen, Tangyike Zhang, Songyi Zhang, and Nanning Zheng. 2018. Autonomous vehicle testing and validation platform: Integrated simulation system with hardware in the loop. In IEEE Intelligent Vehicles Symposium. IEEE, 949–956.
[52]
Patryk Cieślak. 2019. Stonefish: An advanced open-source simulation tool designed for marine robotics, with a ROS interface. In OCEANS Conference. IEEE, 1–6.
[53]
Mathieu Collet, Arnaud Gotlieb, Nadjib Lazaar, and Morten Mossige. 2019. Stress testing of single-arm robots through constraint-based generation of continuous trajectories. In IEEE International Conference on Artificial Intelligence Testing (AITest). IEEE, 121–128.
[54]
Marc Compere, Garrett Holden, Otto Legon, and Roberto Martinez Cruz. 2019. MoVE: A mobility virtual environment for autonomous vehicle testing. In ASME International Mechanical Engineering Congress and Exposition. American Society of Mechanical Engineers.
[55]
A. Cortesi, P. Ferrara, and N. Chaki. 2013. Static analysis techniques for robotics software verification. In IEEE 44th International Symposium on Robotics. IEEE, 1–6.
[56]
Piotr Cybulski. 2019. A framework for autonomous UAV swarm behavior simulation. In Federated Conference on Computer Science and Information Systems (FedCSIS). IEEE, 471–478.
[57]
Werner Damm and Roland Galbas. 2018. Exploiting learning and scenario-based specification languages for the verification and validation of highly automated driving. In IEEE/ACM 1st International Workshop on Software Engineering for AI in Autonomous Systems (SEFAIAS). IEEE, 39–46.
[58]
Vânia de Oliveira Neves, Márcio Eduardo Delamaro, and Paulo Cesar Masiero. 2019. Automated structural software testing of autonomous vehicles. In 20th Ibero-American Conference on Software Engineering. CIbSE.
[59]
Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and T. Meyarivan. 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evolut. Computat. 6, 2 (2002), 182–197.
[60]
Louise Dennis, Michael Fisher, Marija Slavkovik, and Matt Webster. 2016. Formal verification of ethical choices in autonomous systems. Robot. Auton. Syst. 77 (2016), 1–14.
[61]
Louise A. Dennis. 2018. The MCAPL framework including the agent infrastructure layer and agent Java Pathfinder. J. Open Source Softw. 3, 24 (2018), 617.
[62]
Louise A. Dennis, Michael Fisher, and Alan F. T. Winfield. 2015. Towards verifiably ethical robot behaviour. In Workshops at the 29th AAAI Conference On Artificial Intelligence (AAAI’15). IEEE.
[63]
Ankush Desai, Tommaso Dreossi, and Sanjit A. Seshia. 2017. Combining model checking and runtime verification for safe robotics. In Runtime Verification, Shuvendu Lahiri and Giles Reger (Eds.). Springer International Publishing, Cham, 172–189.
[64]
Ha Thi Thu Doan, François Bonnet, and Kazuhiro Ogata. 2018. Model checking of robot gathering. In 21st International Conference on Principles of Distributed Systems (OPODIS’17). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
[65]
Simulink Documentation. 2020. Simulation and Model-based Design. Retrieved from https://www.mathworks.com/products/simulink.html.
[66]
Daniela Doroftei, Anibal Matos, Eduardo Silva, Victor Lobo, Rene Wagemans, and Geert De Cubber. 2015. Operational validation of robots for risky environments. In 8th IARP Workshop on Robotics for Risky Environments. IARP/EURON.
[67]
Fabio D’Urso, Corrado Santoro, and Federico Fausto Santoro. 2019. An integrated framework for the realistic simulation of multi-UAV applications. Comput. Electric. Eng. 74 (2019), 196–209.
[68]
Emelie Engström, Kai Petersen, Nauman bin Ali, and Elizabeth Bjarnason. 2017. SERP-test: A taxonomy for supporting industry–academia communication. Softw. Qual. J. 25 (2017), 1269–1305.
[69]
Giuseppe Airò Farulla and Anna-Lena Lamprecht. 2017. Model checking of security properties: A case study on human-robot interaction processes. In 12th International Conference on Design & Technology of Integrated Systems In Nanoscale Era (DTIS). IEEE, 1–6.
[70]
S. Alireza Fayazi, Ardalan Vahidi, and Andre Luckow. 2019. A vehicle-in-the-loop (VIL) verification of an all-autonomous intersection control scheme. Transport. Res. Part C: Emerg. Technol. 107 (2019), 193–210.
[71]
Lucas E. R. Fernandes, Vinicius Custodio, Gleifer V. Alves, and Michael Fisher. 2017. A rational agent controlling an autonomous vehicle: Implementation and formal verification. arXiv preprint arXiv:1709.02557 (2017).
[72]
Angelo Ferrando, Louise A. Dennis, Davide Ancona, Michael Fisher, and Viviana Mascardi. 2018. Verifying and validating autonomous systems: Towards an integrated approach. In International Conference on Runtime Verification. Springer, 263–281.
[73]
Mohammed Foughali. 2019. On reconciling schedulability analysis and model checking in robotics. In International Conference on Model and Data Engineering. Springer, 32–48.
[74]
Mohammed Foughali, Bernard Berthomieu, Silvano Dal Zilio, Pierre-Emmanuel Hladik, Félix Ingrand, and Anthony Mallet. 2018. Formal verification of complex robotic systems on resource-constrained platforms. In IEEE/ACM 6th International FME Workshop on Formal Methods in Software Engineering (FormaliSE). IEEE, 2–9.
[75]
Mohammed Foughali, Bernard Berthomieu, Silvano Dal Zilio, Félix Ingrand, and Anthony Mallet. 2016. Model checking real-time properties on the functional layer of autonomous robots. In International Conference on Formal Engineering Methods. Springer, 383–399.
[76]
National Science Foundation. 2018. Smart and Autonomous Systems (S&AS) Program Solicitation. Retrieved from https://www.nsf.gov/pubs/2018/nsf18557/nsf18557.htm.
[77]
Paul Gainer, Clare Dixon, Kerstin Dautenhahn, Michael Fisher, Ullrich Hustadt, Joe Saunders, and Matt Webster. 2017. CRutoN: Automatic verification of a robotic assistant’s behaviours. In International Workshop on Formal Methods and Automated Verification of Critical Systems. Springer, 119–133.
[78]
Alessio Gambi, Marc Mueller, and Gordon Fraser. 2019. Automatically testing self-driving cars with search-based procedural content generation. In 28th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, 318–328.
[79]
Shenjian Gao and Yanwen Tan. 2017. Paving the Way for Self-driving Cars - Software Testing for Safety-critical Systems Based on Machine Learning: A Systematic Mapping Study and a Survey. Blekinge Tekniska Högskola.
[80]
Mario Garzón and Anne Spalanzani. 2018. An hybrid simulation tool for autonomous cars in very high traffic scenarios. In 15th International Conference on Control, Automation, Robotics and Vision (ICARCV). IEEE, 803–808.
[81]
Lydia Gauerhof, Peter Munk, and Simon Burton. 2018. Structuring validation targets of a machine learning function applied to automated driving. In International Conference on Computer Safety, Reliability, and Security. Springer, 45–58.
[82]
Luca Geretti, Riccardo Muradore, Davide Bresolin, Paolo Fiorini, and Tiziano Villa. 2017. Parametric formal verification: The robotic paint spraying case study. IFAC-PapersOnLine 50, 1 (2017), 9248–9253.
[83]
Achim Gerstenberg and Martin Steinert. 2019. Evaluating and optimizing chaotically behaving mobile robots with a deterministic simulation. Procedia CIRP 84 (2019), 219–224.
[84]
Carlo Ghezzi, Dino Mandrioli, and Angelo Morzenti. 1990. TRIO: A logic language for executable specifications of real-time systems. J. Syst. Softw. 12, 2 (1990), 107–123.
[85]
Thomas Gibson-Robinson, Philip Armstrong, Alexandre Boulgakov, and Andrew W. Roscoe. 2014. FDR3-A modern refinement checker for CSP. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 187–201.
[86]
Edmond Gjondrekaj, Michele Loreti, Rosario Pugliese, Francesco Tiezzi, Carlo Pinciroli, Manuele Brambilla, Mauro Birattari, and Marco Dorigo. 2012. Towards a formal verification methodology for collective robotic systems. In International Conference on Formal Engineering Methods. Springer, 54–70.
[87]
Christoph Gladisch, Thomas Heinz, Christian Heinzemann, Jens Oehlerking, Anne von Vietinghoff, and Tim Pfitzer. 2019. Experience paper: Search-based testing in automated driving control applications. In 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 26–37.
[88]
Mario Gleirscher, Simon Foster, and Yakoub Nemouchi. 2019. Evolution of formal model-based assurance cases for autonomous robots. In International Conference on Software Engineering and Formal Methods. Springer, 87–104.
[89]
Mario Gleirscher, Simon Foster, and Jim Woodcock. 2020. New opportunities for integrated formal methods. ACM Comput. Surv. 52, 6 (2020), 117:1–117:36.
[90]
João S. V. Gonçalves, João Jacob, Rosaldo J. F. Rossetti, António Coelho, and Rui Rodrigues. 2015. An integrated framework for mobile-based ADAS simulation. In Modeling Mobility with Open Data. Springer, Berlin, Germany, 171–186.
[91]
Felix Gruber and Matthias Althoff. 2018. Anytime safety verification of autonomous vehicles. In 21st International Conference on Intelligent Transportation Systems (ITSC). IEEE, 1708–1714.
[92]
K. M. Gupta and K. Gillespie. 2015. eBotworks: A software platform for developing and evaluating communicative autonomous systems. In AUVSI Unmanned Systems, Atlanta, GA.
[93]
Seana Hagerman, Anneliese Andrews, and Stephen Oakes. 2016. Security testing of an unmanned aerial vehicle (UAV). In Cybersecurity Symposium (CYBERSEC). IEEE, 26–31.
[94]
Raju Halder, José Proença, Nuno Macedo, and André Santos. 2017. Formal verification of ROS-based robotic applications using timed-automata. In IEEE/ACM 5th International FME Workshop on Formal Methods in Software Engineering (FormaliSE). IEEE, 44–50.
[95]
Jani Erik Heikkinen, Salimzhan Gafurov, Sergey Kopylov, Tatiana Minav, Sergey Grebennikov, and Artur Kurbanov. 2019. Hardware-in-the-loop platform for testing autonomous vehicle control algorithms. In 12th International Conference on Developments in eSystems Engineering (DeSE). IEEE, 906–911.
[96]
Constance Heitmeyer, Myla Archer, Ramesh Bharadwaj, and Ralph Jeffords. 2005. Tools for Constructing Requirements Specification: The SCR Toolset at the Age of Ten. Technical Report. Naval Research Lab Washington DC Center for High Assurance Computing Systems.
[97]
Constance L. Heitmeyer. 2002. Software cost reduction. Encyc. Softw. Eng. 1 (2002).
[98]
Constance L. Heitmeyer and Elizabeth I. Leonard. 2015. Obtaining trust in autonomous systems: Tools for formal model synthesis and validation. In IEEE/ACM 3rd FME Workshop on Formal Methods in Software Engineering. IEEE, 54–60.
[99]
Philipp Helle, Wladimir Schamai, and Carsten Strobel. 2016. Testing of autonomous systems—Challenges and current state-of-the-art. INCOSE Int. Sympos. 26, 1 (2016), 571–584.
[100]
Ayonga Hereid and Aaron D. Ames. 2017. FROST*: Fast robot optimization and simulation toolkit. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 719–726.
[101]
Charles Antony Richard Hoare. 1978. Communicating sequential processes. Commun. ACM 21, 8 (1978), 666–677.
[102]
Jeff Huang, Cansu Erdogan, Yi Zhang, Brandon Moore, Qingzhou Luo, Aravind Sundaresan, and Grigore Rosu. 2014. ROSRV: Runtime verification for robots. In International Conference on Runtime Verification. Springer, 247–254.
[103]
Laura R. Humphrey. 2013. Model checking for verification in UAV cooperative control applications. Rec. Adv. Res. Unman. Aer. Vehic. 444 (2013), 69–117.
[104]
David Husch and John Albeck. 2004. Trafficware SYNCHRO 6 User Guide. Trafficware, Albany, CA.
[105]
Adam Jacoff, Hui-Min Huang, Elena Messina, Ann Virts, and Anthony Downs. 2010. Comprehensive standard test suites for the performance evaluation of mobile robots. In 10th Performance Metrics for Intelligent Systems Workshop. ACM, 161–168.
[106]
Andreas Junghanns, Jakob Mauss, Mugur Tatar, et al. 2008. TestWeaver—A tool for simulation-based test of mechatronic designs. In 6th International Modelica Conference. Citeseer.
[107]
Maryam Kamali, Louise A. Dennis, Owen McAree, Michael Fisher, and Sandor M. Veres. 2017. Formal verification of autonomous vehicle platooning. Sci. Comput. Program. 148 (2017), 88–106.
[108]
Y. Kang, H. Yin, and C. Berger. 2019. Test your self-driving algorithm: An overview of publicly available driving datasets and virtual testing environments. IEEE Trans. Intell. Vehic. 4, 2 (2019), 171–185.
[109]
Shinpei Kato, Shota Tokunaga, Yuya Maruyama, Seiya Maeda, Manato Hirabayashi, Yuki Kitsukawa, Abraham Monrroy, Tomohito Ando, Yusuke Fujii, and Takuya Azumi. 2018. Autoware on board: Enabling autonomous vehicles with embedded systems. In ACM/IEEE 9th International Conference on Cyber-Physical Systems (ICCPS). IEEE, 287–296.
[110]
Hojat Khosrowjerdi and Karl Meinke. 2018. Learning-based testing for autonomous systems using spatial and temporal requirements. In 1st International Workshop on Machine Learning and Software Engineering in Symbiosis. ACM, 6–15.
[111]
Baekgyu Kim, Yusuke Kashiba, Siyuan Dai, and Shinichi Shiraishi. 2016. Testing autonomous vehicle software in the virtual prototyping environment. IEEE Embed. Syst. Lett. 9, 1 (2016), 5–8.
[112]
Jinhan Kim, Robert Feldt, and Shin Yoo. 2019. Guiding deep learning system testing using surprise adequacy. In IEEE/ACM 41st International Conference on Software Engineering (ICSE). IEEE, 1039–1049.
[113]
Florian Klück, Martin Zimmermann, Franz Wotawa, and Mihai Nica. 2019. Genetic algorithm-based test parameter optimization for ADAS system testing. In IEEE 19th International Conference on Software Quality, Reliability and Security (QRS). IEEE, 418–425.
[114]
A. Knauss, J. Schroder, C. Berger, and H. Eriksson. 2017. Software-related challenges of testing automated vehicles. In IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C). IEEE, 328–330.
[115]
Nathan Koenig and Andrew Howard. 2004. Design and use paradigms for Gazebo, an open-source multi-robot simulator. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2149–2154.
[116]
Soonho Kong, Sicun Gao, Wei Chen, and Edmund Clarke. 2015. dReach: δ-reachability analysis for hybrid systems. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 200–205.
[117]
Twan Koolen and Robin Deits. 2019. Julia for robotics: Simulation and real-time control in a high-level programming language. In International Conference on Robotics and Automation (ICRA). IEEE, 604–611.
[118]
Philip Koopman and Michael Wagner. 2016. Challenges in autonomous vehicle testing and validation. SAE Int. J. Transport. Saf. 4, 1 (2016), 15–24.
[119]
Panagiotis Kouvaros and Alessio Lomuscio. 2015. A counter abstraction technique for the verification of robot swarms. In AAAI Conference on Artificial Intelligence. AAAI PRESS.
[120]
Panagiotis Kouvaros and Alessio Lomuscio. 2015. Verifying emergent properties of swarms. In 24th International Joint Conference on Artificial Intelligence. AAAI Press.
[121]
Panagiotis Kouvaros and Alessio Lomuscio. 2016. Formal verification of opinion formation in swarms. In International Conference on Autonomous Agents & Multiagent Systems. ACM, 1200–1208.
[122]
Panagiotis Kouvaros, Alessio Lomuscio, Edoardo Pirovano, and Hashan Punchihewa. 2019. Formal verification of open multi-agent systems. In 18th International Conference on Autonomous Agents and MultiAgent Systems. ACM, 179–187.
[123]
Marta Kwiatkowska, Gethin Norman, and David Parker. 2011. PRISM 4.0: Verification of probabilistic real-time systems. In International Conference on Computer-aided Verification. Springer, 585–591.
[124]
John E. Laird. 2019. The Soar cognitive architecture. MIT Press.
[125]
Kim G. Larsen, Paul Pettersson, and Wang Yi. 1997. UPPAAL in a nutshell. Int. J. Software Tools Technol. Transf. 1, 1-2 (1997), 134–152.
[126]
Adrien Lasbouygues, Benoit Ropars, Robin Passama, David Andreu, and Lionel Lapierre. 2015. Atoms based control of mobile robots with hardware-in-the-loop validation. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 1083–1090.
[127]
Jannik Laval, Luc Fabresse, and Noury Bouraqadi. 2013. A methodology for testing mobile autonomous robots. In IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 1842–1847.
[128]
Philippe Ledent, Anshul Paigwar, Alessandro Renzaglia, Radu Mateescu, and Christian Laugier. 2019. Formal validation of probabilistic collision risk estimation for autonomous driving. In IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE Conference on Robotics, Automation and Mechatronics (RAM). IEEE, 433–438.
[129]
Li Li, Wu-Ling Huang, Yuehu Liu, Nan-Ning Zheng, and Fei-Yue Wang. 2016. Intelligence testing for autonomous vehicles: A new approach. IEEE Trans. Intell. Vehic. 1, 2 (2016), 158–166.
[130]
Nan Li, Dave Oyler, Mengxuan Zhang, Yildiray Yildiz, Anouck Girard, and Ilya Kolmanovsky. 2016. Hierarchical reasoning game theory based approach for evaluation and testing of autonomous vehicle control systems. In IEEE 55th Conference on Decision and Control (CDC). IEEE, 727–733.
[131]
Nan Li, Dave W. Oyler, Mengxuan Zhang, Yildiray Yildiz, Ilya Kolmanovsky, and Anouck R. Girard. 2017. Game theoretic modeling of driver and vehicle interactions for verification and validation of autonomous vehicle control systems. IEEE Trans. Contr. Syst. Technol. 26, 5 (2017), 1782–1797.
[132]
Raimar Lill and Francesca Saglietti. 2014. Testing the cooperation of autonomous robotic agents. In 9th International Conference on Software Engineering and Applications (ICSOFT-EA). IEEE, 287–296.
[133]
Mikael Lindvall, Adam Porter, Gudjon Magnusson, and Christoph Schulze. 2017. Metamorphic model-based testing of autonomous systems. In IEEE/ACM 2nd International Workshop on Metamorphic Testing (MET). IEEE, 35–41.
[134]
Alessio Lomuscio and Jakub Michaliszyn. 2015. Verifying multi-agent systems by model checking three-valued abstractions. In International Conference on Autonomous Agents and Multiagent Systems. ACM, 189–198.
[135]
Alessio Lomuscio and Edoardo Pirovano. 2019. A counter abstraction technique for the verification of probabilistic swarm systems. In International Conference on Autonomous Agents and Multiagent Systems. ACM, 161–169.
[136]
Alessio Lomuscio, Hongyang Qu, and Franco Raimondi. 2017. MCMAS: An open-source model checker for the verification of multi-agent systems. Int. J. Softw. Tools Technol. Transf. 19, 1 (2017), 9–30.
[137]
Yu Lu, Hanlin Niu, Al Savvaris, and Antonios Tsourdos. 2016. Verifying collision avoidance behaviours for unmanned surface vehicles using probabilistic model checking. IFAC-PapersOnLine 49, 23 (2016), 127–132.
[138]
Matt Luckcuck, Marie Farrell, Louise A. Dennis, Clare Dixon, and Michael Fisher. 2019. Formal specification and verification of autonomous robotic systems: A survey. ACM Comput. Surv. 52, 5 (2019), 100:1–100:41.
[139]
Israel Lugo-Cárdenas, Gerardo Flores, and Rogelio Lozano. 2014. The MAV3DSim: A simulation platform for research, education and validation of UAV controllers. IFAC Proc. 47, 3 (2014), 713–717.
[140]
Chenxia Luo, Rui Wang, Yu Jiang, Kang Yang, Yong Guan, Xiaojuan Li, and Zhiping Shi. 2018. Runtime verification of robots collision avoidance case study. In IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC). IEEE, 204–212.
[141]
Damian M. Lyons, Ronald C. Arkin, Shu Jiang, Dagan Harrington, Feng Tang, and Peng Tang. 2015. Probabilistic verification of multi-robot missions in uncertain environments. In IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 56–63.
[142]
Damian M. Lyons, Ronald C. Arkin, Shu Jiang, Matt O’Brien, Feng Tang, and Peng Tang. 2017. Performance verification for robot missions in uncertain environments. Robot. Auton. Syst. 98 (2017), 89–104.
[143]
István Majzik, Oszkár Semeráth, Csaba Hajdu, Kristóf Marussy, Zoltán Szatmári, Zoltán Micskei, András Vörös, Aren A. Babikian, and Dániel Varró. 2019. Towards system-level testing with coverage guarantees for autonomous vehicles. In ACM/IEEE 22nd International Conference on Model Driven Engineering Languages and Systems (MODELS). IEEE, 89–94.
[144]
Oded Maler and Dejan Nickovic. 2004. Monitoring temporal properties of continuous signals. In Formal Techniques, Modelling and Analysis of Timed and Fault-tolerant Systems. Springer, 152–166.
[145]
Anthony Mallet, Cédric Pasteur, Matthieu Herrb, Séverin Lemaignan, and Félix Ingrand. 2010. GenoM3: Building middleware-independent robotic components. In IEEE International Conference on Robotics and Automation. IEEE, 4627–4632.
[146]
Michel Mamrot, Stefan Marchlewitz, Jan-Peter Nicklas, Petra Winzer, Thomas Tetzlaff, Philipp Kemper, and Ulf Witkowski. 2015. Model-based test and validation support for autonomous mechatronic systems. In IEEE International Conference on Systems, Man, and Cybernetics. IEEE, Hong Kong, 701–706.
[147]
Md Abdullah Al Mamun, Christian Berger, and Jorgen Hansson. 2013. MDE-based sensor management and verification for a self-driving miniature vehicle. In ACM Workshop on Domain-specific Modeling. ACM, 1–6.
[148]
Musa Morena Marcusso Manhães, Sebastian A. Scherer, Martin Voss, Luiz Ricardo Douat, and Thomas Rauschenbach. 2016. UUV simulator: A Gazebo-based package for underwater intervention and multi-robot simulation. In MTS/IEEE OCEANS Conference. IEEE, 1–8.
[149]
Niloofar Mansoor, Jonathan A. Saddler, Bruno Silva, Hamid Bagheri, Myra B. Cohen, and Shane Farritor. 2018. Modeling and testing a family of surgical robots: An experience report. In 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, 785–790.
[150]
Casper Sloth Mariager, Daniel Kjaer Bonde Fischer, Jakob Kristiansen, and Matthias Rehm. 2019. Co-designing and field-testing adaptable robots for triggering positive social interactions for adolescents with cerebral palsy. In 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). IEEE, 1–6.
[151]
MATLAB. 2010. version 7.10.0 (R2010a). The MathWorks Inc., Natick, Massachusetts.
[152]
John Alexander McDermid, Yan Jia, and Ibrahim Habli. 2019. Towards a framework for safety assurance of autonomous systems. In Artificial Intelligence Safety Conference. CEUR Workshop Proceedings, 1–7.
[153]
Steve McGuire, P. Michael Furlong, Terry Fong, Christoffer Heckman, Daniel Szafir, Simon J. Julier, and Nisar Ahmed. 2019. Everybody needs somebody sometimes: Validation of adaptive recovery in robotic space operations. IEEE Robot. Automat. Lett. 4, 2 (2019), 1216–1223.
[154]
David R. MacIver. 2021. Hypothesis. Retrieved from https://github.com/HypothesisWorks/hypothesis.
[155]
Christopher Medrano-Berumen and Mustafa Ilhan Akbaş. 2019. Abstract simulation scenario generation for autonomous vehicle verification. In SoutheastCon. IEEE, 1–6.
[156]
Alvaro Miyazawa, Pedro Ribeiro, Wei Li, A. L. C. Cavalcanti, Jon Timmis, and J. C. P. Woodcock. 2016. RoboChart: A state-machine notation for modelling and verification of mobile and autonomous robots. (2016).
[157]
Maurizio Mongelli, Marco Muselli, Andrea Scorzoni, and Enrico Ferrari. 2019. Accelerating PRISM validation of vehicle platooning through machine learning. In 4th International Conference on System Reliability and Safety (ICSRS). IEEE, 452–456.
[158]
Shahabuddin Muhammad, Nazeeruddin Mohammad, Abul Bashar, and Majid Ali Khan. 2019. Designing human assisted wireless sensor and robot networks using probabilistic model checking. J. Intell. Robot. Syst. 94, 3-4 (2019), 687–709.
[159]
Galen E. Mullins, Paul G. Stankiewicz, R. Chad Hawthorne, and Satyandra K. Gupta. 2018. Adaptive generation of challenging scenarios for testing and evaluation of autonomous vehicles. J. Syst. Softw. 137 (2018), 197–215.
[160]
Adnan Munawar and Gregory S. Fischer. 2019. An asynchronous multi-body simulation framework for real-time dynamics, haptics and learning with application to surgical robots. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE.
[161]
Florian Mutter, Stefanie Gareis, Bernhard Schätz, Andreas Bayha, Franziska Grüneis, Michael Kanis, and Dagmar Koss. 2011. Model-driven in-the-loop validation: Simulation-based testing of UAV software using virtual environments. In 18th IEEE International Conference and Workshops on Engineering of Computer-based Systems. IEEE, 269–275.
[162]
Frederik Naujoks, Sebastian Hergeth, Katharina Wiedemann, Nadja Schömig, and Andreas Keinath. 2018. Use cases for assessing, testing, and validating the human machine interface of automated driving systems. In Human Factors and Ergonomics Society Annual Meeting. SAGE Publications, 1873–1877.
[163]
Cu D. Nguyen, Simon Miles, Anna Perini, Paolo Tonella, Mark Harman, and Michael Luck. 2012. Evolutionary testing of autonomous software agents. Auton. Agents Multi-agent Syst. 25, 2 (2012), 260–283.
[164]
Royal Academy of Engineering. 2015. Innovation in autonomous systems: Summary of an event held on Monday 22 June 2015 at the Royal Academy of Engineering.
[165]
Matthew O’Kelly, Houssam Abbas, Sicun Gao, Shin’ichi Shiraishi, Shinpei Kato, and Rahul Mangharam. 2016. APEX: Autonomous vehicle plan verification and execution. In SAE World Congress and Exhibition. SAE International.
[166]
Matthew O’Kelly, Aman Sinha, Hongseok Namkoong, Russ Tedrake, and John C. Duchi. 2018. Scalable end-to-end autonomous vehicle testing via rare-event simulation. In Conference on Advances in Neural Information Processing Systems. NeurIPS, 9827–9838.
[167]
Stephan Opfer, Stefan Niemczyk, and Kurt Geihs. 2016. Multi-agent plan verification with answer set programming. In 3rd Workshop on Model-driven Robot Software Engineering. ACM, 32–39.
[168]
Matthew O’Brien, Ronald C. Arkin, Dagan Harrington, Damian Lyons, and Shu Jiang. 2014. Automatic verification of autonomous robot missions. In International Conference on Simulation, Modeling, and Programming for Autonomous Robots. Springer, 462–473.
[169]
Jisun Park, Mingyun Wen, Yunsick Sung, and Kyungeun Cho. 2019. Multiple event-based simulation scenario generation approach for autonomous vehicle smart sensors and devices. Sensors 19, 20 (2019), 4456.
[170]
Corina S. Pasareanu, Divya Gopinath, and Huafeng Yu. 2018. Compositional verification for autonomous systems with deep learning components. arXiv preprint arXiv:1810.08303 (2018).
[171]
Shashank Pathak, Giorgio Metta, and Armando Tacchella. 2014. Is verification a requisite for safe adaptive robots? In IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 3399–3402.
[172]
José L. F. Pereira and Rosaldo J. F. Rossetti. 2012. An integrated architecture for autonomous vehicles simulation. In 27th Annual ACM Symposium on Applied Computing. ACM, 286–292.
[173]
Mauro Pezzè and Michal Young. 2007. Software Testing and Analysis: Process, Principles, and Techniques. Wiley.
[174]
Javier Poncela and M. C. Aguayo-Torres. 2013. A framework for testing of wireless underwater robots. Wirel. Person. Commun. 70, 3 (2013), 1171–1181.
[175]
David Porfirio, Allison Sauppé, Aws Albarghouthi, and Bilge Mutlu. 2018. Authoring and verifying human-robot interactions. In 31st Annual ACM Symposium on User Interface Software and Technology. ACM, 75–86.
[176]
Martin Proetzsch, Fabian Zimmermann, Robert Eschbach, Johannes Kloos, and Karsten Berns. 2010. A systematic testing approach for autonomous mobile robots using domain-specific languages. In Annual Conference on Artificial Intelligence. Springer, 317–324.
[177]
Morgan Quigley, Ken Conley, Brian Gerkey, Josh Faust, Tully Foote, Jeremy Leibs, Rob Wheeler, and Andrew Y. Ng. 2009. ROS: An open-source Robot Operating System. In ICRA Workshop on Open Source Software.
[178]
Nijat Rajabli, Francesco Flammini, Roberto Nardone, and Valeria Vittorini. 2021. Software verification and validation of safe autonomous cars: A systematic literature review. IEEE Access 9 (2021), 4797–4819.
[179]
Arvind Ramanathan, Laura L. Pullum, Faraz Hussain, Dwaipayan Chakrabarty, and Sumit Kumar Jha. 2016. Integrating symbolic and statistical methods for testing intelligent systems: Applications to machine learning and computer vision. In Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 786–791.
[180]
Q. Rao and J. Frtunikj. 2018. Deep learning for self-driving cars: Chances and challenges. In IEEE/ACM 1st International Workshop on Software Engineering for AI in Autonomous Systems (SEFAIAS). IEEE, 35–38.
[181]
Signe A. Redfield and Mae L. Seto. 2017. Verification challenges for autonomous systems. In Autonomy and Artificial Intelligence: A Threat or Savior?, William F. Lawless, Ranjeev Mittu, Donald Sofge, and Stephen Russell (Eds.). Springer International Publishing, 103–127.
[182]
Pedro Ribeiro, Alvaro Miyazawa, Wei Li, Ana Cavalcanti, and Jon Timmis. 2017. Modelling and verification of timed robotic controllers. In International Conference on Integrated Formal Methods. Springer, 18–33.
[183]
Sergio Rico, Emelie Engström, and Martin Höst. 2019. A taxonomy for improving industry-academia communication in IoT vulnerability management. In 45th Euromicro Conference on Software Engineering and Advanced Applications (SEAA). IEEE, 38–45.
[184]
Eric Rohmer, Surya P. N. Singh, and Marc Freese. 2013. V-REP: A versatile and scalable robot simulation framework. In IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 1321–1326.
[185]
Martijn Rooker, Pablo Horstrand, Aythami Salvador Rodriguez, Sebastian Lopez, Roberto Sarmiento, Jose Lopez, Ray Alejandro Lattarulo, Joshue Manuel Perez Rastelli, Zora Slavik, David Pereira, et al. 2018. Towards improved validation of autonomous systems for smart farming. In Smart Farming Workshop. ISEP.
[186]
Gregg Rothermel, Roland H. Untch, Chengyun Chu, and Mary Jean Harrold. 1999. Test case prioritization: An empirical study. In International Conference on Software Maintenance. IEEE Computer Society, 179–188.
[187]
Sasha Rubin. 2015. Parameterised verification of autonomous mobile-agents in static but unknown environments. In International Conference on Autonomous Agents and Multiagent Systems. ACM, 199–208.
[188]
Peter A. M. Ruijten, Antal Haans, Jaap Ham, and Cees J. H. Midden. 2019. Perceived human-likeness of social robots: Testing the Rasch model as a method for measuring anthropomorphism. Int. J. Soc. Robot. 11, 3 (2019), 477–494.
[189]
Rim Saddem, Olivier Naud, Karen Godary Dejean, and Didier Crestani. 2017. Decomposing the model-checking of mobile robotics actions on a grid. IFAC-PapersOnLine 50, 1 (2017), 11156–11162.
[190]
Francesca Saglietti and Matthias Meitner. 2016. Model-driven structural and statistical testing of robot cooperation and reconfiguration. In 3rd Workshop on Model-driven Robot Software Engineering. ACM, 17–23.
[191]
Francesca Saglietti, Stefan Winzinger, and Raimar Lill. 2014. Reconfiguration testing for cooperating autonomous agents. In International Conference on Computer Safety, Reliability, and Security. Springer, 144–155.
[192]
André Santos, Alcino Cunha, and Nuno Macedo. 2018. Property-based testing for the robot operating system. In 9th ACM SIGSOFT International Workshop on Automating TEST Case Design, Selection, and Evaluation. ACM, 56–62.
[193]
Ichiro Satoh. 2018. An approach for testing software on networked transport robots. In 14th IEEE International Workshop on Factory Communication Systems (WFCS). IEEE, 1–4.
[194]
Ichiro Satoh. 2019. Developing and testing networked software for moving robots. In 14th International Conference on Evaluation of Novel Approaches to Software Engineering. Springer, 315–321.
[195]
Hans-Peter Schöner. 2018. Simulation in development and testing of autonomous vehicles. In Internationales Stuttgarter Symposium. Springer, 1083–1095.
[196]
David Seiferth and Matthias Heller. 2017. Testing and performance enhancement of a model-based designed ground controller for a diamond-shaped unmanned air vehicle (UAV). In IEEE Conference on Control Technology and Applications (CCTA). IEEE, 1988–1994.
[197]
Yuvaraj Selvaraj, Wolfgang Ahrendt, and Martin Fabian. 2019. Verification of decision making software in an autonomous vehicle: An industrial case study. In International Workshop on Formal Methods for Industrial Critical Systems. Springer, 143–159.
[198]
Shital Shah, Debadeepta Dey, Chris Lovett, and Ashish Kapoor. 2018. AirSim: High-fidelity visual and physical simulation for autonomous vehicles. In Field and Service Robotics. Springer, Zurich, Switzerland, 621–635.
[199]
Weijing Shi, Mohamed Baker Alawieh, Xin Li, Huafeng Yu, Nikos Arechiga, and Nobuyuki Tomatsu. 2016. Efficient statistical validation of machine learning systems for autonomous driving. In 35th International Conference on Computer-aided Design. ACM, 1–8.
[200]
Christoph Sippl, Florian Bock, David Wittmann, Harald Altinger, and Reinhard German. 2016. From simulation data to test cases for fully automated driving and ADAS. In IFIP International Conference on Testing Software and Systems. Springer, 191–206.
[201]
Gopinadh Sirigineedi, Antonios Tsourdos, Brian A. White, and Rafał Żbikowski. 2011. Kripke modelling and verification of temporal specifications of a multiple UAV system. Ann. Math. Artif. Intell. 63, 1 (2011), 31–52.
[202]
Michał Siwek, Leszek Baranowski, Jarosław Panasiuk, and Wojciech Kaczmarek. 2019. Modeling and simulation of movement of dispersed group of mobile robots using Simscape multibody software. In AIP Conference Proceedings. AIP Publishing LLC, 020045.
[203]
Marc Spislaender and Francesca Saglietti. 2018. Evidence-based verification of safety properties concerning the cooperation of autonomous agents. In 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA). IEEE, 81–88.
[204]
Tomoo Sumida, Hiroyuki Suzuki, Sho Sei Shun, Kazuhito Omaki, Takaaki Goto, and Kensei Tsuchida. 2017. FDR verification of a system involving a robot climbing stairs. In IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS). IEEE, 875–878.
[205]
Xiaowu Sun, Haitham Khedr, and Yasser Shoukry. 2019. Formal verification of neural network controlled autonomous systems. In 22nd ACM International Conference on Hybrid Systems: Computation and Control. ACM, 147–156.
[206]
Zsolt Szalay, Mátyás Szalai, Bálint Tóth, Tamás Tettamanti, and Viktor Tihanyi. 2019. Proof of concept for Scenario-in-the-Loop (SciL) testing for autonomous vehicle technology. In IEEE International Conference on Connected Vehicles and Expo (ICCVE). IEEE, 1–5.
[207]
Zaid Tahir and Rob Alexander. 2020. Coverage based testing for V&V and safety assurance of self-driving autonomous vehicles: A systematic literature review. In IEEE International Conference on Artificial Intelligence Testing (AITest). IEEE, 23–30.
[208]
Jianbo Tao, Yihao Li, Franz Wotawa, Hermann Felbinger, and Mihai Nica. 2019. On the industrial application of combinatorial testing for autonomous driving functions. In IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW). IEEE, 234–240.
[209]
Mugur Tatar. 2015. Enhancing ADAS test and validation with automated search for critical situations. In Driving Simulation Conference (DSC). DSC Europe.
[210]
Unity Technologies. 2021. Unity. Retrieved from https://unity.com.
[211]
Yuchi Tian, Kexin Pei, Suman Jana, and Baishakhi Ray. 2018. DeepTest: Automated testing of deep-neural-network-driven autonomous cars. In 40th International Conference on Software Engineering. ACM, 303–314.
[212]
Thomas Tosik, Jasper Schwinghammer, Mandy Jane Feldvoß, John Paul Jonte, Arne Brech, and Erik Maehle. 2016. MARS: A simulation environment for marine swarm robotics and environmental monitoring. In OCEANS Conference. IEEE, 1–6.
[213]
Tarik Tosun, Gangyuan Jing, Hadas Kress-Gazit, and Mark Yim. 2018. Computer-aided compositional design and verification for modular robots. Robot. Res. 1 (2018), 237–252.
[214]
Garazi Juez Uriagereka, Estibaliz Amparan, Cristina Martinez Martinez, Jabier Martinez, Aurelien Ibanez, Matteo Morelli, Ansgar Radermacher, and Huascar Espinoza. 2019. Design-time safety assessment of robotic systems using fault injection simulation in a model-driven approach. In ACM/IEEE 22nd International Conference on Model Driven Engineering Languages and Systems Companion (MODELS-C). IEEE, 577–586.
[215]
Vandi Verma and Chris Leger. 2019. SSim: NASA Mars Rover robotics flight software simulation. In IEEE Aerospace Conference. IEEE, 1–11.
[216]
Federico Vicentini, Mehrnoosh Askarpour, Matteo G. Rossi, and Dino Mandrioli. 2019. Safety assessment of collaborative robotics through automated formal verification. IEEE Trans. Robot. 36, 1 (2019), 42–61.
[217]
Harsha Jakkanahalli Vishnukumar, Björn Butting, Christian Müller, and Eric Sax. 2017. Machine learning and deep neural network–Artificial intelligence core for lab and real-world test and validation for ADAS and autonomous vehicles: AI for efficient and quality test and validation. In Intelligent Systems Conference (IntelliSys). IEEE, 714–721.
[218]
Dennis Walter, Holger Täubig, and Christoph Lüth. 2010. Experiences in applying formal verification in robotics. In International Conference on Computer Safety, Reliability, and Security. Springer, 347–360.
[219]
Kai Wang and J. C. Cheng. 2019. Integrating hardware-in-the-loop simulation and BIM for planning UAV-based As-built MEP inspection with deep learning techniques. In 36th International Symposium on Automation and Robotics in Construction. IAARC, 310–316.
[220]
Rui Wang, Yingxia Wei, Houbing Song, Yu Jiang, Yong Guan, Xiaoyu Song, and Xiaojuan Li. 2018. From offline towards real-time verification for robot systems. IEEE Trans. Industr. Inform. 14, 4 (2018), 1712–1721.
[221]
Matt Webster, Clare Dixon, Michael Fisher, Maha Salem, Joe Saunders, Kheng Lee Koay, Kerstin Dautenhahn, and Joan Saez-Pons. 2015. Toward reliable autonomous robotic assistants through formal verification: A case study. IEEE Trans. Hum.-mach. Syst. 46, 2 (2015), 186–196.
[222]
Matt Webster, Maha Salem, Clare Dixon, Michael Fisher, and Kerstin Dautenhahn. 2014. Formal verification of an autonomous personal robotic assistant. In AAAI Spring Symposium.
[223]
Matt Webster, David Western, Dejanira Araiza-Illan, Clare Dixon, Kerstin Eder, Michael Fisher, and Anthony G. Pipe. 2019. A corroborative approach to verification and validation of human–robot teams. Int. J. Robot. Res. 39, 1 (2019), 73–99.
[224]
Dennis Leroy Wigand, Pouya Mohammadi, Enrico Mingo Hoffman, Nikos G. Tsagarakis, Jochen J. Steil, and Sebastian Wrede. 2018. An open-source architecture for simulation, execution and analysis of real-time robotics systems. In IEEE International Conference on Simulation, Modeling, and Programming for Autonomous Robots (SIMPAR). IEEE, 93–100.
[225]
Tichakorn Wongpiromsarn, Sayan Mitra, Andrew Lamperski, and Richard M. Murray. 2012. Verification of periodically controlled hybrid systems: Application to an autonomous vehicle. ACM Trans. Embed. Comput. Syst. 11, S2 (2012), 1–24.
[226]
Bingqing Xu, Qin Li, Tong Guo, Yi Ao, and Dehui Du. 2019. A quantitative safety verification approach for the decision-making process of autonomous driving. In International Symposium on Theoretical Aspects of Software Engineering (TASE). IEEE, 128–135.
[227]
Bingqing Xu, Qin Li, Tong Guo, and Dehui Du. 2019. A scenario-based approach for formal modelling and verification of safety properties in automated driving. IEEE Access 7 (2019), 140566–140587.
[228]
Wing Lok Yeung. 2011. Behavioral modeling and verification of multi-agent systems for manufacturing control. Expert Syst. Appl. 38, 11 (2011), 13555–13562.
[229]
Levent Yilmaz. 2017. Verification and validation of ethical decision-making in autonomous systems. In Symposium on Modeling and Simulation of Complexity in Intelligent, Adaptive and Autonomous Systems. Springer, 1–12.
[230]
Yujian Fu and Mebougna Drabo. 2014. Formal modeling and verification of dynamic reconfiguration of autonomous robotics systems. In International Conference on Embedded Systems and Applications (ESA). CSREA Press, 14.
[231]
Sunkil Yun, Takaaki Teshima, and Hidekazu Nishimura. 2019. Human–machine interface design and verification for an automated driving system using system model and driving simulator. IEEE Consum. Electron. Mag. 8, 5 (2019), 92–98.
[232]
Chi Zhang, Yuehu Liu, Danchen Zhao, and Yuanqi Su. 2014. RoadView: A traffic scene simulator for autonomous vehicle simulation testing. In 17th International IEEE Conference on Intelligent Transportation Systems (ITSC). IEEE, 1160–1165.
[233]
Xiaoyang Zhang, Hongpeng Wang, Jingtai Liu, and Haifeng Li. 2019. CyberEarth: A virtual simulation platform for robotics and cyber-physical systems. In IEEE International Conference on Robotics and Biomimetics (ROBIO). IEEE, 858–863.
[234]
Xingyu Zhao, Matt Osborne, Jenny Lantair, Valentin Robu, David Flynn, Xiaowei Huang, Michael Fisher, Fabio Papacchini, and Angelo Ferrando. 2019. Towards integrating formal verification of autonomous robots with battery prognostics and health management. In International Conference on Software Engineering and Formal Methods. Springer, 105–124.
[235]
Xingyu Zhao, Valentin Robu, David Flynn, Fateme Dinmohammadi, Michael Fisher, and Matt Webster. 2019. Probabilistic model checking of robots deployed in extreme environments. In AAAI Conference on Artificial Intelligence. AAAI, 8066–8074.
[236]
Xingyu Zhao, Valentin Robu, David Flynn, Kizito Salako, and Lorenzo Strigini. 2019. Assessing the safety and reliability of autonomous vehicles from road testing. In IEEE 30th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 13–23.
[237]
Jinwei Zhou, Roman Schmied, Alexander Sandalek, Helmut Kokal, and Luigi del Re. 2016. A framework for virtual testing of ADAS. SAE Int. J. Passeng. Cars-Electron. Electric. Syst. 9, 2016-01-0049 (2016), 66–73.
[238]
Marc René Zofka, Marc Essinger, Tobias Fleck, Ralf Kohlhaas, and J. Marius Zöllner. 2018. The sleepwalker framework: Verification and validation of autonomous vehicles by mixed reality lidar stimulation. In IEEE International Conference on Simulation, Modeling, and Programming for Autonomous Robots (SIMPAR). IEEE, 151–157.
