6.1.1 Coverage Criteria.
(1) schema-related coverage.
API schemas are the main inputs that guide how to access the SUT and define what responses should be returned. For instance, in a schema defined with OpenAPI (Figure 1), an endpoint could be defined under a URI path with an HTTP verb (e.g., POST, GET), input parameters (e.g., a path parameter), and possible responses (e.g., status code, response body). Several black-box coverage metrics have been defined based on those elements [49, 52, 53, 54, 55, 56, 57, 59, 60, 63, 68, 85, 89, 104, 107, 109, 113, 118, 126, 130, 131, 145, 152, 155, 156, 157, 160, 166, 167, 171, 172, 176, 177]:
— HTTP status code reflects the result of processing a request in the SUT; e.g., 2xx often represents a successful request. A set of testing criteria has been defined to compute the coverage of status codes returned during testing for each endpoint [49, 52, 53, 54, 55, 56, 57, 59, 60, 63, 68, 85, 104, 107, 109, 113, 118, 126, 130, 131, 145, 152, 156, 157, 160, 166, 171, 172, 176, 177].
— path provides the information to access the API, i.e., the full URI to access an endpoint is constructed from the base path plus the path. For instance, Martin-Lopez et al. [129] defined path coverage, which assesses the number of paths accessed by the generated tests out of the total paths available in the schema. Banias et al. [63] evaluated the paths tested by considering success and failure responses received by requests to the path. Ed-douibi et al. [89] reported endpoint coverage, where an endpoint is considered covered only if all of its operations are covered.
— operations are exposed to make requests (i.e., an HTTP verb with a path) for performing actions on the services. As discussed, a test is regarded as a sequence of operations. To evaluate REST API testing approaches, Banias et al. reported the number of operations tested in Reference [63].
— input parameters are the information required to be set when making a request. Parameters can have various types and constraints (e.g., required, minimum). Input parameter-based metrics assess whether various values for the parameters have been examined during testing (e.g., each Boolean parameter should have been evaluated with both values, true and false). Generation could also be guided by such metrics; for example, Banias et al. [63] defined various configurations to generate tests with consideration of the required property of the parameters.
— response defines a list of possible responses to return per operation. Metrics relating to responses examine whether various responses have been obtained (e.g., for an enumeration element in a returned body payload, coverage metrics would check whether every single item in the enumeration has been returned at least once). Response body property coverage is reported to assess an API schema-based REST API testing approach in Reference [63].
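As a minimal sketch of how one such schema-based metric could be computed (the endpoint names and data structures are illustrative, not taken from any particular tool), status code coverage can be measured as the fraction of documented status codes that were actually observed per endpoint:

```python
def status_code_coverage(documented, observed):
    """Fraction of schema-documented status codes observed per endpoint.

    documented: dict mapping endpoint -> set of status codes in the schema
    observed:   dict mapping endpoint -> set of status codes seen in testing
    """
    coverage = {}
    for endpoint, codes in documented.items():
        seen = observed.get(endpoint, set())
        # only codes that appear in the schema count toward coverage
        coverage[endpoint] = len(seen & codes) / len(codes) if codes else 1.0
    return coverage

documented = {"GET /pets": {200, 404, 500}}
observed = {"GET /pets": {200, 404}}
print(status_code_coverage(documented, observed))  # {'GET /pets': 2/3}
```

The other schema-related metrics (path, operation, parameter, and response body property coverage) follow the same "observed out of documented" pattern over different schema elements.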
In addition, Martin-Lopez et al. [129] proposed 10 coverage metrics based on the schema that have been applied for assessing REST API testing approaches in Reference [63]. The 10 coverage metrics enable assessments of generated tests in fuzzing REST APIs with different inputs and outputs, such as parameter coverage, content-type coverage, and response body properties coverage. The metrics were also enabled in fuzzers, such as HsuanFuzz [155] and RESTest [131]. Restats [84] is a test coverage tool for assessing given tests based on the coverage metrics. In an empirical study [83], such schema-related coverage metrics were employed to compare black-box fuzzers for REST APIs.
(2) code coverage.
Code coverage [49, 52, 54, 55, 56, 57, 59, 115, 118, 126, 145, 152, 155, 171, 172, 176, 177] is a typical white-box criterion for evaluating testing approaches. It has also been applied to evaluate REST API fuzzers, both black-box [59, 60] and white-box [53, 55, 171, 172, 173, 177]. Among the different code coverage criteria, line/statement coverage is one of the most common and is supported by industry-strength coverage tools (e.g., JaCoCo for Java programs).
Code coverage has also been used as an evaluation metric to assess black-box REST API testing approaches [60, 115, 118, 155]. For instance, in Reference [60], Atlidakis et al. reported the accumulated code coverage achieved along with the number of requests executed. To collect the code coverage in that study, a TracePoint hook was configured in Ruby classes to trace the execution. However, black-box testing of REST APIs could be performed on remote public APIs that are closed-source software; in this context, code coverage is applied only for evaluation purposes.
Code coverage could also be employed as a criterion to automate fuzzing, which is considered white-box testing. EvoMaster automatically collects runtime code coverage (with heuristics to maximize it) during fuzzing via code instrumentation. However, such automated collection of runtime code coverage is specific to the programming language, and EvoMaster has enabled it for the JVM (e.g., Java and Kotlin) [52] and JavaScript [175]. In addition, Pythia [59] employs code coverage metrics to produce tests; however, enabling such coverage collection requires a manual preliminary step to extract code information (such as block locations) from the SUT.
(3) specialized metrics.
Besides schema-based and code coverage, there also exist other specialized metrics that are more specific (and possibly less general) to the proposed approaches [68, 77, 78, 98, 117, 118, 120, 130, 143, 156, 169].
For instance, in RESTler [60], a grammar is derived from the API schema to drive subsequent test sequence generation, e.g., selecting the next HTTP request based on the derived producer-consumer dependencies among the endpoints. A specific metric employed in this approach is based on that grammar, i.e., grammar coverage. Martin-Lopez et al. [127, 128, 130] defined inter-parameter dependencies that represent constraints referring to two or more input parameters. Such dependencies are also used in RestCT to generate test data [166]. Alonso et al. [156] enabled test data generation with realistic inputs using natural language processing and knowledge extraction techniques. The performance of the data generation was evaluated by the percentage of valid API calls (i.e., 2xx status code) and valid inputs (i.e., syntactically valid and semantically valid). In Reference [100], Godefroid et al. introduced the error type metric, which is a pair of an error code and an error message. The metric was used to guide data fuzzing of REST APIs [100], i.e., maximizing error type coverage.
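The error type metric can be sketched as collecting the distinct (error code, error message) pairs seen in responses; a fuzzer guided by it would prefer inputs that grow this set. This is an illustrative sketch, not the implementation from Reference [100]; responses are modeled as plain tuples:

```python
def error_types(responses):
    """Distinct error types, i.e., (status code, error message) pairs,
    collected from 4xx/5xx responses.

    responses: iterable of (status_code, message) tuples (a simplified
    stand-in for real HTTP responses).
    """
    return {(code, msg) for code, msg in responses if code >= 400}

responses = [
    (200, "ok"),
    (400, "missing id"),
    (400, "missing id"),               # duplicate: same error type
    (500, "NullPointerException"),
]
print(len(error_types(responses)))  # 2 distinct error types
```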
Chakrabarti and Rodriquez [77] defined the POST Class Graph (PCG) to represent resources and the connected relationships among them; tests can then be produced by maximizing coverage of the graph. A UML state machine was applied in model-based testing of REST APIs [143]; metrics specific to the model, such as state coverage and transition coverage, are used to guide test generation.
Lin et al. [115] constructed a tree-based graph to analyze resources and resource dependencies based on the API schema and responses. Test cases can then be generated from the graph with tree traversal algorithms.
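The idea can be sketched as follows (a simplified illustration, not the algorithm of Reference [115]; the URI paths are hypothetical): derive a resource tree from the URI structure, then produce a test order by preorder traversal so that a parent resource is exercised before its children:

```python
def build_tree(paths):
    """Nest resources by URI segments, skipping path variables like {id}."""
    tree = {}
    for p in paths:
        parts = [s for s in p.strip("/").split("/") if not s.startswith("{")]
        node = tree
        for part in parts:
            node = node.setdefault(part, {})
    return tree

def preorder(tree, prefix=""):
    """Visit parents before children, yielding a plausible test order."""
    order = []
    for name, children in tree.items():
        path = f"{prefix}/{name}"
        order.append(path)
        order.extend(preorder(children, path))
    return order

paths = ["/projects", "/projects/{id}/tasks", "/projects/{id}/tasks/{tid}/comments"]
print(preorder(build_tree(paths)))
# ['/projects', '/projects/tasks', '/projects/tasks/comments']
```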
To analyze security issues in REST APIs, Cheh and Chen [78] analyzed the sensitivity levels of data fields and API calls, and also defined an exposure level that quantifies the degree to which such data fields and API calls are exposed to potential attacks.
6.1.2 Fault Detection.
(1) service error.
The status code 5xx has been applied to identify potential faults in REST API testing [49, 52, 53, 54, 55, 56, 57, 59, 60, 63, 66, 85, 89, 104, 107, 113, 118, 126, 130, 131, 145, 152, 155, 156, 160, 166, 171, 172, 176, 177]. In HTTP, a 5xx status code indicates an error caused by the server, and the request cannot be processed until the server has been fixed. For instance, the 500 status code is generic, representing that an internal server error occurred when performing the given request. The status code 503 is more specific, stating that the service is unavailable, e.g., because it is down for maintenance or the server is overloaded.
In most HTTP frameworks, when there is a crash in the business logic due to some fault (e.g., an unhandled null-pointer exception), the entire server does not crash. In these cases, the server still replies to the incoming HTTP request, responding with a 500 status code. Therefore, 500 status codes in the responses can be used as an oracle to detect faults in RESTful APIs [104]. However, not all 500 status codes are related to software faults. For example, if the API connects to a database, and the database is currently down, then the server would not be able to complete the request. In such a case, returning a 500 status code would be correct, although no software fault in the API is involved.
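A minimal sketch of this oracle, including the caveat that a 5xx response is only a *potential* fault that may instead be an environment issue:

```python
def classify_response(status_code):
    """Coarse oracle over HTTP status codes: 5xx flags a potential fault.

    Note: a 5xx may also reflect an environment problem (e.g., a dependent
    database being down), so flagged responses still need inspection.
    """
    if 500 <= status_code < 600:
        return "potential-fault"
    if 400 <= status_code < 500:
        return "client-error"   # usually the test input, not the SUT, is wrong
    return "ok"

print(classify_response(500))  # potential-fault
print(classify_response(404))  # client-error
```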
(2) violation of schema.
The API schema (such as OpenAPI) defines the response syntax for each operation [53, 54, 55, 56, 57, 85, 89, 113, 126, 152, 160, 171, 172, 177], e.g., status code and response body. Actual responses should always be consistent with the syntax specified in the schema; thus, any inconsistency between an actual response and that syntax can be regarded as a fault in the REST API. For instance, Viglianisi et al. defined such oracles in RestTestGen [160] by using an OpenAPI library to identify mismatched responses. In QuickRest [107], Karlsson et al. formulated such consistency as properties.
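A schema-conformance oracle can be sketched as below. This is a deliberately minimal checker (not a full OpenAPI validator, and not the implementation of any cited tool): it only verifies that the status code is documented and that required body properties have the declared JSON types:

```python
def conforms(schema, status_code, body):
    """Check a response against a simplified per-operation response schema.

    schema: dict mapping status code (as string) -> {property: json_type}
    Returns False on any mismatch, which would be reported as a fault.
    """
    declared = schema.get(str(status_code))
    if declared is None:
        return False  # undocumented status code: schema violation
    types = {"string": str, "integer": int, "boolean": bool}
    for prop, ptype in declared.items():
        if prop not in body or not isinstance(body[prop], types[ptype]):
            return False  # missing property or wrong type
    return True

schema = {"200": {"id": "integer", "name": "string"}}
print(conforms(schema, 200, {"id": 1, "name": "Ann"}))  # True
print(conforms(schema, 500, {"message": "boom"}))       # False
```

In practice, tools delegate this check to an OpenAPI validation library rather than hand-rolling type checks.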
(3) violation of defined rules.
Service errors (based on status codes) and violations of the schema (based on OpenAPI) are general oracles for fault finding in the context of REST APIs. Besides these, there also exist oracles that identify faults based on rules characterizing REST APIs in terms of security, behavior, properties, and regression [61, 64, 66, 78, 92, 104, 107, 108, 113, 147, 156, 163, 169] (see Figure 5).
Security. As web services, security is critical for REST APIs. To enable test oracles relating to security, Atlidakis et al. [61] proposed a set of rules that formalize desirable security-related properties of REST APIs. Any violation of the rules is identified as a potential security-related bug in the SUT. The rules are mainly defined based on assessing the accessibility of resources, such as the use-after-free rule: if a resource has been deleted, then it must not be accessible anymore.
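The use-after-free rule can be sketched as a check over a recorded trace of (method, resource, status) events; this is an illustrative simplification of the rule from Reference [61], not its actual implementation:

```python
def violates_use_after_free(trace):
    """Flag a trace where a successfully deleted resource is later
    accessed with a successful (2xx) response.

    trace: list of (method, resource, status_code) tuples in order.
    """
    deleted = set()
    for method, resource, status in trace:
        if method == "DELETE" and 200 <= status < 300:
            deleted.add(resource)
        elif resource in deleted and 200 <= status < 300:
            return True  # deleted resource still accessible: potential bug
    return False

trace = [("DELETE", "/users/7", 204), ("GET", "/users/7", 200)]
print(violates_use_after_free(trace))  # True
```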
Katt and Prasher [171] proposed a quantitative approach to measure several kinds of security-related metrics (i.e., vulnerability, security requirement, and security assurance) for web services. To calculate the metrics, test cases are defined to validate whether the SUT meets the security requirements and whether any kind of vulnerability exists. Masood and Java [133] identified various kinds of vulnerabilities that could exist in REST APIs, such as JSON Hijacking. Such vulnerabilities can be detected with both static and dynamic analysis techniques. Barabanov et al. [64] proposed an approach specific to the detection of Insecure Direct Object Reference (IDOR)/Broken Object Level Authorization (BOLA) vulnerabilities in REST APIs. The approach analyzes the OpenAPI specification to identify elements (such as parameters) relating to IDOR/BOLA vulnerabilities, then generates tests that verify the API with such elements using defined security rules. Zha et al. [169] collected Common Sense Security Policies (CSSP), such as access control, URL spoofing, and private messages, for a Team Chat system and defined CSSP violation scenarios. Security and privacy risks can be identified if an API under a CSSP violation scenario can still work, e.g., return a valid response. Barlas et al. [66] studied regex-based denial of service (ReDoS) vulnerabilities caused by the handling of input sanitization in web services. Such vulnerabilities can be identified by verifying the consistency of input sanitization between the client side and the server.
Behavior. Based on the provided API schema, Ed-douibi et al. [89] defined two rules to generate nominal test cases and faulty test cases. Nominal test cases take inputs inferred from examples or constraints in the schema, and a successful response is expected in return (e.g., assert that the status code is 2xx). Faulty test cases take invalid inputs (e.g., missing required parameters, a string for a number parameter, a string violating its defined pattern), and a client error response is expected in return (e.g., assert that the status code is 4xx).
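The two rules can be sketched as follows (an illustrative simplification, not the generator of Reference [89]; the operation structure and parameter names are hypothetical): the nominal test reuses example values and expects 2xx, while each faulty test drops one required parameter and expects 4xx:

```python
def make_tests(operation):
    """Derive one nominal and several faulty test cases for an operation.

    operation: {"examples": {param: value}, "required": [param, ...]}
    """
    params = dict(operation["examples"])
    nominal = {"params": params, "expect": "2xx"}   # valid inputs
    faulty = []
    for name in operation["required"]:
        # invalid input: a required parameter is missing
        broken = {k: v for k, v in params.items() if k != name}
        faulty.append({"params": broken, "expect": "4xx"})
    return [nominal] + faulty

op = {"examples": {"id": 1, "verbose": True}, "required": ["id"]}
tests = make_tests(op)
print(len(tests))  # 2: one nominal, one faulty
```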
Liu et al. [117] constructed models of five constraints from the REST guidelines. Such models can then be used to verify the design models of REST APIs; any violation of the constraint models is considered a potential defect in the architecture design.
Pinheiro et al. [143] defined UML state machines to model the behavior of REST APIs. The actual behavior (observed by executing the tests) should be consistent with the model (e.g., guard conditions, invariants) as expected; any inconsistency is recognized as a potential fault of the SUT.
Properties. Most HTTP methods are idempotent, i.e., GET, PUT, DELETE, HEAD, OPTIONS, and TRACE. For such methods, the result of executing the method is independent of the number of repetitions, meaning that executing the method multiple times does not change the result compared to executing it once. Thus, assertions can be defined to check idempotency [157]. For example, executing multiple identical GET requests should result in the same response, and after a successful DELETE, the responses of all following identical DELETE requests should be the same. Connectedness is examined in Reference [77], which refers to accessibility among resources. For instance, assume that resource X owns resource Y; when performing a GET collection on Y referring to X, all available Y should appear in the response, otherwise the REST API is not “connected.”
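An idempotency assertion can be sketched as issuing the same request twice and comparing the responses; here `send` is a hypothetical stand-in for an HTTP client, stubbed so the sketch is self-contained:

```python
def check_get_idempotent(send, url):
    """Property check: two identical GET requests should yield the same
    response. Returns False when the idempotency property is violated."""
    first = send("GET", url)
    second = send("GET", url)
    return first == second

# Stub client over an in-memory state, for illustration only:
state = {"/pets": {"count": 2}}
def send(method, url):
    return (200, state[url].copy())

print(check_get_idempotent(send, "/pets"))  # True
```

The DELETE case is analogous: after one successful DELETE, all following identical DELETE requests should produce the same response.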
A REST API could apply HATEOAS (Hypermedia as the Engine of Application State); then, a response might contain hypermedia links for accessing itself or other resources. For such responses, Vassiliou-Gioles [157] defined assertions for validating the availability of the links in a response. In References [161, 162], Vu et al. also proposed a model-based approach that enables formalization of the hypermedia behaviors of a REST API with an ε-NFA, and the model allows an identification of faults by checking whether the SUT complies with it [163]. Fertig and Braun [92] also verified REST APIs based on hypermedia constraints using model-based approaches.
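A link-availability assertion in this spirit can be sketched as dereferencing every hypermedia link in a response body; the `_links` layout and the `send` client are illustrative assumptions, not part of any cited approach:

```python
def check_links(body, send):
    """Return the hypermedia links in a response body that are not
    dereferenceable (i.e., GET on them yields a 4xx/5xx status)."""
    return [link for link in body.get("_links", {}).values()
            if send("GET", link) >= 400]

body = {"id": 5, "_links": {"self": "/orders/5", "customer": "/customers/9"}}
reachable = {"/orders/5", "/customers/9"}
broken = check_links(body, lambda m, u: 200 if u in reachable else 404)
print(broken)  # [] -> all links available
```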
Metamorphic relations capture necessary properties that the SUT should hold across multiple executions. To enable metamorphic testing of REST APIs, several works identify such metamorphic relations of web services with abstract Metamorphic Relation Output Patterns (MROPs) [122, 147] (such as equivalence and disjointness) or with specific properties of the API [120]. Faults can then be detected by checking whether the responses among multiple requests conform to the identified relations.
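For example, a subset-style relation says that querying a collection with a filter must return a subset of the unfiltered query. A minimal sketch (the endpoint, field names, and data are hypothetical):

```python
def subset_relation_holds(all_items, filtered_items):
    """Metamorphic check: every item from the filtered query must also
    appear (by id) in the unfiltered query's result."""
    ids = {item["id"] for item in all_items}
    return all(item["id"] in ids for item in filtered_items)

# e.g., GET /pets vs. GET /pets?kind=cat
all_pets = [{"id": 1, "kind": "cat"}, {"id": 2, "kind": "dog"}]
cats = [{"id": 1, "kind": "cat"}]
print(subset_relation_holds(all_pets, cats))  # True -> relation holds
```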
Regression. Gazzola et al. [98] enabled monitoring and tracing of microservices to record their execution. Such recorded execution slices can be abstracted and used as a metric for generating regression tests, i.e., verifying whether the same response is received for the same request in a later version of the SUT. Godefroid et al. [101] employed RESTler to produce tests and enabled the detection of regression faults by comparing behaviors with the same tests among different versions of the REST APIs.
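The core comparison behind such regression oracles can be sketched as replaying recorded requests against a new version and diffing the responses (a simplified illustration; real tools must also tolerate benign differences such as timestamps):

```python
def find_regressions(recorded, replayed):
    """List the requests whose response changed between versions.

    recorded: dict mapping (method, url) -> response from the old version
    replayed: dict mapping (method, url) -> response from the new version
    """
    return [key for key, old in recorded.items()
            if replayed.get(key) != old]

recorded = {("GET", "/pets/1"): (200, {"name": "Rex"})}
replayed = {("GET", "/pets/1"): (500, None)}  # behavior changed
print(find_regressions(recorded, replayed))  # [('GET', '/pets/1')]
```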