Module 3: Path Testing
Syllabus:
Structural Testing: Overview, Statement testing, Branch testing, Condition testing, Path testing:
DD paths, Test coverage metrics, Basis path testing, guidelines and observations, Data Flow
testing: Definition-Use testing, Slice based testing, Guidelines and observations. Test Execution:
Overview of test execution, from test case specification to test cases, Scaffolding, Generic versus
specific scaffolding, Test oracles, Self-checks as oracles, Capture and replay
Path Testing
The distinguishing characteristic of structural testing methods is that they are all based on
the source code of the program tested, and not on the specification.
Program Graph:
● Given a program written in an imperative programming language, the program
graph is a directed graph in which nodes are statement fragments and edges
represent flow of control.
● If i and j are nodes in the program graph, an edge exists from node i to node j iff the
statement fragment corresponding to node j can be executed immediately after the
statement fragment corresponding to node i.
The program graph can be illustrated with the pseudocode implementation of the triangle
program.
A chain is a path in which the initial and terminal nodes are distinct, and every
interior node has indegree = 1 and outdegree = 1
Definition: DD-Path
A DD-Path (decision-to-decision path) is a sequence of nodes in a program graph such that:
Case 1: it consists of a single node with indegree = 0 (the source node);
Case 2: it consists of a single node with outdegree = 0 (the sink node);
Case 3: it consists of a single node with indegree ≥ 2 or outdegree ≥ 2;
Case 4: it consists of a single node with indegree = 1 and outdegree = 1;
Case 5: it is a maximal chain of length ≥ 1.
Cases 1 & 2: Establish the unique source and sink nodes of the program graph of a
structured program as initial and final DD-Paths.
Case 3: It deals with complex nodes; it ensures that no node is contained in more than
one DD-Path.
Case 4: Needed for short branches; it also preserves the one-fragment, one-DD-Path
principle.
Case 5: It is the "normal" case, in which a DD-Path is a single-entry, single-exit sequence
of nodes (a chain).
Table 1: Types of DD-Paths of Fig 1
Program Graph Nodes | DD-Path Name | Case of Definition
4    | First | 1
5-8  | A     | 5
9    | B     | 3
10   | C     | 4
11   | D     | 4
12   | E     | 3
13   | F     | 3
14   | H     | 3
15   | I     | 4
16   | J     | 3
17   | K     | 4
18   | L     | 4
19   | M     | 3
20   | N     | 3
21   | G     | 4
22   | O     | 3
23   | Last  | 2
Definition: DD-Path Graph
Given a program written in an imperative language, the DD-Path graph is the directed
graph in which nodes are DD-Paths of its program graph, and edges represent control
flow between successor DD-Paths.
Most quality organizations expect the C1 metric (DD-Path coverage) as the minimum
acceptable level of test coverage. The statement coverage metric (C0) is less adequate but
still widely accepted.
These coverage metrics form a lattice in which some are equivalent and some are implied
by others. The importance of the lattice is that there are always fault types that can be
revealed at one level while escaping detection by inferior levels of testing.
When DD-Path coverage is attained by a set of test cases, roughly 85% of all faults are
revealed.
Metric-based testing takes a closer look at techniques that exercise source code in terms
of the test coverage metrics.
Note: Miller's test coverage metrics are based on program graphs in which nodes are full
statements, whereas our formulation also allows statement fragments to be nodes.
Example: In the triangle problem, nodes 9, 10, 11, and 12 form a complete if-then-else statement.
If nodes correspond to full statements, we need to execute only one of the decision alternatives
to satisfy the statement coverage criterion.
If statement fragments are considered, such a statement is divided into three nodes; doing so
results in coverage of both predicate outcomes.
Whether or not this convention is followed, these coverage metrics require a set of test
cases such that, when executed, every node of the program graph is traversed at least once.
2. DD-Path Testing
When every DD-Path is traversed (the C1 metric), we know that each predicate outcome
has been executed; this amounts to traversing every edge in the DD-Path graph.
Example: For if-then and if-then-else statements, both the true and the false branches are
covered (C1p coverage). For CASE statements, each clause is covered.
The variable IsATriangle is set to TRUE at node C and FALSE at node D. Node H is the
branch taken when IsATriangle is TRUE in the condition at node B, so any path containing
nodes D and H is infeasible.
Coverage of dependent pairs of DD-Paths exercises these dependencies and hence reveals a
deeper class of faults.
One possibility is to make a truth table; for example, a compound condition of three simple
conditions would have eight rows, yielding eight test cases, as sketched below.
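A minimal Java sketch of this mechanical enumeration; the compound predicate used here is invented purely for illustration:

// Hypothetical sketch: enumerate all 2^3 combinations of three simple conditions,
// one test case per truth-table row.
public class TruthTableCases {
    public static void main(String[] args) {
        boolean[] values = {true, false};
        int row = 1;
        for (boolean c1 : values) {
            for (boolean c2 : values) {
                for (boolean c3 : values) {
                    // Each row becomes one test case for the compound predicate.
                    // The predicate (c1 && c2) || c3 is illustrative only.
                    boolean outcome = (c1 && c2) || c3;
                    System.out.printf("row %d: c1=%b c2=%b c3=%b -> %b%n",
                            row++, c1, c2, c3, outcome);
                }
            }
        }
    }
}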
Another possibility is to reprogram compound predicates into nested simple if-then-else
logic, which will result in more DD-Paths to cover.
Either way, multiple condition coverage ensures that the complexity hidden in compound
predicates is not swept under the rug of DD-Path coverage.
5. Loop Coverage
Consider the loops such as concatenated, nested, and knotted as shown in Figure 3.
Every loop involves a decision, and we need to test both outcomes of that decision:
(1) traverse the loop, and (2) exit (or do not enter) the loop.
Use a modified boundary value approach to test a loop, where the loop index is given
its minimum, nominal & maximum values.
● Once a loop has been tested, the tester condenses it into a single node. If loops are
nested, this process is repeated starting with the innermost loop and working
outward. This results in a multiplicative number of test cases, because each loop index
variable acts like an input variable.
● If loops are knotted, analyze carefully in terms of the dataflow methods.
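A minimal sketch of the modified boundary value approach for a simple counting loop; the sumFirstN function and its data are invented for illustration:

// The loop bound n is treated like an input variable and exercised at its
// minimum, nominal, and maximum values.
public class LoopCoverageSketch {
    static int sumFirstN(int[] a, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {   // decision: enter the loop or exit it
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5};
        int[] boundaryValues = {0, 3, data.length};   // min, nominal, max
        for (int n : boundaryValues) {
            System.out.println("n=" + n + " -> " + sumFirstN(data, n));
        }
    }
}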
The basis of a vector space: The basis of a vector space contains a set of vectors that are
independent of one another, and have a spanning property; this means that everything
within the vector space can be expressed in terms of the elements within the basis.
What McCabe noticed was that if a basis could be provided for a program graph, this basis
could be subjected to rigorous testing; if proven to be without fault, it could be assumed
that those paths expressed in terms of that basis are also correct.
3.4.1 McCabe’s Basis Path Testing
The method devised by McCabe to carry out basis path testing has four steps. These are:
1. Compute the program graph.
2. Calculate the cyclomatic complexity.
3. Select a basis set of paths.
4. Generate test cases for each of these paths.
Step 1: To begin, we need a program graph from which to construct a basis. Figure 4 is a
directed graph which is the program graph (or the DD-Path graph) of some program. The
program does have a single entry (A) and a single exit (G).
We can create a strongly connected graph by adding an edge from the (every) sink node to
the (every) source node. Figure 5 shows McCabe's derived strongly connected graph.
Step 2: In graph theory, the cyclomatic complexity of a strongly connected graph is defined
as the number of linearly independent circuits in the graph. (A circuit is similar to a
chain: no internal loops or decisions occur, but the initial node is also the terminal node.)
The formula for cyclomatic complexity is
V(G) = e - n + p (for a strongly connected graph) or V(G) = e - n + 2p
where e is the number of edges, n is the number of nodes, and p is the number of connected regions.
The number of linearly independent paths from the source node to the sink node in
Fig. 4 is
V(G) = e - n + 2p = 10 - 7 + 2(1) = 5
The number of linearly independent circuits in the graph in Fig. 5 is
V(G) = e - n + p = 11 - 7 + 1 = 5
The cyclomatic complexity of the strongly connected graph in Figure 5 is 5. Hence there
are five linearly independent circuits. If the added edge is deleted from node G to node A,
these five circuits become five linearly independent paths from node A to node G.
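As a small illustration, V(G) can be computed mechanically from an edge list. The edge list below is an assumption, reconstructed from the five basis paths listed later in this section, and gives the same counts as the text (10 edges, 7 nodes, 1 connected component):

import java.util.*;

// Sketch: compute V(G) = e - n + 2p from an edge list.
public class CyclomaticComplexity {
    public static void main(String[] args) {
        String[][] edges = {
            {"A","B"}, {"A","D"}, {"B","C"}, {"C","B"}, {"B","E"},
            {"D","E"}, {"D","F"}, {"E","F"}, {"F","G"}, {"C","G"}
        };
        Set<String> nodes = new HashSet<>();
        for (String[] edge : edges) {
            nodes.add(edge[0]);
            nodes.add(edge[1]);
        }
        int e = edges.length;   // 10
        int n = nodes.size();   // 7
        int p = 1;              // one connected component
        System.out.println("V(G) = " + (e - n + 2 * p));   // prints V(G) = 5
    }
}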
Step 3: An independent path is any path through the software that introduces at least one
new set of processing statements or a new condition.
1. Select a "baseline path" that corresponds to normal execution. (The baseline should
have as many decisions as possible.)
2. To get next basis paths, the baseline path is retraced and in turn each decision is
"flipped"; that is, when a node of outdegree ≥ 2 is reached, a different edge must be
taken.
3. Repeat this until all decisions have been flipped. When you reach V(G) basis paths,
you're done.
Take the example in Figure 5; here the first path, through nodes A, B, C, B, E, F, and G,
is the baseline. The first decision node (outdegree ≥ 2) in this path is node A; so for the
next basis path we traverse edge 2 instead of edge 1. We get the path A, D, E, F, G, where we
retrace nodes E, F, and G of the baseline to remain as minimally different as possible. For
the next path, we follow the second path and take the other decision outcome of node D, which
gives us the path A, D, F, G. Now only decision nodes B and C have not been flipped; doing so
yields the last two basis paths, A, B, E, F, G and A, B, C, G.
Notice that this set of basis paths is distinct from the one listed below; this is not
problematic, because a unique basis is not required.
p1: A, B, C, G
p2: A, B, C, B, C, G
p3: A, B, E, F, G
p4: A, D, E, F, G
p5: A, D, F, G
These paths can be treated like a vector space by defining notions of addition and scalar
multiplication: path addition is one path followed by another path, and multiplication
corresponds to repetition of a path.
We can check the independence of paths p1 to p5 by examining the first five rows of this
incidence matrix. The bold, circled entries show edges that appear in exactly one path,
so paths p2 to p5 must be independent. Path p1 is independent of all of these, because any
attempt to express p1 in terms of the others introduces unwanted edges. None can be
deleted, and these five paths span the set of all paths from node A to node G.
To understand these problems, take the DD-Path graph of the triangle program. The first
baseline path, corresponding to a scalene triangle with sides 3, 4, 5, traverses path p1.
Now, if we flip the decision at node B, we get path p2. Flipping the decision at node F
yields path p3. Continuing to flip decision nodes in the baseline path p1, the next node
with outdegree = 2 is node H; flipping node H gives path p4. Finally, flipping node J gives p5.
In reality paths p2 and p3 are both infeasible. Path p2 is infeasible, because passing through
node D means the sides are not a triangle; so the outcome of the decision at node F must
be node G. Similarly, in p3, passing through node C means the sides do form a triangle; so
node G cannot be traversed.
One solution to this problem is to always require that flipping a decision results in a
semantically feasible path. Another is to reason about logical dependencies. For the triangle
problem we can identify two rules:
1. If node C is traversed, the decision at node F must take the branch to node H (node G
cannot be traversed).
2. If node D is traversed, the decision at node F must take the branch to node G.
Incorporating these rules into McCabe's baseline method yields a feasible basis path set.
Notice that logical dependencies reduce the size of a basis set when basis paths must be feasible.
Essential Complexity
Essential complexity is simply the cyclomatic complexity of yet another form of
condensation graph. Condensation graphs are a way of simplifying an existing graph.
The concept behind essential complexity is that the program graph of a piece of software
is traversed until a structured programming construct is discovered; once located, the
structured programming construct is collapsed into a single node and the graph traversal
continues. The desired outcome of this procedure is to end up with a graph of V(G) = 1,
that is, a program made up of one node.
This process is followed in Figure 7 and Figure 8, which starts with the DD-Path graph of
the Pseudocode triangle program. The if-then-else construct involving nodes B, C, D and
E is condensed into node a, and then the three if-then constructs are condensed into nodes
b, c, and d. The remaining if-then-else is condensed into node e, resulting in a condensed
graph with cyclomatic complexity V(G) = 1.
The bottom line for testers is that programs with high cyclomatic complexity require more
testing. Of the organizations that use the cyclomatic complexity metric, most set a maximum
acceptable complexity; V(G) = 10 is a common choice. If a unit's complexity exceeds this
limit, two choices exist:
1. Simplify the unit or plan to do more testing. If the unit is well structured, its essential
complexity is 1, so it can be simplified easily.
2. If the unit has an essential complexity that exceeds the guideline, the best choice is
to eliminate the unstructured constructs (the violations of structured programming).
Dataflow Testing
Dataflow testing refers to forms of structural testing that focus on the points at which
variables receive values and the points at which these values are used (or referenced).
Most programs deliver functionality in terms of data. Variables that represent data
somehow receive values and these values are used to compute values for other variables.
Early dataflow analyses centered on a set of faults that are known as define/reference
anomalies:
A variable that is defined but never used (referenced)
A variable that is used before it is defined
A variable that is defined twice before it is used
Each of these anomalies can be recognized from the concordance of a program. Because
the concordance information is compiler generated, these anomalies can be discovered by
static analysis: finding faults in source code without executing it.
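A minimal Java illustration of these three anomalies; the variable names are invented, and the second anomaly is shown as a comment because the Java compiler rejects it outright:

public class AnomalyExamples {
    static int example(int a) {
        int neverUsed = a * 2;    // anomaly 1: defined but never used (referenced)
        int b;                    // declared but not yet defined...
        // int c = b + 1;         // anomaly 2: would use b before it is defined
                                  //            (rejected by the compiler)
        int d = 1;                // anomaly 3: defined here...
        d = 2;                    //            ...and defined again before any use
        return d + a;
    }

    public static void main(String[] args) {
        System.out.println(example(3));
    }
}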
Definition: (Defining Node)
Node n ∈ G(P) is a defining node of the variable v ∈ V, written DEF(v, n), iff the value
of the variable v is defined at the statement fragment corresponding to node n.
Examples: input statements, assignment statements, loop control statements, and procedure calls.
When the code corresponding to such statements executes, the contents of the memory
location(s) associated with the variables are changed.
Definition: (Usage Node)
Node n ∈ G(P) is a usage node of the variable v ∈ V, written USE(v, n), iff the value of
the variable v is used at the statement fragment corresponding to node n.
Examples: output statements, assignment statements, conditional statements, loop control
statements, and procedure calls. When the code corresponding to such statements executes,
the contents of the memory location(s) associated with the variables remain unchanged.
Definition: (P-use) and (C-use)
A usage node USE(v, n) is a predicate use (P-use) iff the statement n is a predicate
statement; otherwise, USE(v, n) is a computation use (C-use).
The nodes corresponding to predicate uses always have an outdegree ≥ 2, and nodes
corresponding to computation uses always have an outdegree ≤ 1.
Definition: (du-path)
A definition-use path (du-path) with respect to a variable v is a path in the program graph
such that there are a defining node DEF(v, m) and a usage node USE(v, n) with m and n the
initial and final nodes of the path.
Definition: (dc-path)
A definition-clear path (dc-path) with respect to a variable v is a du-path with initial and
final nodes DEF(v, m) and USE(v, n) in which no other node is a defining node of v.
● This program computes the commission on the sales of the total numbers of locks,
stocks and barrels sold. The While loop is a classical sentinel controlled loop in
which a value of -1 for locks signifies the end of the sales data. The totals are
accumulated as the data values are read in the While loop.
● After printing this preliminary information, the sales value is computed, using the
constant item prices defined at the beginning of the program.
● The sales value is then used to compute the commission in the conditional portion
of the program
Program:
7. lockPrice = 45.0
8. stockPrice = 30.0
9. barrelPrice = 25.0
10. totalLocks = 0
11. totalStocks = 0
12. totalBarrels = 0
13. Input (locks)
14. While NOT(locks = -1)    ' Loop condition uses -1 to indicate end of data
15. Input (stocks, barrels)
16. totalLocks = totalLocks + locks
17. totalStocks = totalStocks + stocks
18. totalBarrels = totalBarrels + barrels
19. Input (locks)
20. EndWhile
The table below lists the define/usage nodes for the variables in the commission problem. Use
this information in conjunction with the program graph in Figure 9 to identify various
definition-use and definition-clear paths.
The table of selected define/use paths is shown below. It presents du-paths in the
commission problem; they are named by their beginning and ending nodes. The third
column indicates whether the du-paths are definition-clear.
The initial value definition for totalStocks occurs at node 11, and it is first used at node
17. Thus, the path (11, 17), which consists of the node sequence <11, 12, 13, 14, 15, 16,
17>, is definition-clear. The path (11, 22), which consists of the node sequence <11, 12,
13, (14, 15, 16, 17, 18, 19, 20)*, 21, 22> is not definition-clear because values of
totalStocks are defined at node 11 and node 17.
p1 = <13, 14>
p2 = <13, 14, 15, 16>
p3 = <19, 20, 14>
p4 = <19, 20, 14, 15, 16>
p5 = <10, 11, 12, 13, 14, 15, 16> is a du-path in which the initial value of totalLocks (0)
has a computation use. This path is definition-clear.
p6 = <10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 14, 21>. Path p6 ignores the possible
repetition of the While loop; the subpath <16, 17, 18, 19, 20, 14, 15> might be traversed
several times. Either way, p6 is a du-path that fails to be definition-clear.
p7 = <10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 14, 21, 22, 23, 24>
p7 = <p6, 22, 23, 24>. Du-path p7 is not definition-clear because it includes node 16.
p9 = <16, 17, 18, 19, 20, 14, 21, 22, 23, 24>. Both p8 and p9 are definition-clear.
● The "built-up" version uses intermediate values, and these will appear as define and
usage nodes in the du-path analysis. So disallow du-paths from assignment
statements like 31 and 32, just consider du-paths that begin with the three "real"
defining nodes: DEF(commission, 33), DEF(commission, 37), and
DEF(commission, 38). Only one usage node is used: USE(commission, 42)
In the following definitions, T is a set of paths in the program graph G(P) of a program P,
with the set V of variables. Assume that the define/use paths are all feasible.
Definition: (All-Defs)
The set T satisfies the All-Defs criterion for the program P iff for every variable v ∈ V,
T contains definition-clear paths from every defining node of v to a use of v.
Definition: (All-Uses)
The set T satisfies the All-Uses criterion for the program P iff for every variable v ∈ V,
T contains definition-clear paths from every defining node of v to every use of v, and to
the successor node of each USE(v, n).
Definition: (All-P-Uses/Some C-Uses)
The set T satisfies the All-P-Uses/Some C-Uses criterion for the program P iff for every
variable v ∈ V, T contains definition-clear paths from every defining node of v to every
predicate use of v; if a definition of v has no P-uses, a definition-clear path leads to at
least one computation use.
Definition: (All-C-Uses/Some P-Uses)
The set T satisfies the All-C-Uses/Some P-Uses criterion for the program P iff for every
variable v ∈ V, T contains definition-clear paths from every defining node of v to every
computation use of v; if a definition of v has no C-uses, a definition-clear path leads to at
least one predicate use.
Definition: (All-du-paths)
The set T satisfies the All-du-paths criterion for the program P iff for every variable v ∈ V,
T contains definition-clear paths from every defining node of v to every use of v and to the
successor node of each USE(v, n), and these paths are either single-loop traversals or
cycle-free.
These test coverage metrics have several set theory-based relationships, which are referred
to as "subsumption". These relationships are shown in Figure 11
Slice-Based Testing
A program slice is a set of program statements that contributes to, or affects, the value of a
variable at some point in the program.
Consider a program P with program graph G(P), in which nodes refer to statement fragments,
and a set V of program variables.
Definition: The slice on the variable set V at statement fragment n, written S(V, n), is the
set of node numbers of all statement fragments in P prior to and including n that contribute
to the values of variables in V at statement fragment n.
“The idea of slices is to separate a program into components that have some useful
(functional) meaning”.
1. "prior to" in the dynamic sense, a slice captures the execution time behavior of a
program with respect to the variable(s) in the slice. Eventually, develop a lattice
(a directed, acyclic graph) of slices, in which nodes are slices and edges
correspond to the subset relationship.
2. The "contribute": Means data declaration statements have an effect on the value
of a variable. The notion of contribution is partially clarified by the predicate (P-
use) and computation (C-use) usage
1. Slices on the locks: Slices on the locks variable show why it is potentially fault-prone.
It has a P-use at node 14 and a C-use at node 16 and has two definitions, the I-defs at
nodes 13 and 19.
Slices S10 and S11 are equal because nodes 21 and 24 are an O-use and a C-use of
totalLocks.
S13: S(totalStocks, 17) = {11, 13, 14, 15, 17, 19, 20}
S14: S(totalStocks, 22) = {11, 13, 14, 15, 17, 19, 20}
S15: S(totalBarrels, 12) = {12}
S16: S(totalBarrels,18) = {12, 13, 14, 15, 18, 19, 20}
S17: S(totalBarrels, 23) = {12, 13, 14, 15, 18, 19, 20}
5. Assignment statements:
Only one defining node exists for sales, the A-def at node 27. The remaining slices on
sales show the P-uses, C-uses, and the O-use in definition-clear paths.
S24: S(sales, 27) = {7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 24, 25, 26, 27}
S25: S(sales, 28) = {7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 24, 25, 26, 27}
S26: S(sales, 29) = {7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 24, 25, 26, 27}
S27: S(sales, 33) = {7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 24, 25, 26, 27}
S28: S(sales, 34) = {7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 24, 25, 26, 27}
S29: S(sales, 37) = {7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 24, 25, 26, 27}
S30: S(sales, 39) = {7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 24, 25, 26, 27}
Six A-def nodes are used for commission. Three computations of commission are
controlled by P-uses of sales in the IF, ELSE IF logic. This yields three paths of slices
that compute commission.
S35: S(commission, 37) = {7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 24, 25, 26,
27, 36, 37}
S36: S(commission, 39) = {7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 24, 25, 26, 27,
29, 34, 38, 39}
S37: S(commission, 41) = {7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 24, 25, 26,
27, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39}
Test Execution
Whereas test design, even when supported by tools, requires insight and ingenuity in similar
measure to other facets of software design, test execution must be sufficiently automated for
frequent reexecution with little human involvement. This chapter describes approaches for
creating the run-time support for generating and managing test data, creating scaffolding for test
execution, and automatically distinguishing between correct and incorrect test case executions.
Overview
Designing tests is creative; executing them should be as mechanical as compiling the latest version
of the product, and indeed a product build is not complete until it has passed a suite of test cases.
In many organizations, a complete build-and-test cycle occurs nightly, with a report of success or
problems ready each morning.
The purpose of run-time support for testing is to enable frequent hands-free reexecution of a test
suite. A large suite of test data may be generated automatically from a more compact and abstract
set of test case specifications. For unit and integration testing, and sometimes for system testing as
well, the software under test may be combined with additional "scaffolding" code to provide a
suitable test environment, which might, for example, include simulations of other software and
hardware resources. Executing a large number of test cases is of little use unless the observed
behaviors are classified as passing or failing. The human eye is a slow, expensive, and unreliable
instrument for judging test outcomes, so test scaffolding typically includes automated test oracles.
The test environment often includes additional support for selecting test cases (e.g., rotating
nightly through portions of a large test suite over the course of a week) and for summarizing and
reporting results.
with no duplicates") may designate many possible concrete test cases, and it may be desirable to
generate just one instance or many. There is no clear, sharp line between test case design and test
case generation. A rule of thumb is that, while test case design involves judgment and creativity,
test case generation should be a mechanical step.
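As a sketch of such a mechanical step, the following hypothetical generator instantiates one concrete test input from the abstract specification quoted above ("a sorted sequence, length greater than 2, with no duplicates"); the generator and its parameters are illustrative, not taken from any particular tool:

import java.util.*;

// Sketch: produce a concrete test input satisfying "sorted, length > 2, no duplicates".
public class SpecToTestCase {
    static List<Integer> sortedNoDuplicates(int length, long seed) {
        Random random = new Random(seed);
        SortedSet<Integer> values = new TreeSet<>();
        while (values.size() < length) {
            values.add(random.nextInt(1000));   // duplicates are absorbed by the set
        }
        return new ArrayList<>(values);          // sorted, distinct, required length
    }

    public static void main(String[] args) {
        System.out.println(sortedNoDuplicates(5, 42L));
    }
}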
Automatic generation of concrete test cases from more abstract test case specifications reduces the
impact of small interface changes in the course of development. Corresponding changes to the test
suite are still required with each program change, but changes to test case specifications are likely
to be smaller and more localized than changes to the concrete test cases.
Instantiating test cases that satisfy several constraints may be simple if the constraints are
independent (e.g., a constraint on each of several input parameter values), but becomes more
difficult to automate when multiple constraints apply to the same item. Some well-formed sets of
constraints have no solution at all ("an even, positive integer that is not the sum of two primes").
Constraints that appear to be independent may not be. For example, a test case specification that
constrains both program input and output imposes a conjunction of two constraints on output (it
conforms to the given output constraint and it is produced by the given input).
General test case specifications that may require considerable computation to produce test data
often arise in model-based testing. For example, if a test case calls for program execution
corresponding to a certain traversal of transitions in a finite state machine model, the test data must
trigger that traversal, which may be quite complex if the model includes computations and
semantic constraints (e.g., a protocol model in Promela). Fortunately, model-based testing is
closely tied to model analysis techniques that can be adapted as test data generation methods. For
example, finite state verification techniques typically have facilities for generating
counterexamples to asserted properties. If one can express the negation of a test case
specification, then treating it as a property to be verified will result in a counterexample
from which a concrete test case can be generated.
Scaffolding
During much of development, only a portion of the full system is available for testing. In modern
development methodologies, the partially developed system is likely to consist of one or more
runnable programs and may even be considered a version or prototype of the final system from
very early in construction, so it is possible at least to execute each new portion of the software as
it is constructed, but the external interfaces of the evolving system may not be ideal for testing;
often additional code must be added. For example, even if the actual subsystem for placing an
order with a supplier is available and fully operational, it is probably not desirable to place a
thousand supply orders each night as part of an automatic test run. More likely a portion of the
order placement software will be "stubbed out" for most test executions.
Code developed to facilitate testing is called scaffolding, by analogy to the temporary structures
erected around a building during construction or maintenance. Scaffolding may include test drivers
(substituting for a main or calling program), test harnesses (substituting for parts of the deployment
environment), and stubs (substituting for functionality called or used by the software under test),
in addition to program instrumentation and support for recording and managing test execution. A
common estimate is that half of the code developed in a software project is scaffolding of some
kind, but the amount of scaffolding that must be constructed with a software project can vary
widely, and depends both on the application domain and the architectural design and build plan,
which can reduce cost by exposing appropriate interfaces and providing necessary functionality in
a rational order.
The purposes of scaffolding are to provide controllability to execute test cases and observability
to judge the outcome of test execution. Sometimes scaffolding is required to simply make a module
executable, but even in incremental development with immediate integration of each module,
scaffolding for controllability and observability may be required because the external interfaces of
the system may not provide sufficient control to drive the module under test through test cases, or
sufficient observability of the effect. It may be desirable to substitute a separate test "driver"
program for the full system, in order to provide more direct control of an interface or to remove
dependence on other subsystems.
Consider, for example, an interactive program that is normally driven through a graphical user
interface. Assume that each night the program goes through a fully automated and unattended
cycle of integration, compilation, and test execution. It is necessary to perform some testing
through the interactive interface, but it is neither necessary nor efficient to execute all test cases
that way. Small driver programs, independent of the graphical user interface, can drive each
module through large test suites in a short time.
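A minimal sketch of such a driver; the Calculator class here merely stands in for whatever module is under test and is not part of any system described above:

// Hypothetical test driver that exercises a module's API directly, bypassing any GUI.
public class CalculatorDriver {
    // Stand-in for the module under test.
    static class Calculator {
        int add(int a, int b) { return a + b; }
    }

    public static void main(String[] args) {
        Calculator unitUnderTest = new Calculator();
        int[][] cases = { {1, 2, 3}, {0, 0, 0}, {-1, 1, 0} };   // {a, b, expected}
        for (int[] c : cases) {
            int actual = unitUnderTest.add(c[0], c[1]);
            String verdict = (actual == c[2]) ? "pass" : "FAIL";
            System.out.println("add(" + c[0] + "," + c[1] + ") = " + actual + " " + verdict);
        }
    }
}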
When testability is considered in software architectural design, it often happens that interfaces
exposed for use in scaffolding have other uses. For example, the interfaces needed to drive an
interactive program without its graphical user interface are likely to serve also as the interface for
a scripting facility. A similar phenomenon appears at a finer grain. For example, introducing a
Java interface to isolate the public functionality of a class and hide methods introduced for testing
the implementation has a cost, but also potential side benefits such as making it easier to support
multiple implementations of the interface.
1 package JFlex.tests;
2
3 import JFlex.IntCharSet;
4 import JFlex.Interval;
5 import junit.framework.TestCase;
11 ...
12 public class CharClassesTest extends TestCase {
25 ...
26 public void testAdd1() {
27 IntCharSet set = new IntCharSet(new Interval('a','h'));
28 set.add(new Interval('o','z'));
29 set.add(new Interval('A','Z'));
30 set.add(new Interval('h','o'));
31 assertEquals("{ ['A'-'Z']['a'-'z'] }", set.toString());
32 }
33
34 public void testAdd2() {
35 IntCharSet set = new IntCharSet(new Interval('a','h'));
36 set.add(new Interval('o','z'));
37 set.add(new Interval('A','Z'));
38 set.add(new Interval('i','n'));
39 assertEquals("{ ['A'-'Z']['a'-'z'] }", set.toString());
40 }
99 ...
100 }
Figure 7.4: Excerpt of JFlex 1.4.1 source code (a widely used open-source scanner generator) and
accompanying JUnit test cases. JUnit is typical of basic test scaffolding libraries, providing support for test
execution, logging, and simple result checking (assertEquals in the example). The illustrated version of
JUnit uses Java reflection to find and execute test case methods; later versions of JUnit use Java annotation
(metadata) facilities, and other tools use source code preprocessors or generators.
Fully generic scaffolding may suffice for small numbers of hand-written test cases. For larger test
suites, and particularly for those that are generated systematically, writing each test case by
hand is impractical. Note, however, that the Java code expressing each test case in Figure 7.4 follows a
simple pattern, and it would not be difficult to write a small program to convert a large collection
of input, output pairs into procedures following the same pattern. A large suite of automatically
generated test cases and a smaller set of hand-written test cases can share the same underlying
generic test scaffolding.
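A sketch of the kind of small generator suggested above: it emits test methods following the pattern of Figure 7.4. The (input, expected) pairs and generated method names are invented for illustration:

// Sketch: turn (constructor argument, expected toString) pairs into JUnit-style methods.
public class TestCaseGenerator {
    public static void main(String[] args) {
        String[][] pairs = {
            {"new Interval('a','h')", "{ ['a'-'h'] }"},
            {"new Interval('0','9')", "{ ['0'-'9'] }"}
        };
        int i = 1;
        for (String[] pair : pairs) {
            System.out.println("  public void testGenerated" + i++ + "() {");
            System.out.println("    IntCharSet set = new IntCharSet(" + pair[0] + ");");
            System.out.println("    assertEquals(\"" + pair[1] + "\", set.toString());");
            System.out.println("  }");
            System.out.println();
        }
    }
}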
Scaffolding to replace portions of the system is somewhat more demanding, and again both generic
and application-specific approaches are possible. The simplest kind of stub, sometimes called a
mock, can be generated automatically by analysis of the source code. A mock is limited to checking
expected invocations and producing precomputed results that are part of the test case specification
or were recorded in a prior execution. Depending on system build order and the relation of unit
testing to integration in a particular process, isolating the module under test is sometimes
considered an advantage of creating mocks, as compared to depending on other parts of the system
that have already been constructed.
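A minimal hand-written illustration of the mock idea; the PriceService interface and its canned price are assumptions made for this example:

// The mock checks that it is invoked as expected and returns precomputed results.
public class MockExample {
    interface PriceService {
        double priceOf(String itemCode);
    }

    // Mock: records invocations and returns canned values from the test case.
    static class MockPriceService implements PriceService {
        int invocations = 0;
        public double priceOf(String itemCode) {
            invocations++;
            if ("LOCK".equals(itemCode)) return 45.0;   // precomputed result
            throw new AssertionError("unexpected item code: " + itemCode);
        }
    }

    public static void main(String[] args) {
        MockPriceService mock = new MockPriceService();
        double total = 3 * mock.priceOf("LOCK");        // code under test would call this
        System.out.println("total=" + total + ", invocations=" + mock.invocations);
    }
}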
The balance of quality, scope, and cost for a substantial piece of scaffolding software - say, a
network traffic generator for a distributed system or a test harness for a compiler - is essentially
similar to the development of any other substantial piece of software, including similar
considerations regarding specialization to a single project or investing more effort to construct a
component that can be used in several projects.
The balance is altered in favor of simplicity and quick construction for the many small pieces of
scaffolding that are typically produced during development to support unit and small-scale
integration testing. For example, a database query may be replaced by a stub that provides only a
fixed set of responses to particular query strings.
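A sketch of such a stub; the query strings and canned results are invented for illustration:

import java.util.*;

// Stub: a database query is replaced by a lookup over a fixed set of query strings.
public class QueryStub {
    private static final Map<String, List<String>> cannedResults = new HashMap<>();
    static {
        cannedResults.put("SELECT name FROM accounts WHERE id = 1",
                          Arrays.asList("Alice"));
        cannedResults.put("SELECT name FROM accounts WHERE id = 2",
                          Collections.emptyList());
    }

    static List<String> query(String sql) {
        List<String> result = cannedResults.get(sql);
        if (result == null) {
            throw new IllegalArgumentException("stub has no canned result for: " + sql);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(query("SELECT name FROM accounts WHERE id = 1"));
    }
}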
Test Oracles
It is little use to execute a test suite automatically if execution results must be manually inspected
to apply a pass/fail criterion. Relying on human intervention to judge test outcomes is not merely
expensive, but also unreliable. Even the most conscientious and hard-working person cannot
maintain the level of attention required to identify one failure in a hundred program executions,
much less one in ten thousand. That is a job for a computer.
Software that applies a pass/fail criterion to a program execution is called a test oracle, often
shortened to oracle. In addition to rapidly classifying a large number of test case executions,
automated test oracles make it possible to classify behaviors that exceed human capacity in other
ways, such as checking real-time response against latency requirements or dealing with
voluminous output data in a machine-readable rather than human-readable form.
Ideally, a test oracle would classify every execution of a correct program as passing and would
detect every program failure. In practice, the pass/fail criterion is usually imperfect. A test oracle
may apply a pass/fail criterion that reflects only part of the actual program specification, or is an
approximation, and therefore passes some program executions it ought to fail. Several partial test
oracles (perhaps applied with different parts of the test suite) may be more cost-effective than one
that is more comprehensive. A test oracle may also give false alarms, failing an execution that it
ought to pass. False alarms in test execution are highly undesirable, not only because of the direct
expense of manually checking them, but because they make it likely that real failures will be
overlooked. Nevertheless sometimes the best we can obtain is an oracle that detects deviations
from expectation that may or may not be actual failures.
One approach to judging correctness - but not the only one - compares the actual output or behavior
of a program with predicted output or behavior. A test case with a comparison-based oracle relies
on predicted output that is either precomputed as part of the test case specification or can be derived
in some way independent of the program under test. Precomputing expected test results is
reasonable for a small number of relatively simple test cases, and is still preferable to manual
inspection of program results because the expense of producing (and debugging) predicted results
is incurred once and amortized over many executions of the test case.
Support for comparison-based test oracles is often included in a test harness program or testing
framework. A harness typically takes two inputs: (1) the input to the program under test (or
something that can be mechanically transformed into a well-formed input), and (2) the predicted output. Frameworks for
writing test cases as program code likewise provide support for comparison-based oracles. The
assertEquals method of JUnit, illustrated in Figure 7.4, is a simple example of comparison-based
oracle support.
Comparison-based oracles are useful mainly for small, simple test cases, but sometimes expected
outputs can also be produced for complex test cases and large test suites. Capture-replay testing, a
special case of this in which the predicted output or behavior is preserved from an earlier execution,
is discussed in this chapter. A related approach is to capture the output of a trusted alternate version
of the program under test. For example, one may produce output from a trusted implementation
that is for some reason unsuited for production use; it may be too slow or may depend on a component
that is not available in the production environment. It is not even necessary that the alternative
implementation be more reliable than the program under test, as long as it is sufficiently different
that the failures of the real and alternate version are likely to be independent, and both are
sufficiently reliable that not too much time is wasted determining which one has failed a particular
test case on which they disagree.
Figure 7.5: A test harness with a comparison-based test oracle processes test cases consisting of (program
input, predicted output) pairs.
A third approach to producing complex (input, output) pairs is sometimes possible: It may be easier
to produce program input corresponding to a given output than vice versa. For example, it is
simpler to scramble a sorted array than to sort a scrambled array.
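A small sketch of this "work backward from the output" idea for a sort routine: the expected (sorted) output is written first, and the test input is obtained by shuffling it:

import java.util.*;

// Sketch: generate an (input, expected output) pair by scrambling a sorted list.
public class ScrambleGenerator {
    public static void main(String[] args) {
        List<Integer> expectedOutput = Arrays.asList(1, 2, 3, 5, 8, 13);
        List<Integer> testInput = new ArrayList<>(expectedOutput);
        Collections.shuffle(testInput, new Random(7));   // fixed seed for repeatability
        System.out.println("input:    " + testInput);
        System.out.println("expected: " + expectedOutput);
    }
}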
A common misperception is that a test oracle always requires predicted program output to compare
to the output produced in a test execution. In fact, it is often possible to judge output or behavior
without predicting it. For example, if a program is required to find a bus route from station A to
station B, a test oracle need not independently compute the route to ascertain that it is in fact a
valid route that starts at A and ends at B.
Oracles that check results without reference to a predicted output are often partial, in the sense that
they can detect some violations of the actual specification but not others. They check necessary
but not sufficient conditions for correctness. For example, if the specification calls for finding the
optimum bus route according to some metric, a validity check is only a partial oracle
because it does not check optimality. Similarly, checking that a sort routine produces sorted output
is simple and cheap, but it is only a partial oracle because the output is also required to be a
permutation of the input. A cheap partial oracle that can be used for a large number of test cases
is often combined with a more expensive comparison-based oracle that can be used with a smaller
set of test cases for which predicted output has been obtained.
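A minimal sketch of such a cheap partial oracle for sorting; it checks only the necessary condition that the output is in nondecreasing order, not the permutation condition:

// Partial oracle: detects unsorted output, but cannot detect lost or duplicated elements.
public class SortednessOracle {
    static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) return false;   // violation of nondecreasing order
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSorted(new int[] {1, 2, 2, 9}));   // true
        System.out.println(isSorted(new int[] {3, 1, 2}));      // false
    }
}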
Ideally, a single expression of a specification would serve both as a work assignment and as a
source from which useful test oracles were automatically derived. Specifications are often
incomplete, and their informality typically makes automatic derivation of test oracles impossible.
The idea is nonetheless a powerful one, and wherever formal or semiformal specifications
(including design models) are available, it is worthwhile to consider whether test oracles can be
derived from them. Some of the effort of formalization will be incurred either early, in writing
specifications, or later when oracles are derived from them, and earlier is usually preferable.
Model-based testing is an approach in which test cases and test oracles are both derived from design models.
Self-Checks as Oracles
A program or module specification describes all correct program behaviors, so an oracle based on
a specification need not be paired with a particular test case. Instead, the oracle can be incorporated
into the program under test, so that it checks its own work (see Figure 7.6). Typically these
self-checks are in the form of assertions, similar to assertions used in symbolic execution and
program verification, but designed to be checked during execution.
Figure 7.6: When self-checks are embedded in the program, test cases need not include predicted outputs.
Self-check assertions may be left in the production version of a system, where they provide much
better diagnostic information than the uncontrolled application crash the customer may otherwise
report. If this is not acceptable - for instance, if the cost of a runtime assertion check is too high -
most tools for assertion processing also provide controls for activating and deactivating assertions.
It is generally considered good design practice to make assertions and self-checks free of side
effects on program state. Side-effect-free assertions are essential when assertions may be
deactivated, because otherwise suppressing assertion checking can introduce program failures that
appear only when one is not testing.
Self-checks in the form of assertions embedded in program code are useful primarily for checking
module and subsystem-level specifications, rather than overall program behavior. Devising
program assertions that correspond in a natural way to specifications (formal or informal) poses
two main challenges: bridging the gap between concrete execution values and abstractions used in
specification, and dealing in a reasonable way with quantification over collections of values.
Test execution necessarily deals with concrete values, while abstract models are indispensable in
both formal and informal specifications. The intended effect of an operation is described in terms
of a precondition (state before the operation) and postcondition (state after the operation), relating
the concrete state to the abstract model. Consider again a specification of the get method of
java.util.Map, with pre- and postconditions expressed as the Hoare triple
φ is an abstraction function that constructs the abstract model type (sets of key, value pairs) from
the concrete data structure. φ is a logical association that need not be implemented when reasoning
about program correctness. To create a test oracle, it is useful to have an actual implementation of
φ. For this example, we might implement a special observer method that creates a simple textual
representation of the set of (key, value) pairs. Assertions used as test oracles can then correspond
directly to the specification. Besides simplifying implementation of oracles by implementing this
mapping once and using it in several assertions, structuring test oracles to mirror a correctness
argument is rewarded when a later change to the program invalidates some part of that argument
(e.g., by changing the treatment of duplicates or using a different data structure in the
implementation).
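A sketch of this idea, with assumed names: an observer method implements the abstraction function φ by producing a textual representation of a Map's (key, value) pairs, and self-checks are then phrased against that abstract view rather than the concrete data structure:

import java.util.*;

// Sketch: phi maps the concrete Map to an abstract textual model "{k1=v1, k2=v2, ...}".
public class MapAbstraction {
    static String phi(Map<String, Integer> map) {
        return new TreeMap<>(map).toString();   // key-ordered, representation-independent
    }

    public static void main(String[] args) {
        Map<String, Integer> dict = new HashMap<>();
        dict.put("locks", 70);
        dict.put("stocks", 80);
        String before = phi(dict);
        Integer result = dict.get("stocks");
        // Postcondition-style self-checks written against the abstract model:
        assert result == 80 : "get returned the wrong value";
        assert phi(dict).equals(before) : "get must not change the abstract state";
        System.out.println(phi(dict));
    }
}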
In addition to an abstraction function, reasoning about the correctness of internal structures usually
involves structural invariants, that is, properties of the data structure that are preserved by all
operations. Structural invariants are good candidates for self-checks implemented as assertions.
They pertain directly to the concrete data structure implementation, and can be implemented within
the module that encapsulates that data structure. For example, if a dictionary structure is
implemented as a red-black tree or an AVL tree, the balance property is an invariant of the structure
that can be checked by an assertion within the module. Figure 7.7 illustrates an invariant check
found in the source code of the Eclipse programming environment.
1 package org.eclipse.jdt.internal.ui.text;
2 import java.text.CharacterIterator;
3 import org.eclipse.jface.text.Assert;
4 /**
5 *A <code>CharSequence</code> based implementation of
6 * <code>CharacterIterator</code>.
7 * @since 3.0
8 */
9 public class SequenceCharacterIterator implements CharacterIterator {
13 ...
14 private void invariant() {
15 Assert.isTrue(fIndex >= fFirst);
16 Assert.isTrue(fIndex <= fLast);
17 }
49 ...
50 public SequenceCharacterIterator(CharSequence sequence, int first, int last)
51 throws IllegalArgumentException {
52 if (sequence == null)
53 throw new NullPointerException();
54 if (first < 0 || first > last)
55 throw new IllegalArgumentException();
56 if (last > sequence.length())
57 throw new IllegalArgumentException();
58 fSequence= sequence;
59 fFirst= first;
60 fLast= last;
61 fIndex= first;
62 invariant();
63 }
143 ...
144 public char setIndex(int position) {
145 if (position >= getBeginIndex() && position <= getEndIndex())
146 fIndex= position;
147 else
148 throw new IllegalArgumentException();
149
150 invariant();
151 return current();
152 }
263 ...
264 }
It is not much of a stretch to say that programming languages are formal specification languages
whose expressiveness has been purposely limited to ensure that specifications can be executed with
predictable and satisfactory performance. An important way in which specifications used for
human communication and reasoning about programs are more expressive and less constrained
than programming languages is that they freely quantify over collections of values.
For example, a specification of database consistency might state that account identifiers are
unique; that is, for all account records in the database, there does not exist another account record
with the same identifier.
It is sometimes straightforward to translate quantification in a specification statement into iteration
in a program assertion. In fact, some run-time assertion checking systems provide quantifiers that
are simply interpreted as loops. This approach can work when collections are small and quantifiers
are not too deeply nested, particularly in combination with facilities for selectively disabling
assertion checking so that the performance cost is incurred only when testing. Treating quantifiers
as loops does not scale well to large collections and cannot be applied at all when a specification
quantifies over an infinite collection. For example, it is perfectly reasonable for a specification to
state that the route found by a trip-planning application is the shortest among all possible routes
between two points, but it is not reasonable for the route planning program to check its work by
iterating through all possible routes.
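A minimal sketch of the uniqueness property above translated into iteration; the account-record representation is simplified here to a list of identifier strings:

import java.util.*;

// Quantifier-as-loop: "for all records, no other record has the same identifier".
public class UniqueIdCheck {
    static boolean identifiersAreUnique(List<String> accountIds) {
        Set<String> seen = new HashSet<>();
        for (String id : accountIds) {
            if (!seen.add(id)) {
                return false;    // a second record with the same identifier was found
            }
        }
        return true;
    }

    public static void main(String[] args) {
        assert identifiersAreUnique(Arrays.asList("A-1", "A-2", "A-3"));
        assert !identifiersAreUnique(Arrays.asList("A-1", "A-2", "A-1"));
        System.out.println("uniqueness checks passed");
    }
}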
The problem of quantification over large sets of values is a variation on the basic problem of
program testing, which is that we cannot exhaustively check all program behaviors. Instead, we
select a tiny fraction of possible program behaviors or inputs as representatives. The same tactic
is applicable to quantification in specifications. If we cannot fully evaluate the specified property,
we can at least select some elements to check (though at present we know of no program assertion
packages that support sampling of quantifiers). For example, although we cannot afford to
enumerate all possible paths between two points in a large map, we may be able to compare to a
sample of other paths found by the same procedure. As with test design, good samples require
some insight into the problem, such as recognizing that if the shortest path from A to C passes
through B, it should be the concatenation of the shortest path from A to B and the shortest path
from B to C.
A final implementation problem for self-checks is that asserted properties sometimes involve
values that are either not kept in the program at all (so-called ghost variables) or values that have
been replaced ("before" values). A specification of noninterference between threads in a
concurrent program may use ghost variables to track entry and exit of threads from a critical
section. The postcondition of an in-place sort operation will state that the new value is sorted and
a permutation of the input value. This permutation relation refers to both the "before" and "after"
values of the object to be sorted. A run-time assertion system must manage ghost variables and
retained "before" values and must ensure that they have no side-effects outside assertion checking.
Capture and Replay
Sometimes the most practical way to establish expected behavior is to capture the inputs and
outputs of an execution that a human has judged acceptable, and to replay that log as the oracle
for later test executions. When the log is captured at a relatively abstract level (rather than,
say, raw bitmaps or mouse coordinates), it is more likely that an insignificant detail can either
be ignored in comparisons or, even better, the abstract input and output can be systematically
transformed to reflect the intended change.
Further amplification of the value of a captured log can be obtained by varying the logged events
to obtain additional test cases. Creating meaningful and well-formed variations also depends on
the abstraction level of the log. For example, it is simpler to vary textual content recorded in a log
than to make an equivalent change to a recorded bitmap representation of that text.