Module2 - Functional Testing - Lecture Notes
Module-2
Functional Testing & Fault Based Testing
Boundary Value Testing
2.1 Boundary Value Analysis
Consider a function, F, of two variables, x1 and x2. When the function F is implemented as a program,
the input variables x1 and x2 will have some boundaries:
a ≤ x1 ≤ b
c ≤ x2 ≤ d
The intervals [a, b] and [c, d] are referred to as the ranges of x1 and x2. Strong typing in programming
languages was intended to prevent programmers from making the kinds of errors that result in faults that
are easily revealed by boundary value testing. The input space (domain) of our function F is shown in
Fig 2.1; any point within the shaded rectangle is a legitimate input to the function F.
Boundary value analysis focuses on the boundary of the input space to identify test cases. The
rationale behind boundary value testing is that errors tend to occur near the extreme values of an input
variable. Boundary value analysis rests on a critical assumption, known as the single fault
assumption in reliability theory: failures are only rarely the result of the simultaneous occurrence of
two (or more) faults. Thus, the boundary value analysis test cases are obtained by holding the values
of all but one variable at their nominal values and letting that one variable assume its extreme values.
The boundary value analysis test cases for our function F of two variables, illustrated in Fig 2.2, are
{<x1nom, x2min>, <x1nom, x2min+>, <x1nom, x2nom>, <x1nom, x2max->, <x1nom, x2max>, <x1min, x2nom>, <x1min+,
x2nom>, <x1max-, x2nom>, <x1max, x2nom>}
Fig 2.2 Boundary value analysis test cases for a function of two variables.
The basic boundary value analysis technique can be generalized in two ways: by the number of
variables and by the kinds of ranges. If we have a function of n variables, we hold all but one variable
at its nominal value and let the remaining variable assume the min, min+, nom, max-, and max
values, repeating this for each variable. Thus, for a function of n variables, boundary value analysis
yields 4n + 1 unique test cases. The kinds of ranges depend on the variables: in the NextDate
function, the variables are month, day, and year, and we can encode January as 1, February as 2, and
so on, or define the variable month as an enumerated type {Jan, Feb, ..., Dec}. In the triangle
problem, the lower bound of the side lengths is clearly 1, since a negative side length is not possible,
and we impose an arbitrary upper limit such as 200 or 2000. In the commission problem, the min,
min+, nom, max-, and max values are determined from artificial bounds.
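The 4n + 1 pattern can be sketched in code. This is a minimal illustration, not from the text: the function name is an assumption, and nominal values are taken as range midpoints (any interior value would do).

```python
def bva_test_cases(ranges):
    """Generate the 4n+1 boundary value analysis test cases for n variables.

    `ranges` is a list of (min, max) pairs. Nominal values are taken as the
    midpoint of each range (an assumption for this sketch).
    """
    noms = [(lo + hi) // 2 for lo, hi in ranges]
    cases = [tuple(noms)]  # the all-nominal case
    for i, (lo, hi) in enumerate(ranges):
        for extreme in (lo, lo + 1, hi - 1, hi):  # min, min+, max-, max
            case = list(noms)
            case[i] = extreme
            cases.append(tuple(case))
    return cases

# Triangle problem: three sides, each in [1, 200] -> 4*3 + 1 = 13 test cases
cases = bva_test_cases([(1, 200)] * 3)
```

Each generated case varies exactly one variable away from nominal, which is precisely the single fault assumption in action.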
Boundary value analysis works well when the program to be tested is a function of several
independent variables that represent bounded physical quantities. The key words are independent
and physical quantities: boundary value analysis presumes the variables to be truly independent, so it
cannot catch faults that arise from dependencies among variables, such as the end-of-month and
end-of-year faults in NextDate. Boundary value analysis test cases are derived from the extrema of
bounded, independent variables that refer to physical quantities, with no consideration of the nature
of the function or of the semantic meaning of the variables.
Limitations are summarized as:
– Boundaries are not always clear, e.g., the upper bound on integers
– Bounds are not appropriate for inputs like Booleans
– Suitable only for independent variables that represent physical quantities
– Makes the single fault assumption
Robustness testing is a simple extension of boundary value analysis in which the extrema are
extended with a value slightly greater than the maximum (max+) and a value slightly less than the
minimum (min-). The robustness test cases for a function of two variables are shown in Fig 2.3. The
most interesting part of robustness testing is not the inputs but the expected outputs: what should the
program do with an out-of-range value? Robustness testing for a function of n variables generates
6n + 1 test cases.
Worst-case testing rejects the single fault assumption of reliability theory and asks what happens
when more than one variable has an extreme value. For each variable we take the five-element set
containing the min, min+, nom, max-, and max values and form the Cartesian product of these sets.
The result is shown in Fig 2.4. Worst-case testing for a function of n variables generates 5^n test
cases, as opposed to 4n + 1 test cases for boundary value analysis, and the boundary value analysis
test cases are a proper subset of the worst-case test cases.
Worst-case testing follows the same generalization pattern as boundary value analysis. It is best
applied where physical variables have numerous interactions and where failure of the function is
extremely costly. Robust worst-case testing takes the Cartesian product of the seven-element sets
used in robustness testing, resulting in 7^n test cases, as shown in Fig 2.5.
Fig 2.5 Robust worst-case test cases for a function of two variables.
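The 5^n and 7^n counts follow directly from taking a Cartesian product of per-variable value sets. A sketch (the function name and the midpoint nominal values are assumptions):

```python
from itertools import product

def worst_case_tests(ranges, robust=False):
    """Worst-case (5^n) or robust worst-case (7^n) test cases.

    Each variable takes every value in {min, min+, nom, max-, max}; with
    robust=True the out-of-range values min-1 and max+1 are added, giving
    seven values per variable. Nominal is the range midpoint (an assumption).
    """
    value_sets = []
    for lo, hi in ranges:
        vals = [lo, lo + 1, (lo + hi) // 2, hi - 1, hi]
        if robust:
            vals = [lo - 1] + vals + [hi + 1]
        value_sets.append(vals)
    return list(product(*value_sets))  # every combination of extreme values

# Two variables: 5^2 = 25 worst-case tests, 7^2 = 49 robust worst-case tests
wc = worst_case_tests([(1, 10), (1, 10)])
rwc = worst_case_tests([(1, 10), (1, 10)], robust=True)
```

Unlike boundary value analysis, every combination of extremes appears, which is why the count grows exponentially rather than linearly in n.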
Special value testing is perhaps the most widely practiced form of functional testing:
– The tester / testing team uses domain knowledge and past experience to choose test cases
– It depends on heuristics
– It is not a structured or formal method
– It is ad hoc, and no principle is involved
– It is the least systematic and least uniform technique
– It adds to the confidence in testing when used in addition to a principled method
In the triangle problem, the variables are integers; the lower bounds of the ranges are all 1, and we
arbitrarily take 200 as an upper bound. Table 2.1 contains boundary value test cases using these
ranges:

Case   a   b   c   Expected Output
1      1   1   1   Equilateral
2      1   1   2   Not a Triangle
6      1   2   1   Not a Triangle
7      1   2   2   Isosceles
Boundary value analysis test cases for the NextDate function:

Case   Month   Day   Year   Expected Output
1      1       1     1812   1/2/1812
2      1       1     1813   1/2/1813
3      1       1     1912   1/2/1912
4      1       1     2011   1/2/2011
5      1       1     2012   1/2/2012
6      1       2     1812   1/3/1812
7      1       2     1813   1/3/1813
8      1       2     1912   1/3/1912
9      1       2     2011   1/3/2011
10     1       2     2012   1/3/2012
11     1       15    1812   1/16/1812
12     1       15    1813   1/16/1813
13     1       15    1912   1/16/1912
14     1       15    2011   1/16/2011
15     1       15    2012   1/16/2012
16     1       30    1812   1/31/1812
17     1       30    1813   1/31/1813
18     1       30    1912   1/31/1912
19     1       30    2011   1/31/2011
20     1       30    2012   1/31/2012
21     1       31    1812   2/1/1812
22     1       31    1813   2/1/1813
23     1       31    1912   2/1/1912
24     1       31    2011   2/1/2011
25     1       31    2012   2/1/2012
2.5.3 Test Cases for the Commission Problem using Boundary Value Analysis
The test cases use boundary values for the output range, near the threshold points of $1000 and
$1800:

Case   Locks   Stocks   Barrels   Sales   Commission   Comment
1      1       1        1         100     10           Output minimum
3      1       2        1         130     13           Output minimum+
5      5       5        5         500     50           Midpoint
7      10      9        10        970     97           Border point-
Limitations of BVA:
1) Boolean and logical variables present a problem for boundary value analysis.
2) BVA assumes the variables to be truly independent, which is not always the case.
3) BVA test cases have been found to be rudimentary because they are obtained with very little
insight and imagination.
Two further observations:
1) The normal versus robust values and the single-fault versus multiple-fault assumptions combine
to give a family of related methods that result in better testing. These methods can be applied to both
the input and the output domain of any program.
2) We must bear in mind that extreme boundary results can be produced by non-extreme input
values.
An important aspect of equivalence classes is that they form a partition of a set, where a partition
refers to a collection of mutually disjoint subsets whose union is the entire set. This has two
implications for testing: the fact that the entire set is represented provides a form of completeness,
and the disjointness ensures a form of non-redundancy. The idea of equivalence class testing is to
identify test cases by using one element from each equivalence class. If the equivalence classes are
chosen wisely, this reduces the potential redundancy among test cases: in the triangle problem,
(5, 5, 5) serves as a single test case for the whole class of equilateral inputs. The key to equivalence
class testing is the choice of the equivalence relation that determines the classes. For example,
consider a function F of two variables x1 and x2 with the boundaries
a ≤ x1 ≤ b
c ≤ x2 ≤ d
and intervals within these boundaries that partition the ranges.
Weak normal equivalence class testing uses one value from each equivalence class (interval) in a test
case. We always have the same number of weak equivalence class test cases as there are classes in
the partition with the largest number of subsets, as in Fig 2.6.
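The weak normal selection can be sketched as follows. The helper name and the NextDate class representatives used in the example are illustrative assumptions; the point is that the number of test cases equals the size of the largest partition.

```python
from itertools import zip_longest

def weak_normal_tests(partitions):
    """Weak normal equivalence class tests: one value from each class per test.

    `partitions` maps each variable to a list of class representatives (one
    per equivalence class). Shorter partitions reuse their last
    representative, so the number of test cases equals the number of classes
    in the largest partition.
    """
    cases = []
    for combo in zip_longest(*partitions, fillvalue=None):
        case = tuple(v if v is not None else part[-1]
                     for v, part in zip(combo, partitions))
        cases.append(case)
    return cases

# NextDate: 3 month classes, 4 day classes, 3 year classes (representatives
# chosen from roughly the middle of each class -- an assumption)
months, days, years = [6, 2, 12], [15, 29, 30, 31], [1900, 2000, 1996]
cases = weak_normal_tests([months, days, years])
# max(3, 4, 3) = 4 weak normal test cases
```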
Strong normal equivalence class testing is based on the multiple fault assumption, so we need test
cases from each element of the Cartesian product of the equivalence classes, as in Fig 2.7. The
Cartesian product guarantees a notion of completeness in two senses: we cover all the equivalence
classes, and we have one of each possible combination of inputs.
Weak robust equivalence class testing also considers invalid values of the input domain: the robust
part indicates invalid values, and the weak part refers to the single fault assumption.
Disadvantages:
1. The specification often does not define what the expected output for an invalid input should be,
so testers spend a lot of time defining expected outputs for these cases.
2. Strongly typed languages eliminate the need to consider invalid inputs.
Strong robust equivalence class testing again considers invalid values, and the strong part refers to
the multiple fault assumption. We obtain test cases from each element of the Cartesian product of all
the equivalence classes, both valid and invalid, as in Fig 2.9.
Four possible outputs can occur: Not a Triangle, Scalene, Isosceles, and Equilateral. These are used
to identify the equivalence classes as follows:
1) Weak Normal Equivalence Class: The four weak normal equivalence class test cases can be
defined as under
2) Strong Normal Equivalence Class: Since no valid subintervals of the variables a, b, and c exist,
the strong normal equivalence class test cases are identical to the weak normal equivalence class test
cases.
3) Weak Robust Equivalence Class: Considering the invalid values for a, b and c yields the
following additional weak robust equivalence class test cases
4) Strong Robust Equivalence Class: Test Cases falling under this category are
It may be noted that the expected outputs describe the invalid input values thoroughly.
By basing the equivalence classes on the input domain instead, we can derive a richer set of test
cases. Consider the possibilities for the three integers a, b, and c: all three sides equal, exactly one
pair of equal sides, no equal sides, and triples that do not form a triangle at all. The "not a triangle"
possibilities are
D6 = {<a, b, c>: a ≥ b + c}
D7 = {<a, b, c>: b ≥ a + c}
D8 = {<a, b, c>: c ≥ a + b}
These can be refined further, for example to isolate the degenerate case in which one side exactly
equals the sum of the other two:
D6' = {<a, b, c>: a = b + c}
"NextDate" is a function of three variables: month (mm), day (dd), and year (yyyy). It reads the
current date as input and returns the date of the next day as output. The conditions on the variables
are:
C1: 1 ≤ month ≤ 12
C2: 1 ≤ day ≤ 31
C3: 1812 ≤ year ≤ 2012
1) & 2) Weak Normal & Strong Normal Equivalence Classes: Since the number of valid classes
equals the number of independent variables, only one weak normal equivalence class test case
occurs, and it is identical to the strong normal equivalence class test case, WN1 = SN1.
Hence we get this test case on the basis of the valid classes M1, D1, and Y1 described above.
3) Weak Robust Equivalence Class: Test Cases falling under this category are as under
Hence we get 7 test cases based on the valid and invalid classes of the input domain as described
above.
4) Strong Robust Equivalence Class: Test cases falling under this category are

Test Case ID   Month (mm)   Day (dd)   Year (yyyy)   Expected Output
We need the modified classes as we know that at the end of a month, the next day is 1 and the month
is incremented. At the end of a year, both the day and the month are reset to 1 and the year is also
incremented. Finally, the problem of leap year makes determining the last day of a month interesting.
With all the above in mind, we describe the following equivalence classes
So, now let us again identify the various equivalence class test cases:
1) Weak Normal Equivalence Class: As done earlier, the inputs are mechanically selected
from the approximate middle of the corresponding class.

Test Case ID   Month (mm)   Day (dd)   Year (yyyy)   Expected Output
The random / mechanical selection of input values makes no consideration of our domain knowledge
and thus we have two impossible dates. This will always be a problem with 'automatic' test case
generation because all of our domain knowledge is not captured in the choice of equivalence classes.
2) Strong Normal Equivalence Class: The strong normal equivalence class test cases for the revised
classes are:

Test Case ID   Month (mm)   Day (dd)   Year (yyyy)   Expected Output
So three month classes, four day classes, and three year classes result in 3 × 4 × 3 = 36 strong
normal equivalence class test cases. Furthermore, adding two invalid classes for each variable results
in 5 × 6 × 5 = 150 strong robust equivalence class test cases, too many to list here.
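The class counts can be checked with a short Cartesian product sketch. The representative values below are illustrative assumptions; only the class counts matter.

```python
from itertools import product

# One representative per equivalence class (values are assumptions):
month_classes = [4, 1, 2]           # e.g. 30-day months, 31-day months, February
day_classes = [15, 29, 30, 31]      # e.g. days 1-28, 29, 30, 31
year_classes = [2000, 1996, 1999]   # e.g. three year classes

# Strong normal: Cartesian product of the valid classes -> 3 * 4 * 3 = 36
strong_normal = list(product(month_classes, day_classes, year_classes))

# Strong robust: add two invalid classes per variable (below/above the
# valid range) -> 5 * 6 * 5 = 150
strong_robust = list(product(month_classes + [-1, 13],
                             day_classes + [0, 32],
                             year_classes + [1811, 2013]))
```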
A rifle salesperson in the former Arizona Territory sold rifle locks, stocks, and barrels made
by a gunsmith in Missouri. Locks cost $45, stocks cost $30, and barrels cost $25. The salesperson had
to sell at least one complete rifle per month, and production limits were such that the most the
salesperson could sell in a month was 70 locks, 80 stocks, and 90 barrels. After each town visit, the
salesperson sent a telegram to the Missouri gunsmith with the number of locks, stocks, and barrels
sold in that town. At the end of a month, the salesperson sent a very short telegram showing –1 locks
sold. The gunsmith then knew the sales for the month were complete and computed the salesperson’s
commission as follows:
1) 10% on sales up to and including $1000.
2) 15% on the next $800 of sales.
3) 20% on any sales in excess of $1800.
The commission program produced a monthly sales report that gave the total number of
locks, stocks, and barrels sold, the salesperson’s total dollar sales, and, finally, the commission.
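The commission schedule above can be sketched directly in code. The function name is an assumption; the prices and percentage breaks come from the problem statement.

```python
def commission(locks, stocks, barrels):
    """Sales and commission for the rifle salesperson: 10% on the first
    $1000 of sales, 15% on the next $800, and 20% above $1800."""
    sales = 45 * locks + 30 * stocks + 25 * barrels
    if sales <= 1000:
        comm = sales * 10 / 100
    elif sales <= 1800:
        comm = 100 + (sales - 1000) * 15 / 100   # $100 on the first $1000
    else:
        comm = 220 + (sales - 1800) * 20 / 100   # $100 + $120 on first $1800
    return sales, comm

# 10 locks, 10 stocks, 10 barrels -> $1000 in sales, $100 commission
```

Note how the $1000 and $1800 thresholds in the code are exactly the output-side boundary points used in the boundary value test cases for this problem.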
1) & 2) Weak Normal & Strong Normal Equivalence Classes: Since the number of valid classes
equals the number of independent variables, we have exactly one weak normal equivalence class test
case and, again, it is identical to the strong normal equivalence class test case.
3) Weak Robust Equivalence Class: Test cases falling under this category are as under

Test Case ID   Locks   Stocks   Barrels   Commission
WR1            10      10       10        $100
4) Strong Robust Equivalence Class: Test cases falling under this category are built analogously.
To calculate the commission, consider equivalence classes defined on the output range. Sales is a
function of the number of locks, stocks, and barrels sold:

Test Case ID   Locks   Stocks   Barrels   Sales   Commission
OR1            5       5        5         500     50
1) The weak forms of equivalence class testing (normal or robust) are not as comprehensive as the
corresponding strong forms.
2) If the implementation language is strongly typed and invalid values cause run-time errors then
there is no point in using the robust form.
3) If error conditions are a high priority, the robust forms are appropriate.
4) Equivalence class testing is appropriate when input data is defined in terms of intervals and sets
of discrete values. This is certainly the case when system malfunctions can occur for out-of-limit
variable values.
5) Equivalence class testing is strengthened by a hybrid approach with boundary value testing
(BVA).
6) Equivalence class testing is used when the program function is complex. In such cases, the
complexity of the function can help identify useful equivalence classes.
7) Strong equivalence class testing makes a presumption that the variables are independent and the
corresponding multiplication of test cases raises issues of redundancy. If any dependencies occur,
they will often generate "error" test cases.
8) Several tries may be needed before the "right" equivalence relation is established.
9) The difference between the strong and weak forms of equivalence class testing is helpful in the
distinction between progression and regression testing.
In a decision table, condition stubs and rules (columns) determine which actions occur. In the table
below, when conditions c1, c2, and c3 are all true, actions a1 and a2 occur; when c1 and c2 are true
and c3 is false, actions a1 and a3 occur. The entry for c3 in the rule where c1 is true and c2 is false is
called a "don't care" entry: don't cares indicate that either the condition is irrelevant or the condition
does not apply.

Conditions   R1  R2  R3  R4  R5  R6
c1            T   T   T   F   F   F
c2            T   T   F   T   T   F
c3            T   F   -   T   F   -
Actions
a1: X X X
a2: X X
a3: X X
a4: X X
Table 1: Decision table for the triangle problem

Conditions            R1  R2  R3  R4  R5  R6  R7  R8  R9
c1: a,b,c triangle?    F   T   T   T   T   T   T   T   T
c2: a = b?             -   T   T   T   T   F   F   F   F
c3: a = c?             -   T   T   F   F   T   T   F   F
c4: b = c?             -   T   F   T   F   T   F   T   F
Actions
a1: Not a Triangle     X
a2: Scalene                                            X
a3: Isosceles                          X       X   X
a4: Equilateral            X
a5: Impossible                 X   X       X
Table 1 is the decision table for the triangle program. Here, if the integers a, b, and c do not
constitute a triangle (rule 1), we do not even check the equality conditions. In rules 3, 4, and 6, if
two pairs of sides are equal, then by transitivity the third pair must also be equal; the F entry for the
third condition makes these rules impossible.
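The decision table translates directly into code. A sketch (the function name is an assumption; the rules follow the table above):

```python
def triangle_type(a, b, c):
    """Classify a triangle following the decision table: first the triangle
    property (c1), then the equality conditions (c2: a=b, c3: a=c, c4: b=c)."""
    if not (a < b + c and b < a + c and c < a + b):  # c1 fails: rule 1
        return "Not a Triangle"
    equal_pairs = (a == b) + (a == c) + (b == c)
    if equal_pairs == 3:
        return "Equilateral"   # rule with c2, c3, c4 all true
    if equal_pairs == 1:
        return "Isosceles"     # exactly one pair of equal sides
    # equal_pairs == 2 cannot occur: two equal pairs force the third by
    # transitivity -- these are the "impossible" rules of the table.
    return "Scalene"           # no pair of equal sides
```

Each branch corresponds to one action row of Table 1, and the unreachable two-pairs case corresponds to the three impossible rules.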
Table 2: Rule count for a decision table with mutually exclusive conditions

Conditions        Rule 1  Rule 2  Rule 3
c1: month in M1     T       -       -
c2: month in M2     -       T       -
c3: month in M3     -       -       T
Rule Count          4       4       4
a1
Table 2 shows three mutually exclusive conditions for the month variable in the NextDate
problem. Because a month is in exactly one equivalence class, we cannot have a rule in which two
entries are true; here the don't care entries (-) really mean "must be false". For a limited entry
decision table with n conditions, there must be 2^n rules. When don't care entries genuinely indicate
that the condition is irrelevant, a rule with no don't care entries counts as one rule, and each don't
care entry in a rule doubles its count. The rule count is therefore 4 for each rule in Table 2, since
each rule has two don't cares (2^2 = 4); the counts total 12, exceeding the 2^3 = 8 possible rules,
which signals the overlap.
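The rule-count arithmetic can be sketched as follows (representing each rule as a string of entries is an assumption of this sketch):

```python
def rule_count(rule):
    """Rule count for a limited entry rule: each don't care ('-') doubles it."""
    return 2 ** rule.count('-')

# Table 2: each rule has one T and two don't cares
rules = ['T--', '-T-', '--T']
counts = [rule_count(r) for r in rules]

# With 3 conditions there must be 2^3 = 8 distinct rules.  4 + 4 + 4 = 12 > 8
# exposes the overlap: for mutually exclusive conditions the don't cares
# actually mean "must be false", so each rule should count as 1.
total = sum(counts)
```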
Table 3: Refined decision table with rule counts for the triangle program

Conditions           R1  R2  R3  R4  R5  R6  R7  R8  R9  R10  R11
c1: a < b + c?        F   T   T   T   T   T   T   T   T   T    T
c2: b < a + c?        -   F   T   T   T   T   T   T   T   T    T
c3: c < a + b?        -   -   F   T   T   T   T   T   T   T    T
c4: a = b?            -   -   -   T   T   T   T   F   F   F    F
c5: a = c?            -   -   -   T   T   F   F   T   T   F    F
c6: b = c?            -   -   -   T   F   T   F   T   F   T    F
Rule Count           32  16   8   1   1   1   1   1   1   1    1
Actions
a1: Not a Triangle    X   X   X
a2: Scalene                                                    X
a3: Isosceles                             X       X   X
a4: Equilateral                   X
a5: Impossible                        X       X
In Table 3, the choice of conditions increases the size of the decision table. Here we expanded the
old condition (c1: a, b, c form a triangle?) into the three inequalities of the triangle property; if any
one of these fails, the three integers do not constitute the sides of a triangle.
The table has a total rule count of 64, which can be computed with the limited entry formula,
since it is a limited entry table:
Number of Rules = 2^(Number of Conditions) = 2^6 = 64
There are 11 functional test cases: 3 impossible cases, 3 ways to fail the triangle property, 1 way
to get an equilateral triangle, 1 way to get a scalene triangle, and 3 ways to get an isosceles
triangle.
If we expand Table 2 by filling in the don't care entries, we get Table 5 below. Three rules have all
entries T (rules 1.1, 2.1, and 3.1), and two rules have the entries T, T, F (rules 1.2 and 2.2);
similarly, rules 1.3 and 3.2 are identical, as are rules 2.3 and 3.3.

Conditions        1.1 1.2 1.3 1.4 2.1 2.2 2.3 2.4 3.1 3.2 3.3 3.4
c1: month in M1    T   T   T   T   T   T   F   F   T   T   F   F
c2: month in M2    T   T   F   F   T   T   T   T   T   F   T   F
c3: month in M3    T   F   T   F   T   F   T   F   T   T   T   T
Rule Count         1   1   1   1   1   1   1   1   1   1   1   1
a1

Deleting the duplicate rules leaves the 2^3 = 8 distinct rules:

Conditions         1   2   3   4   5   6   7   8
c1: month in M1    T   T   T   T   F   F   F   F
c2: month in M2    T   T   F   F   T   T   F   F
c3: month in M3    T   F   T   F   T   F   T   F
Rule Count         1   1   1   1   1   1   1   1
a1: X X X X X
A decision table is redundant when a rule overlaps with others: below, rule 9 (T, F, F) is already
covered by rules 1-4 (c1 true with don't cares). Because the actions of rule 9 agree with those of
rules 1-4, the redundancy here is harmless:

Conditions   1-4   5   6   7   8   9
c1            T    F   F   F   F   T
c2            -    T   T   F   F   F
c3            -    T   F   T   F   F
a1            X    X   X   -   -   X
a2            -    X   X   X   -   -
a3            X    -   X   X   X   X

When the actions of the overlapping rules disagree, as for rule 9 below, the table is inconsistent,
and the action taken depends on which rule the implementation happens to apply:

Conditions   1-4   5   6   7   8   9
c1            T    F   F   F   F   T
c2            -    T   T   F   F   F
c3            -    T   F   T   F   F
a1            X    X   X   -   -   -
a2            -    X   X   X   -   X
a3            X    -   X   X   X   -
This decision table will have 2^8 = 256 rules, many of which are impossible. To show these
impossible rules, we revise the actions to the following:
a1: day invalid for this month
a2: cannot happen in a non-leap year
a3: compute the next date
The condition stubs and actions of the full table are:
Conditions
c1: month in M1
c2: month in M2
c3: month in M3
c4: day in D1
c5: day in D2
c6: day in D3
c7: day in D4
c8: year in Y1
Actions
a1: Impossible
a2: NextDate
(The rule entries of the full table are not reproduced here.)
The main problem occurs for December at rule 8: it has unknown entries, because none of the
actions increments the year and resets the day and month. To recover from this, we treat December
as a month class of its own, with its different possibilities as separate test cases.
The NextDate function is the basis for the source code we have generated, covering all the
possible outcomes. Good testing can improve programming skills also.
Case   Month   Day   Year   Expected Output
4      4       30    2001   5/1/2001
10     1       31    2001   2/1/2001
15     12      31    2001   1/1/2002
16     2       15    2001   2/16/2001
17     2       28    2004   2/29/2004
18     2       28    2001   3/1/2001
19     2       29    2004   3/1/2004
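A sketch of the NextDate function these test cases exercise. The helper names are assumptions; the calendar logic (month lengths, leap years) follows the problem statement.

```python
def next_date(month, day, year):
    """Return the date of the next day as (month, day, year)."""
    def is_leap(y):
        # A year is a leap year if divisible by 4, except century years,
        # which must be divisible by 400.
        return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)

    days_in_month = [31, 29 if is_leap(year) else 28, 31, 30, 31, 30,
                     31, 31, 30, 31, 30, 31]
    if day < days_in_month[month - 1]:
        return month, day + 1, year       # ordinary day: increment the day
    if month < 12:
        return month + 1, 1, year         # end of month: reset day, bump month
    return 1, 1, year + 1                 # end of year: reset day and month
```

The three return statements correspond to the three action patterns the decision table must capture, including the year-increment case that motivated treating December separately.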
1. This type of testing also works iteratively. The table that is drawn in the first iteration, acts as a
stepping stone to derive new decision table(s), if the initial table is unsatisfactory.
2. These tables guarantee that we consider every possible combination of condition values. This is
known as its "completeness property". This property promises a form of complete testing as
compared to other techniques.
3. Decision tables are declarative: there is no particular order in which conditions are evaluated or
actions occur.
4. Decision tables do not scale up well. We need to "factor" large tables into smaller ones to remove
redundancy.
1) This technique works well where lot of decision making takes place such as the triangle problem
and Next date problem.
2) The decision table technique is indicated for applications characterized by any of the following
Prominent if-then-else logic.
Logical relationships among input variables.
Calculations involving subsets of the input variables.
Cause - and - effect relationships between inputs and outputs.
High cyclomatic complexity.
3) Decision tables do not scale up well. We need to 'factor' large tables into smaller ones to remove
redundancy.
4) It works iteratively meaning that the table drawn in the first iteration, acts as a stepping stone to
design new decision tables, if the initial table is unsatisfactory.
Fault-Based Testing
A model of potential program faults is a valuable source of information for evaluating and designing
test suites. Some fault knowledge is commonly used in functional and structural testing, for example
when identifying singleton and error values for parameter characteristics in category- partition testing
or when populating catalogs with erroneous values, but a fault model can also be used more directly.
Fault-based testing uses a fault model directly to hypothesize potential faults in a program under test,
as well as to create or evaluate test suites based on their efficacy in detecting those hypothetical faults.
2.13.1 Overview
Engineers study failures to understand how to prevent similar failures in the future. For example,
failure of the Tacoma Narrows Bridge in 1940 led to new understanding of oscillation in high wind
and to the introduction of analyses to predict and prevent such destructive oscillation in subsequent
bridge design. The causes of an airline crash are likewise extensively studied, and when traced to a
structural failure they frequently result in a directive to apply diagnostic tests to all aircraft
considered potentially vulnerable to similar failures.
Experience with common software faults sometimes leads to improvements in design methods
and programming languages. For example, the main purpose of automatic memory management in
Java is not to spare the programmer the trouble of releasing unused memory, but to prevent the
programmer from making the kind of memory management errors (dangling pointers, redundant
deallocations, and memory leaks) that frequently occur in C and C++ programs. Automatic array
bounds checking cannot prevent a programmer from using an index expression outside array
bounds, but can make it much less likely that the fault escapes detection in testing, as well as
limiting the damage incurred if it does lead to operational failure (eliminating, in particular, the
buffer overflow attack as a means of subverting privileged programs). Type checking reliably
detects many other faults during program translation.
The basic concept of fault-based testing is to select test cases that would distinguish the program
under test from alternative programs that contain hypothetical faults. This is usually approached by
modifying the program under test to actually produce the hypothetical faulty programs. Fault seeding
can be used to evaluate the thoroughness of a test suite (that is, as an element of a test adequacy
criterion), or for selecting test cases to augment a test suite, or to estimate the number of faults in a
program.
2.13.2 Assumptions in Fault-Based Testing
The effectiveness of fault-based testing depends on the quality of the fault model and on
some basic assumptions about the relation of the seeded faults to faults that might actually be
present. In practice, the seeded faults are small syntactic changes, like replacing one variable
reference by another in an expression, or changing a comparison from < to <=. We may hypothesize
that these are representative of faults actually present in the program.
Put another way, if the program under test has an actual fault, we may hypothesize that it
differs from another, corrected program by only a small textual change. If so, then we need merely
distinguish the program from all such small variants (by selecting test cases for which either the
original or the variant program fails) to ensure detection of all such faults. This is known as the
competent programmer hypothesis, an assumption that the program under test is "close to" (in the
sense of textual difference) a correct program.
Some program faults are indeed simple typographical errors, and others that involve
deeper errors of logic may nonetheless be manifest in simple textual differences. Sometimes,
though, an error of logic will result in much more complex differences in program text. This
may not invalidate fault-based testing with a simpler fault model, provided test cases sufficient
for detecting the simpler faults are sufficient also for detecting the more complex fault. This is
known as the coupling effect.
The coupling effect hypothesis may seem odd, but can be justified by appeal to a more
plausible hypothesis about interaction of faults. A complex change is equivalent to several smaller
changes in program text. If the effect of one of these small changes is not masked by the effect of
others, then a test case that differentiates a variant based on a single change may also serve to
detect the more complex error.
Fault-Based Testing: Terminology
Original program The program unit (e.g., C function or Java class) to be tested.
Program location A region in the source code. The precise definition is defined relative to the
syntax of a particular programming language. Typical locations are statements, arithmetic and
Boolean expressions, and procedure calls.
Alternate expression Source code text that can be legally substituted for the text at a program
location. A substitution is legal if the resulting program is syntactically correct (i.e., it compiles
without errors).
Alternate program A program obtained from the original program by substituting an alternate
expression for the text at some program location.
Distinct behavior of an alternate program R for a test t The behavior of an alternate program R is
distinct from the behavior of the original program P for a test t, if R and P produce a different result for
t, or if the output of R is not defined for t.
Distinguished set of alternate programs for a test suite T A set of alternate programs are
distinct if each alternate program in the set can be distinguished from the original program by at
least one test in T.
Fault-based testing can guarantee fault detection only if the competent programmer hypothesis
and the coupling effect hypothesis hold. But guarantees are more than we expect from other approaches
to designing or evaluating test suites, including the structural and functional test adequacy criteria
discussed in earlier chapters. Fault-based testing techniques can be useful even if we decline to take the
leap of faith required to fully accept their underlying assumptions. What is essential is to recognize the
dependence of these techniques, and any inferences about software quality based on fault-based testing,
on the quality of the fault model. This also implies that developing better fault models, based on hard
data about real faults rather than guesses, is a good investment of effort.
We say a mutant is valid if it is syntactically correct. A mutant obtained from the program of Figure
2.14 by substituting while for switch in the statement at line 13 would not be valid, since it would
result in a compile-time error. We say a mutant is useful if, in addition to being valid, its behavior
differs from the behavior of the original program for no more than a small subset of program test
cases. A mutant obtained by substituting 0 for 1000 in the statement at line 4 would be valid, but not
useful, since the mutant would be distinguished from the program under test by all inputs and thus
would not give any useful information on the effectiveness of a test suite. Defining mutation operators
that produce valid and useful mutations is a nontrivial task.
Figure 2.14: Program transduce converts line endings among Unix, DOS, and Macintosh
conventions. The main procedure, which selects the output line end convention, and the
output procedure emit are not shown.
Since mutants must be valid, mutation operators are syntactic patterns defined relative to particular
programming languages. Figure 2.15 shows some mutation operators for the C language.
Constraints are associated with mutation operators to guide selection of test cases likely to
distinguish mutants from the original program. For example, the mutation operator svr (scalar
variable replacement) can be applied only to variables of compatible type (to be valid), and a test
case that distinguishes the mutant from the original program must execute the modified statement in
a state in which the original variable and its substitute have different values.
Figure 2.15: A sample set of mutation operators for the C language, with associated constraints
to select test cases that distinguish generated mutants from the original program
Given a program and a test suite T, mutation analysis consists of the following steps:
Select mutation operators: If we are interested in specific classes of faults, we may select a set of
mutation operators relevant to those faults.
Generate mutants: Mutants are generated mechanically by applying the mutation operators to the
original program.
Distinguish mutants: Execute the original program and each generated mutant with the test cases
in T. A mutant is killed when it can be distinguished from the original program.
Given a set of mutants SM and a test suite T, the fraction of nonequivalent mutants
killed by T measures the adequacy of T with respect to SM. Unfortunately, the
problem of identifying equivalent mutants is undecidable in general, and we could err
either by claiming that a mutant is equivalent to the program under test when it is not
or by counting some equivalent mutants among the remaining live mutants.
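The steps above, including the equivalent-mutant problem, can be seen in a toy illustration. Everything here is an assumption of the sketch, not from the text: the example program computes the maximum of two numbers, mutant_1 changes a relational operator (ror), and mutant_2 replaces a scalar variable (svr).

```python
def original(x, y):
    return x if x > y else y          # maximum of two numbers

def mutant_1(x, y):
    return x if x >= y else y         # ror: > replaced by >=

def mutant_2(x, y):
    return x if x > y else x          # svr: y replaced by x

def mutation_score(mutants, tests):
    """Fraction of mutants killed: a mutant is killed when some test case
    makes it produce a result different from the original program."""
    killed = sum(any(m(*t) != original(*t) for t in tests) for m in mutants)
    return killed / len(mutants)

# mutant_1 differs only when x == y, where both still return the same value,
# so it is an equivalent mutant: no input can ever kill it.
tests = [(1, 2), (2, 1)]
score = mutation_score([mutant_1, mutant_2], tests)
```

Here the raw score of 0.5 understates the suite's quality: counting only nonequivalent mutants, the score would be 1.0, which is exactly the accounting problem that undecidability of mutant equivalence makes hard in general.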
The adequacy of the test suite TS evaluated with respect to the four mutants of Figure
2.16 is 25%. However, we can easily observe that mutant Mi is equivalent to the
original program (i.e., no input would distinguish it).
For typical sets of syntactic mutants, a mutation-adequate test suite will also be adequate with
respect to simple structural criteria such as statement or branch coverage. Mutation adequacy can
simulate and subsume a structural coverage criterion if the set of mutants can be killed only by
satisfying the corresponding test coverage obligations. Statement coverage can be simulated by
applying the mutation operator sdl (statement deletion) to each statement of a program. To kill a
mutant whose only difference from the program under test is the absence of statement S requires
executing the mutant and the program under test with a test case that executes S in the original
program. Thus to kill all mutants generated by applying the operator sdl to statements of the
program under test, we need a test suite that causes the execution of each statement in the
original program. Branch coverage can be simulated by applying the operator cpr (constant for
predicate replacement) to all predicates of the program under test with constants True and False.
To kill a mutant that differs from the program under test for a predicate P set to the constant
value False, we need to execute the mutant and the program under test with a test case that
causes the execution of the True branch of P. To kill a mutant that differs from the program
under test for a predicate P set to the constant value True,we need to execute the mutant and the
program under test with a test case that causes the execution of the False branch of P.
The mutation analysis process described in the preceding sections, which kills mutants
based on the outputs produced by execution of test cases, is known as strong mutation. It can
generate a number of mutants quadratic in the size of the program. Each mutant must be compiled
and executed with each test case until it is killed. The time and space required for compiling all
mutants and for executing all test cases for each mutant may be impractical.
The computational effort required for mutation analysis can be reduced by decreasing the
number of mutants generated and the number of test cases to be executed. Weak mutation analysis
decreases the number of tests to be executed by killing mutants when they produce a different
intermediate state, rather than waiting for a difference in the final result or observable program
behavior.
Weak mutation analysis
With weak mutation, a single program can be seeded with many faults. A "metamutant"
program is divided into segments containing original and mutated source code, with a mechanism
to select which segments to execute. Two copies of the meta-mutant are executed in tandem, one
with only original program code selected and the other with a set of live mutants selected.
Execution is paused after each segment to compare the program state of the two versions. If the
state is equivalent, execution resumes with the next segment of original and mutated code. If the
state differs, the mutant is marked as dead, and execution of original and mutated code is
restarted with a new selection of live mutants.
Weak mutation testing does not decrease the number of program mutants that must be
considered, but it does decrease the number of test executions and compilations. This
performance benefit has a cost in accuracy: Weak mutation analysis may "kill" a mutant even if
the changed intermediate state would not have an effect on the final output or observable behavior
of the program.
Like structural test adequacy criteria, mutation analysis can be used either to judge the
thoroughness of a test suite or to guide selection of additional test cases. If one is designing test
cases to kill particular mutants, then it may be important to have a complete set of mutants
generated by a set of mutation operators. If, on the other hand, the goal is a statistical estimate
of the extent to which a test suite distinguishes programs with seeded faults from the original
program, then only a much smaller statistical sample of mutants is required. Aside from its
limitation to assessment rather than creation of test suites, the main
limitation of statistical mutation analysis is that partial coverage is meaningful only to the extent
that the generated mutants are a valid statistical model of occurrence frequencies of actual
faults. To avoid reliance on this implausible assumption, the target coverage should be 100% of
the sample; statistical sampling may keep the sample small enough to permit careful examination
of equivalent mutants.