EssentialsOfSoftwareTesting ecopyForStudents
EssentialsOfSoftwareTesting ecopyForStudents
Software Testing by Ralf Bierig, Stephen Brown, Edgar Galvan, and Joe Timoney. This
book may be downloaded by students in CS265 and CS608 for personal use only. Not
for re-distribution, re-sale or use in derivative works.
© Ralf Bierig, Stephen Brown, Edgar Galvan, and Joe Timoney 2020.
FT
Essentials
A
of
DR
Software
Testing
(Compiled: 2021-09-29 at 14:47)
FT
“For sounds in winter nights, and often in winter days, I heard the forlorn but
melodious note of a hooting owl indefinitely far; such a sound as the frozen earth would
yield if struck with a suitable plectrum, the very lingua vernacula of Walden Wood, and
quite familiar to me at last, though I never saw the bird while it was making it.”
A
Walden
Henry David Thoreau
DR
Acknowledgements
The authors would like to thank Ana Susac for her painstaking, persistent, and detailed
checking of the text and every example in the book. Her assistance has been invaluable.
Any mistakes remaining are entirely our own responsibility.
A FT
DR
ii
Contents
Acknowledgements ii
FT
1.1.2 Software Testing and Risk Management
1.2 Mistakes, Faults and Failures . . . . . . . . . .
1.2.1 Mistakes . . . . . . . . . . . . . . . . . .
1.2.2 Software Faults . . . . . . . . . . . . . .
1.2.3 Software Failures . . . . . . . . . . . . .
1.2.4 Need for Testing . . . . . . . . . . . . .
1.3 The Role of Specifications . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
3
4
5
5
6
7
9
1.4 Manual Test Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
A
1.5 The Theory of Software Testing . . . . . . . . . . . . . . . . . . . . . . . 10
1.6 Exhaustive Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.6.1 Exhaustive Test Data . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.6.2 Feasibility of Exhaustive Testing . . . . . . . . . . . . . . . . . . . 11
DR
iii
CONTENTS iv
2 Equivalence Partitions 25
2.1 Testing with Equivalence Partitions . . . . . . . . . . . . . . . . . . . . . 25
2.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.1 Analysis: Identifying the Equivalence Partitions . . . . . . . . . . 26
2.2.2 Test Coverage Items . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.2.3 Test Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2.4 Verification of the Test Cases . . . . . . . . . . . . . . . . . . . . . 33
2.3 Test Implementation and Results . . . . . . . . . . . . . . . . . . . . . . . 34
2.3.1 Manual Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
FT
2.3.2 Automated Test Implementation . . . . . . . . .
2.3.3 Test Results . . . . . . . . . . . . . . . . . . . . .
2.4 Equivalence Partitions in More Detail . . . . . . . . . .
2.4.1 Fault Model . . . . . . . . . . . . . . . . . . . . .
2.4.2 Description . . . . . . . . . . . . . . . . . . . . .
2.4.3 Analysis: Identifying Equivalence Partitions . . .
2.4.4 Test Coverage Items . . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
35
38
38
38
38
38
40
2.4.5 Test Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
A
2.4.6 Pitfalls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.5.1 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.5.2 Strengths and Weaknesses . . . . . . . . . . . . . . . . . . . . . . . 45
DR
3.5.1 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.5.2 Strengths and Weaknesses . . . . . . . . . . . . . . . . . . . . . . . 56
3.6 Key Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.7 Notes for Experienced Testers . . . . . . . . . . . . . . . . . . . . . . . . 57
FT
4.4.1 Fault Model . . . . . . . . . . . . . . . . . . . . .
4.4.2 Description . . . . . . . . . . . . . . . . . . . . .
4.4.3 Analysis: Developing Decision Tables . . . . . .
4.4.4 Test Coverage Items . . . . . . . . . . . . . . . .
4.4.5 Test Cases . . . . . . . . . . . . . . . . . . . . . .
4.4.6 Pitfalls . . . . . . . . . . . . . . . . . . . . . . . .
4.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
70
70
70
80
81
81
81
4.5.1 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
A
4.5.2 Strengths and Weaknesses . . . . . . . . . . . . . . . . . . . . . . . 84
4.6 Key Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.7 Notes for Experienced Testers . . . . . . . . . . . . . . . . . . . . . . . . 84
DR
5 Statement Coverage 85
5.1 Introduction to White-Box Testing . . . . . . . . . . . . . . . . . . . . . . 85
5.2 Testing with Statement Coverage . . . . . . . . . . . . . . . . . . . . . . . 85
5.2.1 Statement Coverage Measurement . . . . . . . . . . . . . . . . . . 86
5.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.3.1 Analysis: Identifying Unexecuted Statements . . . . . . . . . . . . 87
5.3.2 Test Coverage Items . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.3.3 Test Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.3.4 Verification of the Test Cases . . . . . . . . . . . . . . . . . . . . . 90
5.4 Test Implementation and Results . . . . . . . . . . . . . . . . . . . . . . . 91
5.4.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.4.2 Test Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.5 Statement Coverage Testing in More Detail . . . . . . . . . . . . . . . . . 93
5.5.1 Fault Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.5.2 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.5.3 Analysis: Identifying Unexecuted Statements . . . . . . . . . . . . 93
5.5.4 Test Coverage Items . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.5.5 Test Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.6 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.6.1 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.6.2 Strengths and Weaknesses . . . . . . . . . . . . . . . . . . . . . . . 96
CONTENTS vi
6 Branch Coverage 98
6.1 Testing with Branch Coverage . . . . . . . . . . . . . . . . . . . . . . . . 98
6.1.1 Branch Coverage Measurement . . . . . . . . . . . . . . . . . . . . 98
6.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6.2.1 Analysis: Identifying Untaken Branches . . . . . . . . . . . . . . . 99
6.2.2 Test Coverage Items . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.2.3 Test Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.2.4 Verification of the Test Cases . . . . . . . . . . . . . . . . . . . . . 102
6.3 Test Implementation and Results . . . . . . . . . . . . . . . . . . . . . . . 103
6.3.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.3.2 Test Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.4 Branch Coverage in More Detail . . . . . . . . . . . . . . . . . . . . . . . 106
6.4.1 Fault Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
FT
6.4.2 Description . . . . . . . . . . . . . . . .
6.4.3 Goal . . . . . . . . . . . . . . . . . . . .
6.4.4 Analysis: Identifying Untaken Branches
6.4.5 Test Coverage Item . . . . . . . . . . . .
6.4.6 Test Cases . . . . . . . . . . . . . . . . .
6.5 Evaluation . . . . . . . . . . . . . . . . . . . .
6.5.1 Limitations . . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
107
107
108
108
109
109
109
6.5.2 Strengths and Weaknesses . . . . . . . . . . . . . . . . . . . . . . . 111
A
6.6 Key Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.7 Notes for Experienced Testers . . . . . . . . . . . . . . . . . . . . . . . . 111
FT
8.2.8 Floating Point Numbers . . . . . . . . . . . . . .
8.2.9 Numeric Processing . . . . . . . . . . . . . . . .
8.3 White-Box Testing: Some More Techniques . . . . . . .
8.3.1 Dataflow Coverage / Definition-Use Pairs . . . .
8.3.2 Condition Coverage . . . . . . . . . . . . . . . .
8.3.3 Decision Coverage . . . . . . . . . . . . . . . . .
8.3.4 Decision Condition Coverage . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
154
155
156
156
157
158
158
8.3.5 Multiple Condition Coverage . . . . . . . . . . . . . . . . . . . . . 158
A
8.3.6 Modified Condition/Decision Coverage . . . . . . . . . . . . . . . . 159
8.3.7 Test Ranking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
8.4 Repair-Based Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
8.4.1 Specific Repair Test . . . . . . . . . . . . . . . . . . . . . . . . . . 161
DR
10 Application Testing
FT
Key Points . . . . . . . . . . . . . . . . .
Notes for Experienced Testers . . . . . .
.
.
.
.
.
.
.
.
10.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10.2.1 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
190
190
191
191
191
192
10.2.2 Test Coverage Items . . . . . . . . . . . . . . . . . . . . . . . . . . 197
A
10.2.3 Test Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
10.2.4 Verification of the Test Cases . . . . . . . . . . . . . . . . . . . . . 200
10.3 Test Implementation and Results . . . . . . . . . . . . . . . . . . . . . . . 201
10.3.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
DR
FT
11.9.3 Inheritance Test Selection . . . . . . . . . . . .
11.10 Interfacing to Web Applications . . . . . . . . . . . .
11.11 Interfacing to Desktop Applications . . . . . . . . . .
11.12 Interfacing to Mobile Applications . . . . . . . . . . .
12 Random Testing
12.1 Introduction to Random Testing . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
252
254
256
256
257
257
12.2 Random Data Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
A
12.3 Unit Test Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
12.3.1 Test Coverage Items . . . . . . . . . . . . . . . . . . . . . . . . . . 258
12.3.2 Test Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
12.3.3 Test Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 260
DR
14 Wrapup 284
14.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
14.2 A Reverse Look at Testing . . . . . . . . . . . . . . . . . . . . . . . . . . 285
14.2.1 A Test Implementation . . . . . . . . . . . . . . . . . . . . . . . . 286
14.2.2 Test Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
14.2.3 Test Coverage Items . . . . . . . . . . . . . . . . . . . . . . . . . . 286
14.2.4 Test Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
14.2.5 The Software Specification . . . . . . . . . . . . . . . . . . . . . . 288
14.2.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
14.3 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
FT
14.3.1 Testing in the Software Process . . . .
14.3.2 Standards on Software Testing . . . .
14.3.3 Software Testing Techniques . . . . . .
14.3.4 Testing Object-Oriented Software . . .
14.3.5 Integration Testing . . . . . . . . . . .
14.3.6 Random Testing . . . . . . . . . . . .
14.3.7 Testing for Language-Specific Hazards
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
289
289
289
290
290
290
291
14.3.8 Program Proving . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
A
14.3.9 Testing Safety-Critical Software . . . . . . . . . . . . . . . . . . . . 291
14.3.10 Going Off-Piste . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
14.4 Research Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
14.4.1 Conferences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
DR
Bibliography 295
List of Figures
2.1
2.2
2.3
2.4
2.5
FT
Test Artefacts . . . . . . . . . . . . . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
21
27
27
27
28
28
2.6 Second Specification-Based Range for bonusPoints . . . . . . . . . . . . . 28
2.7 Start of the Third Range . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.8 Third Specification-Based Range for bonusPoints . . . . . . . . . . . . . . 28
2.9 Fourth Specification-Based Range for bonusPoints . . . . . . . . . . . . . 29
2.10 Specification-Based Ranges for goldCustomer . . . . . . . . . . . . . . . . 29
DR
xi
LIST OF FIGURES xii
7.1
7.2
7.3
7.4
7.5
FT
Manual Demonstration of Fault 6 . . . . . . . . . . . . . . .
CFG Stage 0 . . . . . . . . . . . . . . . . . . . .
CFG Stage 1 . . . . . . . . . . . . . . . . . . . .
CFG Stage 2 . . . . . . . . . . . . . . . . . . . .
CFG Stage 3 . . . . . . . . . . . . . . . . . . . .
CFG Stage 4 . . . . . . . . . . . . . . . . . . . .
. . . .
. . . .
. . . .
. . . .
. . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
111
115
115
115
116
116
7.6 CFG for giveDiscount() with Fault 6 . . . . . . . . . . . . . . . . . . . . . 116
A
7.7 End-to-end Paths Through the CFG . . . . . . . . . . . . . . . . . . . . . 117
7.8 AP Test Results for giveDiscount() with Fault6 . . . . . . . . . . . . . . . 121
7.9 AP Test Results for giveDiscount() . . . . . . . . . . . . . . . . . . . . . . 122
7.10 AP Test Coverage Summary for giveDiscount() . . . . . . . . . . . . . . . 122
DR
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
188
189
193
193
194
194
10.5 Inspector – litres input textbox . . . . . . . . . . . . . . . . . . . . . . . . 195
A
10.6 Inspector – Enter button . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
10.7 Inspector – body element . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
10.8 T1 Test Results for Fuel Checker . . . . . . . . . . . . . . . . . . . . . . . 203
10.9 Fuelchecker User Story Test Results . . . . . . . . . . . . . . . . . . . . . 206
DR
13.1
13.2
13.3
13.4
13.5
13.6
A FT
The Stages of Incremental Testing . . . . .
A Tester’s view of the Waterfall Model . . .
A Tester’s view of the V-Model . . . . . . .
A Tester’s view of Incremental Development
Testing in eXtreme Programming . . . . . .
Testing in the Scrum Process . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
275
277
278
280
281
282
DR
List of Tables
xv
LIST OF TABLES xvi
6.1
6.2
6.3
6.4
FT
Extra SC Test Coverage Items for giveDiscount()
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
90
102
102
102
103
A FT
DR
Listings
xviii
LISTINGS xix
FT
11.12Source Code for Class Shape . . . . . . . . . .
11.13Source Code for Class Circle . . . . . . . . . . .
11.14Shape Test – by Name . . . . . . . . . . . . . .
11.15Inheritable Shape Test . . . . . . . . . . . . . .
11.16CircleTest Inherits ShapeTest . . . . . . . . . .
11.17ShapeTest with Dependencies . . . . . . . . . .
11.18Shape Test with Groups . . . . . . . . . . . . .
A .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
247
247
248
249
250
252
252
11.19Circle Test with Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
12.1 Random EP Test for giveDiscount() . . . . . . . . . . . . . . . . . . . . . 260
12.2 Setup for FuelCheckRandomTest . . . . . . . . . . . . . . . . . . . . . . . 263
12.3 Stored Test Data Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
DR
Preface
Modern society is heavily reliant on software, and the correct operation of this software
is a critical concern. The purpose of this book is to introduce the reader to the essential
principles of software testing, enabling them to produce high quality software. Software
testing can be regarded as an art, a craft, and a science – the approach we present
provides a bridge between these different viewpoints.
FT
This book is based on many years of lecturing in software engineering and software
testing at undergraduate and postgraduate level, as well as industrial experience. Soft-
ware testing techniques are introduced through worked examples, leading to automated
tests. Each technique is then explained in more detail, and then its limitations are
demonstrated by inserting faults. The process of applying the techniques is also
emphasised, covering the steps of analysis, test design, test implementation, and
interpretation of test results.
A
The worked examples offer the beginner a practical, step-by-step introduction to each
technique. The additional details complement these, providing a deeper understanding
of the underlying principles. We hope that you will enjoy reading the book as much as
we enjoyed writing it.
DR
xx
For CS265/CS608 Students Personal Use Only
Chapter 1
Introduction to Software
Testing
FT
This chapter discusses the motivations for testing software, and also discusses why
exhaustive testing is not generally feasible, and thus various test heuristics must be used.
These test heuristics, and the lack of a standard software specification1 language, are
what makes software testing as much an art as a science.
would sell software and hardware as separate products. This opened up the market to
external companies that could produce and sell software for IBM machines.
Software products for mass consumption arrived in 1981 with the arrival of PC-
based software packages. Another dramatic boost came in the 1990’s with the arrival
of the World Wide Web, and in the 2000’s with mobile devices. In 2010 the Top 500
companies in the global software industry had revenues of $492 billion, and by 2018,
this had risen to $868 billion2 . The industry is extremely dynamic and continually
undergoing rapid change as new innovations appear. Unlike some other industries, for
example transportation, it is still in many ways an immature industry. It does not, in
general, have a set of quality standards that have been gained through years of hard-won
experience.
Numerous examples exist of the results of failures in software quality and the costs
it can incur. Well publicised incidents include the failure of the European Space
Agency’s Ariane 5 rocket, the Therac-25 radiation therapy machine, and the loss of
the Mars Climate Orbiter in 1999. A study by the US Department of Commerce’s
National Institute of Standards and Technology in 2002 estimated that the annual cost
of inadequate software testing to the US economy was up to $59.5 billion per year3 .
1A software specification defines clearly and unambiguously what the software must do.
2 Software Magazine 500 Companies
3 Economic Impacts of Inadequate Infrastructure for Software Testing
1
For CS265/CS608 Students Personal Use Only
CHAPTER 1. INTRODUCTION 2
However, many participants in the industry do apply quality models and measures
to the processes through which their software is produced. Software Testing is an
important part of the Software Quality assurance process, and is an important discipline
within Software Engineering. It has an important role to play throughout the software
development lifecycle, whether being used in a verification and validation context, or as
part of an test-driven software development process such as eXtreme Programming.
Software Engineering as a discipline grew out of the Software Crisis. This term was
first used at the end of the 1960’s, but it really began to have meaning through the 1970’s
as the software industry was growing. This reflected the increasing size and complexity
of software projects combined with the lack of formal procedures for managing such
projects.
This resulted in a number of problems:
• Projects were running over-budget.
• Projects were running over-time.
• The Software products were of low quality.
•
•
• FT
The Software products often did not meet their requirements.
Projects were chaotic.
Software maintenance was increasingly difficult.
If the software industry was to keep growing, and the use of software was to become
more widespread, this situation could not continue. The solution was to be in formalising
the roles and responsibilities of software engineering personnel. These software engineers
would plan and document in detail the goals of each software project and how it was
A
to be carried out, they would manage the process via which the software code would be
created, and they would ensure that the end result had attributes to show that it was a
quality product. This relationship between quality management and software engineering
meant that software testing would be integrated into its field of influence. Moreover, the
DR
field of software testing was also going to have to change if the industry wanted to get
over the Software Crisis.
While the difference between debugging a program and testing a program was
recognized by the 1970’s, it was only from this time on that testing began to take a
significant role in the production of software. It was to change from being an activity
that happened at the end of the product cycle, to check that the product worked, to an
activity that takes place throughout each stage of development, catching faults as early
as possible. A number of studies comparing the relative costs of early and late defect
detection have all reached the same conclusion: the earlier the defect is caught, the less
the cost of fixing it.
The progressive improvement of software engineering practices has led to a significant
improvement in software quality. The short-term benefits of software testing to the
business include improving the performance, interoperability and conformance of the
software products produced. In the longer term, testing reduces the future costs, and
builds customer confidence.
Software testing is integrated into many of the software development processes in use
today. Approaches such as Test Driven Development (or TDD) use testing to drive the
code development. In this approach, the tests are developed (often with the assistance
of the end-user or customer) before the code is written.
For CS265/CS608 Students Personal Use Only
CHAPTER 1. INTRODUCTION 3
Attribute
Functional suit-
ability
Performance ef-
ficiency
A Characteristics
Appropriateness
FT
Table 1.1: Software Quality Attributes in ISO 25010
4 ISO/IEC 25010:2011
For CS265/CS608 Students Personal Use Only
CHAPTER 1. INTRODUCTION 4
For a business, there are short and long term costs associated with failure. The
short term costs are primarily related to fixing the problem, but may also be from lost
revenue if product release is delayed. The long term costs are primarily the costs of losing
reputation and associated sales.
The cost of testing needs to be in proportion to both income and the cost of failure
(with the current state of the art, it is arguable that all software is subject to failure
at some stage). The effectiveness of testing can generally be increased by the testing
team being involved early in the process. Direct involvement of customers/users is also
an effective strategy. The expected cost of failure is controlled through reducing the
probability of failure through rigorous engineering development practices, and quality-
assurance (testing is part of the quality assurance process).
Software testing can be addressed as an optimisation process: getting the best return
for the investment. Increased investment in testing reduces the cost of software failures,
but increases the cost of software development. The key is to find the best balance
between these costs. This interaction between cost of testing and profit is demonstrated
in Figure 1.1.
A FT
DR
1. Mistakes: these are made by software developers. These are conceptual errors
and can result in one or more faults in the source code.
2. Faults: these are flaws in the source code, and can be the product of one or more
mistakes. Faults can lead to failures during program execution.
3. Failures: these are symptoms of a fault, and consist of incorrect, or out-of-
specification, behaviour by the software. Faults may remain hidden until a certain
set of conditions are met which reveal them as a failure in the software execution.
When a failure is detected by software, it is often indicated by an error code.
For CS265/CS608 Students Personal Use Only
CHAPTER 1. INTRODUCTION 5
1.2.1 Mistakes
Mistakes can be made in a number of different ways. For example:
FT
2. When developing software tests: as a guide to likely faults.
3. When undergoing software process evaluation or improvement: as input data.
There are a number of different ways to categorise these software faults – but no
single, accepted standard exists. The significance of faults can vary depending on the
circumstances. One representative categorisation5 identifies the following ten fault types:
Algorithmic A unit of the software does not produce an output corresponding to the
A
given input under the designated algorithm.
Syntax Source code is not in conformance with the programming language specification.
Computation and Precision The calculated result using the chosen formula does not
conform to the expected accuracy or precision.
DR
CHAPTER 1. INTRODUCTION 6
It is very difficult to find industry figures relating to software faults, but one example
is a study by Hewlett Packard7 on the frequency of occurrence of various fault types,
which found that 50% of the faults analysed were either Algorithmic or Computation and
Precision.
Severity Level
1 (most severe)
2
Behaviour
FT
Failure causes a system crash and the recovery time
is extensive; or failure causes a loss of function and
data and there is no workaround
Failure causes a loss of function or data but there
is manual workaround to temporarily accomplish the
tasks
3 Failure causes a partial loss of function or data where
A
the user can accomplish most of the tasks with a small
amount of workaround
4 (least severe) Failure causes cosmetic and minor inconveniences
where all the user tasks can still be accomplished
DR
Hardware failures show a typical bathtub curve, where there is a high failure rate
initially, followed by a period of relatively low failures, but eventually the failure rate
rises again. The early failures are caused by manufacturing issues, and handling and
installation errors. As these are ironed out, during the main operational life of a product,
the failure rate stays low. Eventually, however, hardware ages and wears out, and the
failure rates rise again. Most consumer products follow this curve.
Software failures demonstrate a similar pattern, but for different reasons as software
does not physically wear out. A typical curve for the failure rate of software is shown in
Figure 1.2.
CHAPTER 1. INTRODUCTION 7
Figure 1.2: Failure Rate for a Software Product over its Lifecycle.
FT
During initial development, the failure rate is likely to fall rapidly during the test
and debug cycle. After the first version is released, during the operational lifetime of
the software, there may be periodic upgrades which tend to introduce new failures,
exposing latent faults in the software or introducing new faults. These upgrades may,
for example, include additional features, changes required for integration with other
software, or modifications required to support changes in the execution environment (such
A
as operating systems updates). Subsequent maintenance activity progressively lowers the
failure rate again, reflecting an overall increase in code quality. Finally the software is
retired, when it is no longer actively developed; this is particularly relevant to open
source software, where the original developer may stop maintaining their contribution.
DR
with no security upgrades. A progressive rise in the failure rate of these systems can be expected until
they are replaced with Windows 10.
For CS265/CS608 Students Personal Use Only
CHAPTER 1. INTRODUCTION 8
A FT
Figure 1.3: Ideal Project Progression using Forward Engineering.
In practice, however, all the development steps are subject to mistakes and
ambiguities, leading to less-than-ideal results. To resolve this, each of the steps must
be subject to a check to ensure that it has been carried out properly, and to provide an
DR
opportunity to fix any mistakes before proceeding. The verification and fixing following
unreliable development at each step is shown in Figure 1.4.
For software products, in addition to checking that each individual step has been
done correctly, it has been found in practice that a second form of checking is necessary:
making sure that the implementation meets the users needs. The first form is referred
to as verification, the second as validation.
For CS265/CS608 Students Personal Use Only
CHAPTER 1. INTRODUCTION 9
should be notified to the caller. In summary, in order to test software properly, detailed
specifications are necessary11 .
Note that often a tester will need to convert a specification from a written English or
natural language form to one that is more concise, and easier to create tests from.
• FULLPRICE which indicates that the customer should be charged the full price.
• DISCOUNT which indicates that the customer should be given a discount.
10 Note the difference between software requirements and user requirements. Software requirements
state what the software must do. User requirements state what the user wants to able to do.
11 One exception is in stability testing, where tests ensure that the software does not crash.
For CS265/CS608 Students Personal Use Only
CHAPTER 1. INTRODUCTION 10
FT
Figure 1.5: Manual Test Example
These results look correct: a regular customer with 100 points is charged the full
price, a gold customer with 100 points is given a discount, and -10 is an invalid number
of points.
So does this method work correctly? Does it work for every possible input? Next we
will consider the theory of software testing to try and answer these questions.
A
1.5 The Theory of Software Testing
DR
A test consists of executing the software under test with the selected input values,
gathering the output values, and verifying that these output values are correct (i.e. are
as specified).
The goal of developing a theory of software testing is to be able to identify the ideal
tests – that is the minimum set of test data required to ensure that software works
correctly for all inputs. The fundamental result in this area13 can be summarised as
follows:
• For a test to be successful, all test data within the test set must produce the results
as defined by the specification.
• The test data selection criteria is reliable if it consistently produces test sets which
are successful, or it consistently produces test sets which are not successful.
• If a set of test data is chosen using a criterion that is reliable and valid, then the
successful execution of that test data implies that the program will produce correct
results over its entire input domain.
• Leading to the result: the only criterion that can be guaranteed to be reliable
and valid is one that selects each and every value in the program domain (i.e.
exhaustively tests the program).
CHAPTER 1. INTRODUCTION 11
FT
There are two key problems with this approach:
• Test execution time: executing this number of tests would take too long. For
example, on a PC that can execute 60,000 tests per second, it would take
approximately 600,000,000,000,000 seconds, or 20 million years to test this simple
program.
• Test design time: just executing the code with all these different inputs is not
enough to test the software. The test has to ensure that every answer is correct.
A
So for each of these 265 sets of input values, the correct output value would have
to be worked out so if can be verified in each test.
Implications
DR
Considering other examples, a class might easily hold 64 bytes of data (class attributes).
64 bytes represents 2512 possible states. Leading to 2512 tests. A database might hold
1 million records of 128 bytes each. This represents approximately 21,000,000,000 possible
values, leading to the same number of tests. Testing with large numbers of combinations
is intractable.
In addition, identifying the correct output for each test is a more difficult problem
than it sounds. Doing this manually takes too long. And writing a program to do this
introduces a new problem – the Test Oracle problem. The requirements for a test oracle
are the same as the requirements for the software being tested – so how do you make
sure the test oracle is correct?
CHAPTER 1. INTRODUCTION 12
as much of the code as possible. Satisfying all these criteria is difficult in practice and a
subject of much research.
The fundamental basis of all testing techniques is selecting a subset of the possible
input parameters. Therefore, the quality of a test technique is determined by how
effective this subset is in finding faults. Unfortunately, there is no well established
theory of faults, so test techniques tend to be heuristic techniques, based on fundamental
principles and practice.
9 bonusPoints = r.nextLong();
10 goldCustomer = r.nextBoolean();
11 result = Check.check( bonusPoints, goldCustomer );
12 System.out.println("check("+bonusPoints+","+goldCustomer + ")->"
DR
+ result);
13 }
14
15 }
Executing this program with loops=1,000,000 results in check() being called a million
times with different random inputs, and generates a large amount of output. A short
extract from the output is shown below in Figure 1.6. This example took about 7 seconds
to execute, mainly due to writing the output to the screen – without the output, the
execution time is under 100 milliseconds.
check(8110705872474193638,false)->DISCOUNT
check(-4436660954977327983,false)->ERROR
check(973191624493734389,false)->DISCOUNT
check(4976473102553651620,false)->DISCOUNT
check(-7826939567468457993,true)->ERROR
check(6696033593436804098,false)->DISCOUNT
check(8596739968809683320,true)->DISCOUNT
check(-8776587461410247754,true)->ERROR
check(-947669412182307441,true)->ERROR
check(5809077772982622546,true)->DISCOUNT
CHAPTER 1. INTRODUCTION 13
This code exercises the program, and demonstrates that it does not crash, but it does
not test the program: the output result is not checked each time. In addition, half the
random values for bonusPoints are less than zero and create an ERROR, and most of
the other inputs create the output DISCOUNT. The probability of getting FULLPRICE
as an output is approximately 1 in 100,000,000 – this output does not even appear once
in the sample shown above.
This example illustrates the three key problems with random testing:
1. The Test Oracle problem – how to tell if the result is correct. Doing this manually
would be very tedious and take too long. And writing a program to do it does not
make sense – how would it be tested? In fact, we have already written a program
to do this, and are trying to test it!
2. The Test Data problem – this code is generating random values, not random data.
This results typically in both generating too many error input values and missing
some required input values.
3. The Test Completion problem – in this example the random tests run for 1,000,000
FT
iterations, but there is no real basis for this number. Perhaps 1000 tests would
have provided adequate assurance of correctness; or perhaps 1,000,000,000 would
be required.
Note that these problems exist with all forms of testing – but in random testing their
solutions are invariably automated, which requires further analysis. Later in the book
some solutions to these problems will be explored.
A
1.7.2 Black-box and White-box Testing
These are two general categories of heuristic for selecting a subset of the input parameters.
One is to generate input values based on exercising the specification. This is referred to
DR
as black-box testing (or specification-based testing). The other is to generate input values
based on exercising the implementation (most techniques use the source code). This is
referred to as white-box testing (or structure-based testing).
Exercising the specification means to have generated enough tests to ensure that every
different type of processing, contained in the specification, has been performed. This not
only means that every different category of output has been generated, but also that
every different category of output has been generated for every different input cause.
Exercising the implementation means to have generated enough tests so that every
component that makes up the implementation has been demonstrated to produce a valid
result when executed. At its simplest, each line of source code is regarded as a separate
component. But, as you will see in following chapters, there are more sophisticated
ways of identifying components that provide for fuller coverage of a program’s possible
behaviours.
In all tests, just exercising the code or the specification is not enough. A test must
not only do this, it must also verify that actual result matches the expected results. The
test result is a pass if and only if this verification succeeds.
This book introduces a number of essential techniques for selecting test data and
demonstrates their use in unit testing (testing a single method), object-oriented testing
(testing an entire class), and application testing (testing an entire application). Once
the reader has developed an understanding of these techniques, they can then consider
applying the following techniques: experience-based testing and fault insertion.
For CS265/CS608 Students Personal Use Only
CHAPTER 1. INTRODUCTION 14
FT
software or input data, and checking that they are found. This has its foundations in
hardware testing, where typically a connection will be artificially held at a low or high
voltage representing a stuck at 1 or stuck at 0 condition. The hardware is then run,
typically through simulation, to ensure that the fault is detected and handled correctly
by the hardware. The fault represents a data fault – all data that is transmitted down
that connection will be subject to the fault.
When used in software testing, faults can be inserted either into the code, or into
known-good data. When inserted into data, the purpose is similar as for hardware
A
testing – to ensure that the fault is detected and handled correctly by the software. When
inserted into code, the purpose is different: it can be used to measure the effectiveness
of the testing, or to demonstrate that a particular fault is not present. The terms weak
and strong mutation testing are used here: weak mutation testing is when the fault is
DR
manually evaluated through review, and strong mutation testing is when the fault is
automatically evaluated by running existing tests to see if they can find it. See further
reading in Section 14.3.
• From a budgetary point of view, when the time or budget allocated for testing has
expired.
• From an activity point of view, when the software has passed all of the planned
tests.
• From a risk management viewpoint, when the predicted failure rate meets the
quality criteria.
Ultimately, the decision has to take all these viewpoints into account – it is rare for
software to pass all of its tests prior to release. Two techniques to assist in making
this decision are the Risk of Release and Usage-based Criteria. During testing, Usage-
based Criteria give priority to the most frequent sequences of program events. The
For CS265/CS608 Students Personal Use Only
CHAPTER 1. INTRODUCTION 15
Risk of Release predicts the cost of future failures, based on the chance of these failures
occurring, and the estimated cost of each.
Ethical and legal issues may be involved in making this decision, especially where
the software is a part of mission-critical or safety-critical systems14 . Appropriate criteria
for software test completion are also an important factor if a software failure results in
litigation15 .
checklist is used and a formal report produced at the end of the meeting, which is the
basis of remedial work to the code. This is a typical and highly effective method in
software engineering, especially for high quality requirements. Automated code analysis
tools can be also used: these might find potential security flaws, to identify common
mistakes in the code, or to confirm that the software conforms to a particular coding
standard.
14 Safety-critical or life-critical systems may pose a threat to health or life if they fail. Mission-critical
systems may pose a threat to a mission if they fail. Examples include software for space exploration,
the aerospace industry, and medical devices.
15 The topic is explored is more detail in Information Technology Law in Ireland (Kelleher and
Murray).
For CS265/CS608 Students Personal Use Only
CHAPTER 1. INTRODUCTION 16
is then run through a program prover to prove that the code matches its requirements
(or to find counter examples that demonstrate where it fails to do so). The programmer
will also be required to produce internal specifications in the code, and one of the most
challenging of these are loop invariants. This requires less skill and experience than
manual proofs, but is significantly more challenging than dynamic testing.
The benefits of a proof are very significant: a program to be proven to work in all
circumstances. The low-level program proof can also be incorporated with higher level
design proofs (such as model checking), providing an integrated proof framework for a
full computer-based system.
There are a number of languages and tools for program proving in development at the
moment – however, it is arguable whether any of them are quite ready for industrial use.
For example, JML16 provides support for Java programs. See Section 14.3 for suggested
further reading on this topic.
FT
Having discussed why software needs testing, and summarised how testing might be done,
it is useful to consider where testing fits into the software development process.
Software of any degree of complexity has three key characteristics:
1. User requirements, stating the needs of the software users.
2. A functional specification, stating what the software must do to meet those needs.
3. A number of modules, that are integrated together to form the final system.
A
Verifying each of these leads to different levels of testing:
Unit Testing An individual unit of software is tested to make sure it works correctly.
This unit may be a single component, or a compound component formed from
DR
Regression Testing When a new unit has been added, or an existing unit modified, the
tests for the previous version of the software are executed to ensure that nothing
has been broken. This may be done for individual units, a sub-system, or for the
full software system.
Integration Testing Two or more units (or subsystems) are tested to make sure they
interoperate correctly. The testing may make use of the programming interface
(typically for simple components), or possibly the system interface (for sub-
systems). Test data is selected to exercise the interactions between the integrated
units.
Subsystem Testing Where a large system is constructed from multiple subsystems,
each subsystem may be tested individually to make sure it works correctly. The
testing uses the subsystem interface: this may be a Graphical User Interface (GUI)
16 http://www.openjml.org
For CS265/CS608 Students Personal Use Only
CHAPTER 1. INTRODUCTION 17
for an application, web interface for a web-server, network interface for a network
server, etc. Test data is selected to ensure the subsystem satisfies its specification.
System Testing The entire software system is tested as a whole to make sure it works
correctly17 . The testing uses the system interface: this may be a Graphical User
Interface for an application (GUI), web interface for a web-server, network interface
for a network server, etc. Test data is selected to ensure the system satisfies the
specification. Desktop, mobile, and web-based application testing are all forms of
system testing which are tested via the user interface.
Acceptance Testing The entire software system is tested as a whole to make sure
it meets the user’s needs, or solves the user’s problem, or passes a set of tests
developed by the user. This is frequently used before payment when software is
developed under contract.
Each of these test activities can use black-box or white-box techniques to develop
tests, however white-box techniques are seldom used except in unit testing.
Chapter 14.
1.11
A
Software Testing Activities
FT
The focus in this book is on dynamic software verification, with a particular emphasis
on using the essential test design techniques for unit testing (including object-oriented
testing) and application testing. Recommendations for further reading are provided in
Independent of the type of software testing being performed, there are a number of
activities that need to be undertaken. This book uses the following 7-step approach18 :
The worked examples throughout the book demonstrate the application of these
activities. In practice, an experienced tester may perform many of the steps mentally
without documenting the results. Software in mission-critical or safety-critical systems
requires a high level of quality, and the test process for these would usually generate
17 A simple form of system testing is called Smoke Testing, coming from hardware development where
the first step in testing is to turn it on and see if it produces any smoke.
18 Many software processes may define their own specific set of activities.
For CS265/CS608 Students Personal Use Only
CHAPTER 1. INTRODUCTION 18
1.11.1 Analysis
All test design requires some form of analysis to be undertaken. This may either be
analysis of the source code, or analysis of the specification – including both the user
requirements and the software specifications. These sources of test information are
referred to as the Test Basis. The results of the analysis are sometimes referred to
as Test Conditions, and are used to determine test coverage items. In practice, the
analysis results may not be fully documented, if at all – but there is value in doing so,
as it allows the test coverage items to be reviewed more accurately for completeness.
selected19 .
Some examples of test coverage items are:
•
•
A
A
particular
particular
FT
Test coverage items are particular items to be covered by a test. They are generated
by reviewing the results of the analysis using the test design technique that has been
each. This is for the simple reason of reducing the time it takes to execute the tests. For
a small number of tests this is not important. But large software systems may have tens
of thousands of tests, and the execution time may run into multiple days.
Any values used in defining a test coverage item should be as generic as possible,
referencing constants by name, and stating relationships rather than actual values where
possible. This allows maximum flexibility in designing the test cases, allowing a single
test case to cover as many coverage items as possible.
Each test coverage item requires an identifier, so that it can be referenced when the
test design is being verified. This identifier must be unique for each test item. To achieve
this, as the test coverage items are specific to each test technique, this book uses a prefix
based on the test technique being used (e.g. EP1 as an identifier for test coverage item
1, using Equivalence Partitions as shown in Chapter 2).
In simple testing, all the inputs are passed as arguments to the code, and the output is
returned as a value. In practice, it is not unusual for there to be both explicit arguments
(passed in the call to the code), and implicit inputs (for example, in Java, class attributes
read by the code; or, in C, global variables). Also there may both be an explicit return
value, and other implicit outputs (such as, in Java, class attributes modified by the code).
These should all be included in the test coverage items.
19 Not every text differentiates clearly between what is being tested for (the test coverage items),
what values needed to perform the test (the test data), and the tests themselves test cases. However,
the distinction is important, and this book clearly delineates the different terms.
For CS265/CS608 Students Personal Use Only
CHAPTER 1. INTRODUCTION 19
Error Hiding
Test coverage items are usually divided into two categories: normal and error cases. It is
important to separate these as multiple input normal coverage items may be incorporated
into the same test case: multiple input error coverage items may not. This is because
of error hiding. Most software does not clearly differentiate between all the possible
error causes: therefore in order to test that each input error is identified and handled
correctly, only one such error may be tested at a time. In this book, an asterisk (*) is
used to signify input error test coverage items and error test cases.
A simple example is shown in Snippet20 1.1.
This ensures that each error value is tested individually, so that an earlier fault does
not hide a subsequent fault.
CHAPTER 1. INTRODUCTION 20
• A unique identifier for the test case (e.g. T1 for Test Case 1).
• A list of the test coverage items covered by each test case.
• The test data consisting of:
– Input Values – these should be specific values.
– Expected Results – these are always derived from the specification, never from
the source code.
The test case specifications define the tests to be implemented.
It is crucial that every test case for a particular test item has a unique identifier21 . A
particular test case may cover test coverage items derived using different techniques, so
it is not useful to include the technique name as a prefix to the test case identifiers. One
possible approach, used in this book, is to number the tests for a test item in sequence
as they are designed.
The test case specification may include additional, related, information. For example,
in Object-Oriented Testing, setting input values may require calling a number of different
methods, and so the names of the methods and the order in which they are to be called
must also be specified.
FT
The list of test coverage items covered by each test case should be used as the guide
for determining when enough tests have been created. Each test coverage item must be
covered if possible (sometimes impossible test coverage may be created, which must be
clearly identified).
both the test coverage items that are covered by each test case, and the test cases that
cover each test coverage item. This allows the test design to be easily reviewed, ensuring
that every test coverage item is efficiently covered.
Formal reviews can be used to ensure the quality of the test designs. For a test review
to be effective, the reviewer needs to see the outputs of the previous steps: analysis, test
coverage items, and test cases. However, in practice, much of this work is frequently
done mentally and not documented in detail.
CHAPTER 1. INTRODUCTION 21
matching test case identifier. Each specified test case should be implemented separately
– this allows individual failures to be clearly identified. Tests may be grouped into test
suites (or test sets) which allow a tester to select a large number of tests to be run as a
block.
FT
Figure 1.7: Test Artefacts
In this book, test cases are specified using the template shown in Table 1.3. There is no
standard for this format – but it is important to be consistent to allow test specifications
to be reviewed easily. The suggested layout is as follows:
• The ID column provides a unique Test Case ID for each test (this needs to be
unique to the test item).
• The TCI column documents the test coverage items that this test case covers. It
is recommended to identify the new test coverage items covered by this test case
separately from those already covered by previously defined test cases.
• The Inputs gives the required input data values for each parameter.
• The Expected Results documents the output values which are expected based on
the specification of the software being tested. This may be abbreviated as Exp.
Results in tables in this book. For Java, this is usually the return value from the
method called.
For CS265/CS608 Students Personal Use Only
CHAPTER 1. INTRODUCTION 22
It is a key goal in developing tests to avoid duplication. The experienced tester may
be able to do this while developing data for the test cases, but it is recommended (at
least initially) to document the required data for each test case as it is designed, and
then subsequently identify and discard test cases with duplicate data.
FT
As exhaustive testing is not feasible, every test approach is a compromise, designed to
find particular classes of faults. These are referred to as fault models. Each type of
testing is designed around a particular fault model. The initial types shown in the book
have simple fault models, and the later ones address progressively more complex fault
models. An understanding of the classes of faults introduced by the software developers
into the code can lead to a more effective emphasis on those tests most likely to find the
faults.
A
1.14 Using This Book
This book is intended for an introductory audience, and is used on a number of
DR
• The introduction describes the motivation for testing, discusses exhaustive and
random testing, and describes the basic steps required to test software.
• Six fundamental test techniques, covering black-box and white-box testing, are
introduced by examples of unit testing. Each is fully worked from analysis of
the problem through to automated test implementation. Following the worked
example, the principles of each technique are then addressed in more detail.
• The fundamental elements of testing of object-orientated software are introduced.
This is a very substantial topic, and this book addresses some of the underlying
principles, and provides a number of examples.
For CS265/CS608 Students Personal Use Only
CHAPTER 1. INTRODUCTION 23
In practice, most testers will not document their tests to the degree of detail shown in
this book, except in situations where high software quality is required. However, the
tester must do the same mental work to develop tests systematically, and while learning
A
to test it is a good practice to write down all the intermediate results as shown in this
book.
DR
1.14.6 Examples
A number of examples are presented, both of software to be tested, and of automated
tests. These examples are intended to show how testing principles are used, and should
not be taken as an example of good coding practice. In many cases comments and
Javadoc specifications have been omitted in the interests of space.
CHAPTER 1. INTRODUCTION 24
FT
with the terminology used in the ISO International Standard 2911923 . In particular, in
this book the terms test coverage item, test case and test implementation are used to
distinguish between the test goals, the data needed to achieve those goals, and the code
which implements the test, respectively.
A
DR
23 ISO/IEC/IEEE 29119
For CS265/CS608 Students Personal Use Only
Chapter 2
Equivalence Partitions
results.
The goal of this technique is to verify that the software works correctly by using at
least one representative input value for each different type of processing. This technique
should also ensure that at least one representative output value for each different type
of processing is produced. To identify these values, equivalence partitions are used.
The chapter starts with a fully worked example and concludes with a more detailed
examination of the topic.
2.2 Example
The program check as described in Section 1.4 uses a class OnlineSales to implement
its core functionality. This class contains a static method giveDiscount() which is defined
below1 . Test techniques will be introduced using static methods that do not require an
object to be instantiated through a constructor. Once the basic techniques have been
described, the chapter on testing object-oriented software (Chapter 9) examines using
25
For CS265/CS608 Students Personal Use Only
Outputs
return value:
FULLPRICE if bonusPoints≤120 and not a goldCustomer
FULLPRICE if bonusPoints≤80 and a goldCustomer
DISCOUNT if bonusPoints>120
DISCOUNT if bonusPoints>80 and a goldCustomer
ERROR if any inputs are invalid (bonusPoints<1)
For simplicity, this example does not use Java Exceptions to indicate an error: see
Section 11.8 for an example of how to test code that does.
2.2.1 Analysis: Identifying the Equivalence Partitions
The first step in developing the tests is an analysis of the specification in order to identify the equivalence partitions.
Natural Ranges
The natural ranges are based on the types of the input parameters, and of the return
value(s). To help with this analysis, we suggest the application of value lines for each
input and output.
A value line is a graphical representation of a range of values. The minimum value is
always placed to the left, and the maximum value to the right. These value lines assist in
ensuring that there are no gaps or overlaps in the equivalence partitions once they have
been identified.
The parameter bonusPoints is of the type long, and has one natural range with 2⁶⁴ values². The value line for bonusPoints, representing this range of possible values, is shown in Figure 2.1. This indicates that bonusPoints may hold any value from Long.MIN_VALUE to Long.MAX_VALUE.
2 This example is based on Java, which uses 64 bits for the long data type.
Boolean values are best treated as two separate ranges with one value each, as there
is no natural ordering of the values true and false. The boolean parameter goldCustomer
has two natural ranges, each with one value as shown in Figure 2.2.
Figure 2.2: Value Line showing the Natural Ranges of goldCustomer (true, false)
Enumerated values are best treated in the same way as Boolean values, with multiple separate ranges and one value in each range, even though some languages do define an ordering for enumerated values.
Figure 2.3: Value Line showing the Natural Ranges of the Return Value (FULLPRICE, DISCOUNT, ERROR)
Having used value lines to assist in identifying the natural ranges for the inputs and outputs, these ranges can now be documented.
The double-dot (“..”) notation indicates that the parameter may have any value in the range, including the minimum and maximum values shown. This is equivalent to the mathematical notation of the inclusive interval [Long.MIN_VALUE, Long.MAX_VALUE] as used throughout the book.
Specification-Based Ranges
The specification-based ranges may now be identified by walking the value lines from left
to right and identifying the values at which a change of processing may take place.
For the input bonusPoints, the first value at the left is Long.MIN_VALUE. Walking along the value line, according to the specification, all the subsequent values up to and including the value 0 are treated equivalently: they are all processed as errors. This produces the first specification-based range on the value line, as shown in Figure 2.4.
Value lines are conceptual models that do not need to be drawn to scale, as you can see, for example, in Figure 2.4. The width of each range should be chosen to allow the values to be entered clearly; the width does not represent the number of values in the range.
The next value after 0 is 1, which can now be entered as shown in Figure 2.5. Walking along the value line from 1, all the subsequent values up to and including the value 80 are treated equivalently. The value 81 will not always be treated differently, but it may be, and thus the processing is not equivalent for 80 and 81. This produces the second specification-based range on the value line, as shown in Figure 2.6.
The next value after 80 is 81 which can now be entered as shown in Figure 2.7.
Walking along the value line from 81, all the subsequent values up to and including
the value 120 are treated equivalently. This produces the third specification-based range
on the value line, as shown in Figure 2.8.
The next value after 120 is 121, which can now be entered as shown in Figure 2.9. All the values from 121 to Long.MAX_VALUE are treated equivalently again, and so this produces the fourth and final specification-based range for bonusPoints.
Next we consider the input parameter goldCustomer, which is a boolean. The natural
ranges were identified in Figure 2.2. There are two equivalence partitions, matching the
natural ranges, as the processing in each may be different. The specification-based ranges
are shown in Figure 2.10.
Figure 2.10: Specification-Based Ranges for goldCustomer (true, false)
Finally, for the return value, the specification states that each of the natural ranges is the result of a different type of processing; as with goldCustomer, this produces a value line for the specification-based ranges identical to that for the natural ranges, as shown in Figure 2.11.
Figure 2.11: Specification-Based Ranges for the Return Value (FULLPRICE, DISCOUNT, ERROR)
Some of the equivalence partitions are associated with error processing, and are
indicated with an asterisk (*). In this example it is straightforward to identify these.
However, it may be much harder to do this correctly for other specifications. In
normal test development, this analysis is probably not written down, and an experienced
developer will identify the equivalence partitions directly. But, when learning how to
test, it is recommended that you fully document the results of the analysis as shown
above. This helps to ensure that you are following the technique correctly.
2.2.2 Test Coverage Items
Each equivalence partition is a test coverage item, and any value from the partition can be used for a test case at any time. The equivalence partitions for each parameter should be grouped together and presented in order, as we can see in the table.
A blank Test Case column is also included on the right-hand side of the table – this
will be completed later, to indicate which test case covers each test coverage item.
TCI    Parameter      Equivalence Partition       Test Case
EP1*   bonusPoints    Long.MIN_VALUE..0           To be completed later
EP2                   1..80
EP3                   81..120
EP4                   121..Long.MAX_VALUE
EP5    goldCustomer   true
EP6                   false
EP7    Return Value   FULLPRICE
EP8                   DISCOUNT
EP9                   ERROR
Notes:
1. The test coverage items are specific to a particular (testing) technique – it is useful
to prefix the identifier with the technique name (EP is used as an abbreviation for
Equivalence Partition).
2. An asterisk (*) indicates a test coverage item for an input error. Each input error
must be tested separately.
3. The Test Case column is completed later.
2.2.3 Test Cases
Test data values must now be selected from each partition. Any value in a partition is nominally equivalent, but picking a value from the center is more likely to expose faults due to incorrect processing of the entire partition, which is the goal of equivalence partition testing³. This makes debugging of any faults easier, as the causes of failures are easier to identify if they are a good match for the fault model of the test technique.
For non-continuous ranges, such as Boolean or enumerated values, each equivalence
partition only has a single value. Table 2.5 shows the data values selected for this
example.
Table 2.5: Equivalence Values

TCI    Parameter      Equivalence Partition       Value
EP1*   bonusPoints    Long.MIN_VALUE..0           -100
EP2                   1..80                       40
EP3                   81..120                     100
EP4                   121..Long.MAX_VALUE         200
EP5    goldCustomer   true                        true
EP6                   false                       false
To create the test case table, complete each column as described below, and shown
in Table 2.6. Give each test case a unique identifier – here we have started with T1.1.
Select the first non-error equivalence partition for each parameter: this gives EP2 for
bonusPoints and EP5 for goldCustomer. Enter the test coverage item identifier for each
into the TCI Covered column – note that EP2,5 is shorthand notation for EP2 and EP5,
used to save space. Enter the selected data values for each parameter in the Inputs
columns.
3 As you will see in the next chapter, exposing faults in processing the boundary values correctly is the goal of boundary value analysis.
Now use the specification to determine the correct output. In this example, this is
the return value, and the correct value according to the specification is referred to as
the expected results. This is abbreviated here to Exp. Results. From the specification, if
bonusPoints is 40 and goldCustomer is true, then the expected results are the return value
FULLPRICE. Place this value into the expected results column and add the matching
test coverage item (EP7) to the TCI Covered column as shown in Table 2.7.
Table 2.7: EP Test Case: T1.1 Output Value

       TCI            Inputs                        Exp. Results
ID     Covered        bonusPoints   goldCustomer    return value
T1.1   EP2,5,7        40            true            FULLPRICE
Then complete the test cases for all the other normal test coverage items, working in order where possible. For each additional test case, data is selected to cover as many additional normal coverage items as possible. For test case T1.2 use EP3 and EP6 as inputs. And for test case T1.3, use the remaining uncovered input test coverage item: EP4. See Table 2.8 for the completed non-error test cases.
T1.2 produces the same output as T1.1 – this is documented by placing EP7 in square brackets in the row for T1.2. And test case T1.3 must reuse one of the already tested values for goldCustomer – here the value false has been selected.
Finally, complete the error test coverage items: each input error test coverage item
must have its own unique test case – there can only be one input error test coverage
item covered by any one test case. This is due to error hiding which is explained in
Section 1.11.2. Table 2.9 shows the final result with all test cases with their chosen input
data values and their expected results.
Notes:
2. The test case IDs do not include the name of the technique – it makes it easier to reuse test cases if the technique abbreviation is not included in the ID of the test case.
3. Minimising the number of test cases can require multiple iterations. A target to aim
for is the maximum number of partitions of any input or output: in this example
it is 4 (for the bonusPoints parameter) – but this is not always achievable.
It is important to include the TCI Covered column. This allows the tester to confirm that all the test coverage items have been addressed, and it also allows the test auditor to verify completeness.
Combinations of input values are not considered: for example, it is not part of
equivalence partition testing to test for bonusPoints equal to 40 with goldCustomer both
true and false. Chapter 4 presents a systematic approach to testing combinations.
2.2.4 Verification of the Test Cases
The design verification consists of a review to ensure that:
1. Every test coverage item is covered by at least one test case with suitable test data
(this confirms that the test cases are complete).
2. Every new test case covers at least one additional test coverage item (this confirms that there are no unnecessary test cases). Ideally, each test case should cover as many new test coverage items as possible (up to three in this example: two input TCIs, and one output TCI).
In this example, we can see from Table 2.10 that every test coverage item is covered, and from Table 2.9 that every equivalence partition test case covers additional test coverage items.

2.3 Test Implementation and Results

2.3.1 Manual Testing

The tests can first be executed manually against the check program, as shown below:
$ check 40 true
FULLPRICE
$ check 100 false
FULLPRICE
$ check 200 false
DISCOUNT
$ check -100 false
ERROR
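The check program itself is described in Section 1.4. A minimal sketch of such a command-line wrapper might look as follows (the class name CheckMain is hypothetical; OnlineSales and its Status values are assumed to be available on the classpath):

    public class CheckMain {

        // Parse the two command-line arguments and print the return value of
        // giveDiscount(), so it can be compared against the expected results.
        public static void main(String[] args) {
            long bonusPoints = Long.parseLong(args[0]);
            boolean goldCustomer = Boolean.parseBoolean(args[1]);
            System.out.println( OnlineSales.giveDiscount(bonusPoints, goldCustomer) );
        }

    }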
2.3.2 Automated Test Implementation
The manually executed test cases are now implemented for automated execution. TestNG is used as a demonstrative example in this book – there are many other test frameworks.
A test framework typically includes a Test Runner which runs the tests and collects the test results. Because of this, no main() method is required in the test class. Tests need to be identified for the runner: in TestNG, Java annotations are used, which allows the test runner to find and execute the test methods.
In this example:
• The call to assertEquals() raises an exception if the values are not equal, and the TestNG framework catches this and records it as a failed test. If the @Test method runs to completion, without any exceptions, then this is recorded as a passed test.
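As an illustration, a minimal non-parameterised TestNG test for test case T1.1 might look like this (a sketch only; it assumes OnlineSales and the Status enum are importable):

    import static org.testng.Assert.assertEquals;

    import org.testng.annotations.Test;

    public class OnlineSalesSimpleTest {

        // A single test for T1.1: inputs bonusPoints=40, goldCustomer=true;
        // the expected result is FULLPRICE.
        @Test
        public void test_T1_1() {
            assertEquals( OnlineSales.giveDiscount(40L, true), Status.FULLPRICE );
        }

    }

Writing one such method per test case quickly leads to duplicated code, which motivates the parameterised approach described next.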
To reduce the code duplication, and allow the same test code to be used with
different data, parameterised (or data-driven) tests can be used. In TestNG this facility
is called a Data Provider . Listing 2.3 shows the implementation of tests T1.1–T1.4
6 The package and import statements are omitted for brevity.
using parameterised tests. Again, the package and import statements are not shown, for brevity.
Listing 2.3: OnlineSalesTest with EP

    public class OnlineSalesTest {

        // EP test data
        private static Object[][] testData1 = new Object[][] {
            // test, bonusPoints, goldCustomer, expected output
            { "T1.1",   40L, true,  FULLPRICE },
            { "T1.2",  100L, false, FULLPRICE },
            { "T1.3",  200L, false, DISCOUNT  },
            { "T1.4", -100L, false, ERROR     },
        };

        // Method to return the EP test data
        @DataProvider(name="dataset1")
        public Object[][] getTestData() {
            return testData1;
        }

        // The test method, called once for each row of test data
        @Test(dataProvider="dataset1")
        public void test_giveDiscount( String id, long bonusPoints,
                boolean goldCustomer, Status expected) {
            assertEquals( OnlineSales.giveDiscount(bonusPoints, goldCustomer),
                    expected );
        }

    }
In this example, the @Test annotation links the test method to its data provider (dataset1). The data provider is called, which returns an array of test data. The test method is called repeatedly, with each row of test data from the array in turn.
All the tests pass.
Figure 2.13: EP Test Results for OnlineSales.giveDiscount()

2.4 Equivalence Partitions in More Detail

2.4.1 Fault Model

The equivalence partition fault model is where entire ranges of values are not processed correctly. As all the values in a partition should be processed in the same way, equivalence partition testing attempts to find these faults.
2.4.2 Description
Equivalence partition testing is based on selecting representative values of each parameter from the equivalence partitions. Each equivalence partition for each of the parameters is a test coverage item. Both the inputs and the output should be considered. The technique involves generating as few test cases as possible: each new test case should select data from as many uncovered partitions as possible. Test coverage items for errors should be treated separately to avoid error hiding. The goal is to achieve 100% coverage of the equivalence partitions.
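As a simple worked example of this coverage measure, using the figures from this chapter: coverage = (covered TCIs ÷ total TCIs) × 100%. With all 9 of EP1–EP9 covered, coverage is 9/9 × 100% = 100%; if EP9 were missed, coverage would be 8/9 ≈ 89%.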
Value Ranges
All inputs and outputs have both natural ranges of values and specification-based ranges of values. The natural range is based on the type. The specification-based ranges, or partitions, are based on the specified processing. It often helps in analysing ranges to use a diagram – see Figure 2.14. This figure shows a Java int, which is a 32-bit value, having a minimum value of −2³¹ and a maximum value of 2³¹ − 1 (or Integer.MIN_VALUE and Integer.MAX_VALUE), giving the natural range:
• int [Integer.MIN_VALUE..Integer.MAX_VALUE]
Natural ranges for a number of other common types are:
• byte [Byte.MIN_VALUE..Byte.MAX_VALUE]
• short [Short.MIN_VALUE..Short.MAX_VALUE]
• long [Long.MIN_VALUE..Long.MAX_VALUE]
• char [Character.MIN_VALUE..Character.MAX_VALUE]
Natural ranges for types with no natural ordering are treated slightly differently – each value is a separate range containing one value:
• boolean [true][false]
• enum Colour {Red, Blue, Green} [Red][Blue][Green]
Compound types, such as arrays and classes, are more complicated to analyse, though the principles are the same.
Equivalence Partitions
An Equivalence Partition is a range of values for a parameter for which the specification
states equivalent processing.
Consider a method, boolean isNegative(int x), which accepts a single Java int as its
input parameter. The method returns true if x is negative, otherwise false. From this
specification, two equivalence partitions for the parameter x can be identified:
1. Integer.MIN_VALUE..-1
2. 0..Integer.MAX_VALUE
Both input and output parameters have natural ranges and equivalence partitions.
A Java boolean is an enumerated type with the values true and false. Each
enumerated value is a separate range. In this example, the two different values are
produced by different processing, and so the return value has two equivalence partitions:
1. true
2. false
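For illustration, a minimal implementation consistent with this specification might look as follows (a sketch only; the class name PartitionExamples is hypothetical):

    public class PartitionExamples {

        // Returns true if x is negative, otherwise false.
        // Input partitions: Integer.MIN_VALUE..-1 (true)
        // and 0..Integer.MAX_VALUE (false).
        public static boolean isNegative(int x) {
            return x < 0;
        }

    }

Representative equivalence values might be, for example, x = -500 and x = 500, each taken from well inside its partition.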
Equivalence partitions are used in testing because, according to the specification, any one value can be selected to represent any other value. So, instead of having a separate test for every value in the partition, a single test can be executed using a single value from the partition. It is equivalent to any other value, and can be picked from anywhere in the partition. Traditionally a value in the middle of the partition is picked. Equivalence partitions are useful for testing the fundamental operation of the software: if the software fails using equivalence partition values, then it is not worth testing with more sophisticated techniques until the faults have been fixed.
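Where a partition is a continuous range, a mid-partition value can be computed; a small sketch (assuming the partition width fits in a long) is:

    // Returns a value from the middle of the partition low..high.
    // For 1..80 this gives 40, and for 81..120 it gives 100 – the
    // values used for T1.1 and T1.2 above.
    static long middleOf(long low, long high) {
        return low + (high - low) / 2;  // avoids overflow of (low + high) / 2
    }

For very large partitions, such as 121..Long.MAX_VALUE, the tester may instead pick any convenient in-partition value (200 was used above).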
2.4.4 Test Coverage Items
It is good practice to give each test coverage item for each test item a unique identifier. It is often useful to use the prefix “EP” for equivalence partition test coverage items. Note that the values selected from the partitions are part of the test cases, as we have shown in the test case tables – for example, see Table 2.9.
2.4.6 Pitfalls
The technique calls for a minimal number of test cases. Do not provide tests for every
combination of inputs. Do not provide a separate test case for each test coverage item.
After completing the test cases, check whether there are any unnecessary test cases.
Also, check whether a reorganisation can reduce the number of test cases. Table 2.11
shows an example of each of these situations for the worked example in Section 2.2.3.
Table 2.11 (extract):

TCI Covered    bonusPoints    goldCustomer    Exp. Results
EP[4,6,8]      1000           false           DISCOUNT
EP1*,9         -100           false           ERROR

Notes:
1. Duplicate test case: X1.5 covers exactly the same test coverage items as X1.4, and either one of these can be deleted.
2. Unnecessary test cases: X1.2 can be deleted, as X1.1 and X1.3 cover all the test
coverage items.
3. The result of these improvements is as shown previously in Table 2.9.
2.5 Evaluation
Testing with equivalence partitions provides a minimum level of black-box testing. At
least one value has been tested from every input and output partition, using a minimum
number of test cases. These tests are likely to ensure that the basic data processing
aspects of the code are correct. But they do not exercise the different decisions made in
the code.
This is important, as decisions are a frequent source of mistakes in the code. These
decisions generally reflect the boundaries of input partitions, or the identification of
combinations of inputs requiring particular processing. These issues will be addressed in
later techniques.
2.5.1 Limitations
The software has passed all the equivalence partition tests – so is it fault free? As
discussed earlier, only exhaustive testing can answer this question, and faults may remain.
Some limitations of equivalence partition testing are explored below by making changes
to inject faults into the source code.
Source Code
The source code for the method giveDiscount() in class OnlineSales is shown in Listing 2.4. The Javadoc (which specifies that gold customers get a discount above 80 bonus points, and other customers get a discount above 120 bonus points) is omitted from the listing.

Listing 2.4: OnlineSales.giveDiscount()
22 public static Status giveDiscount(long bonusPoints, boolean goldCustomer)
23 {
24     Status rv = FULLPRICE;
25     long threshold=120;
26
27     if (bonusPoints<=0)
28         rv = ERROR;
29
30     else {
31         if (goldCustomer)
32             threshold = 80;
33         if (bonusPoints>threshold)
34             rv=DISCOUNT;
35     }
36
37     return rv;
38 }
39
40 }
Fault 1
Equivalence partition tests are designed to find faults associated with entire ranges of
values. If we inject a fault on line 33, which in effect disables the processing that returns
DISCOUNT, we expect to see at least one test fail. This fault is shown in Listing 2.5.
Note that the original operator “>” in Listing 2.4 has been changed to be “==” in
Listing 2.5 on line 33.
Listing 2.5: Fault 1
31         if (goldCustomer)
32             threshold = 80;
33         if (bonusPoints==threshold) // fault 1
34             rv=DISCOUNT;
35     }
36
37     return rv;
38 }
Fault 2
Equivalence partition tests are not designed to find faults at the values at each end of
an equivalence partition. If we inject a fault which moves the boundary value for the
processing that returns DISCOUNT, then we do not expect to see any failed tests. This
For CS265/CS608 Students Personal Use Only
fault is shown in Listing 2.6. Note that the original operator “>” in Listing 2.4 has been
changed to be “>=” in this fault in Listing 2.6 on line 33.
Listing 2.6: Fault 2
33         if (bonusPoints>=threshold) // fault 2
34             rv=DISCOUNT;
35     }
36
37     return rv;
38 }
EP Testing against Fault 2
The results of running the equivalence partition tests against the code with Fault 2 are shown in Figure 2.17. Note that none of the tests have identified the fault.
Demonstrating Fault 2
The results of executing the code with Fault 2, using specially selected input values, are
shown in Figure 2.18.
$ check 80 true
DISCOUNT
$ check 120 false
DISCOUNT
Note that the wrong result is returned for both the inputs (80,true) and (120,false).
The correct result is FULLPRICE in each case, but DISCOUNT has been returned.
An experienced tester may not need to use value lines to document the equivalence partitions (as test coverage items) for boolean and enumerated data types. Experienced testers may also not need them for variables such as integers if the number of equivalence partitions is small. In these cases, such a tester may proceed directly to developing the table with the test coverage items and ignore some or all of the value lines. If the test item (e.g. a method) has only a few very simple variables, the tester may also directly develop the test case table with test data. However, in cases where high quality is required (embedded systems, life-critical systems, etc.) even an experienced tester may need to document these steps for quality review, or in the case of a legal challenge to the quality of the software.
Chapter 3
Boundary Value Analysis
Equivalence partition testing uses representative values from each range of values for which the specification states equivalent processing. Programmers often make mistakes at the boundary values of these ranges, which will not be caught by equivalence partition testing, as demonstrated in Chapter 2. This chapter introduces the black-box testing technique of boundary value analysis (BVA), starting with a worked example and concluding with a more detailed examination of the topic.
Definition: a boundary value is the value at the boundary (or edge) of an equivalence
partition. Each equivalence partition has exactly two boundary values.
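For example, the partition 1..80 identified in Chapter 2 has the two boundary values 1 and 80.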
3.2 Example
Testing of the method OnlineSales.giveDiscount(bonusPoints, goldCustomer) continues in this chapter. To summarise, this method returns:
FULLPRICE if bonusPoints≤120 and not a goldCustomer
FULLPRICE if bonusPoints≤80 and a goldCustomer
DISCOUNT if bonusPoints>120
DISCOUNT if bonusPoints>80 and a goldCustomer
ERROR if any inputs are invalid (bonusPoints<1)
The Boundary Values are the minimum and maximum values of each equivalence
partition as shown in Table 3.1. This table highlights again that boolean and enum
parameters have single-valued equivalence partitions.
Table 3.1: Boundary Values for giveDiscount()

TCI    Parameter      Boundary Value     Test Case
BV1*   bonusPoints    Long.MIN_VALUE     To be completed later
BV2*                  0
BV3                   1
BV4                   80
BV5                   81
BV6                   120
BV7                   121
BV8                   Long.MAX_VALUE
BV9    goldCustomer   true
BV10                  false
BV11   Return Value   FULLPRICE
BV12                  DISCOUNT
BV13                  ERROR

3.2.3 Test Cases
Test cases are now created with test input data that is selected from the boundary values. The expected results are determined from the specification. For each additional test case, data is selected to cover as many additional normal test coverage items as possible. Each test coverage item that represents an error must have its own unique test case. The input test data must cover every input boundary value, and also ensure that every output boundary value is included in the expected results.
Notes:
1. Boundary value analysis may appear to duplicate all the equivalence partition test coverage items, but it does not. The boundary values are not reasonable values to use for equivalence partition testing, as they are special edge values and not representative of the entire range.
2. The Test Case column in the test coverage item table can now be completed, and the remaining results are shown in Table 3.4. This is completed by reading the TCI Covered column of the test case table.
3. There should be no duplicate tests while taking the equivalence partition test cases
into consideration.
In this example, we can see from Table 3.4 that every test coverage item is covered, and
from Table 3.3 that every equivalence partition test case covers additional test coverage
items, as follows:
• T2.1 covers BV3, BV9, and BV11 for the first time.
• T2.2 covers BV4 and BV10 for the first time. It also covers BV11 again, but this
is unavoidable as that is the result of these inputs.
• T2.3 covers BV5 for the first time. It also covers BV10 and BV11 again, but this
is also unavoidable.
• T2.4 covers BV6 for the first time. It also, unavoidably, covers BV10 and BV11
again.
• T2.5 covers BV7 and BV12. It also, unavoidably, covers BV10 again.
• T2.6 covers BV8. It also, unavoidably, covers BV10 and BV12 again.
• T2.7 is an error test case, and it covers the single input error test coverage item
BV1*. It also covers the output test coverage item BV13. Note that although the
selected input value of goldCustomer is false, it does not cover BV10.
• T2.8 is also an error test case, and it covers BV2*. It also unavoidably covers BV13
again. As in the previous test case, it does not cover BV10.
T2.3 is not a duplicate of any of the equivalence partition test cases. Even though it appears to cover the equivalence partition test coverage items EP3, EP6, and EP7, it is not a duplicate. Equivalence partition test cases select values near the center of a partition – boundary value analysis selects values at the boundaries of the partitions. The same is true for the other boundary value analysis test cases, where the boundary values are in the same partitions as those covered by the equivalence partition test cases.
    public void test_giveDiscount( String id, long bonusPoints,
            boolean goldCustomer, Status expected) {
        assertEquals(
                OnlineSales.giveDiscount(bonusPoints, goldCustomer),
                expected );
    }

}
3.4.2 Description
Programming faults are often related to the incorrect processing of boundary conditions, so an obvious extension to equivalence partitions is to select two values from each partition: the bottom and the top. This doubles the number of tests, but is more likely to find boundary-related programming faults. Each boundary value for each parameter is a test coverage item. As for equivalence partitions, the number of test cases is minimised by selecting data that includes as many uncovered test coverage items as possible in each new test case. Error test cases are always considered separately – only one error boundary value is included per error test case. The goal is to achieve 100% coverage of the boundary values.
Each equivalence partition has an upper and a lower boundary value. Experience has shown that many software failures are due to the incorrect handling of limits. Boundary values (BV) therefore increase the sophistication over just using equivalence partitions, at the cost of doubling the number of test cases and tests that need to be run.
For the example method isNegative(int x), the boundary values for x are as follows:
1. Integer.MIN VALUE
2. -1
3. 0
4. Integer.MAX VALUE
The boundary values for the return value are the same as the equivalence partitions, as it is a boolean type – each equivalence partition is a range with only one data value:
1. true
2. false
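Expressed as parameterised test data in the style of Listing 2.3, the boundary values for isNegative() might look like this (a sketch; the field name bvaData is illustrative):

    // BVA test data for isNegative(int x): test id, input x, expected result
    private static Object[][] bvaData = new Object[][] {
        { "B1", Integer.MIN_VALUE, true  },
        { "B2", -1,                true  },
        { "B3", 0,                 false },
        { "B4", Integer.MAX_VALUE, false },
    };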
• For a contiguous data type, the successor to the value at the top of one partition
must be the value at the bottom of the next. In the example above, the upper
value -1 from the first partition is directly followed by the lower value 0 from the
next partition.
• The natural range of the parameter provides the ultimate maximum and minimum
values.
Boundary values do not overlap, and there must be no gap between partitions. A
convenient shorthand for specifying partitions and their boundary values is as follows:
[Integer.MIN_VALUE..-1][0..Integer.MAX_VALUE]
It is good practice to give each test coverage item for each test item a unique identifier.
It is often useful to use the prefix “BV” (for Boundary Value) for the test coverage items.
In equivalence partition testing, the tester has to decide on representative values from each partition; in boundary value analysis testing, the tester uses the already identified upper and lower values for each partition. The tester must also ensure that all the test coverage items related to the output parameters are covered. It may be necessary to read the specification backwards to determine input values that will result in a return value having the required boundary value.
Hint: It is usually easier to identify test cases by going through the test coverage
items in order, and selecting the next uncovered boundary value for each parameter.
3.4.6 Pitfalls
As for equivalence partition testing, boundary value analysis should use a minimal
number of tests. Do not provide test cases for every combination of input boundary
values, and do not provide a separate test case for each test coverage item.
3.5 Evaluation
Boundary value analysis enhances the testing provided by equivalence partitions.
Experience indicates that this is likely to find significantly more errors than equivalence
partitions.
Boundary value analysis tests two additional values from every input and output
partition, the minimum and maximum. These tests provide some basic assurance that
the correct decisions are made in the code. Boundary value analysis provides exactly
the same test coverage items as equivalence partitions for boolean and for enumerated
parameters.
Snippet 3.1 shows a simple example of a boundary value fault. Line 3 contains a fault in the if statement: instead of the expression (x <= 100), which is what the developer intended, the expression (x < 100) has been used.
Boundary value analysis does not explore the decisions in detail, and in particular does not explore decisions associated with different combinations of inputs. Combinations of inputs will be addressed in the next chapter.

3.5.1 Limitations

The software has passed all the equivalence partition and boundary value analysis tests – so is it correct? As discussed earlier, only exhaustive testing can answer this question, and faults may remain. We will now explore the limitations of boundary value analysis testing by deliberately injecting faults into the source code.
The results of running the equivalence partition and boundary value analysis tests against
the code with Fault 2 are shown in Figure 3.5.
===============================================
Command line suite
Total tests run: 12, Passes: 11, Failures: 1, Skips: 0
Figure 3.5: BVA and EP Test Results for giveDiscount() with Fault 2
The fault is found by one of the boundary value analysis tests. This is expected as
the fault was inserted at the processing of a boundary value. All the tests are run, even
though one of them has failed.
Fault 3
Equivalence partition and boundary value analysis tests are not designed to find faults
associated with the correct processing of combinations of values from the input partitions.
If we inject a fault that causes a combination of input values to be incorrectly processed,
then we do not expect to see any failed tests².
A fault is inserted as shown in Listing 3.2 on line 32.
Listing 3.2: Fault 3
27     if (bonusPoints<=0)
28         rv = ERROR;
29
30     else {
31         if (goldCustomer)
32             threshold = 120; // fault 3
33         if (bonusPoints>threshold)
34             rv=DISCOUNT;
35     }
36
37     return rv;
38 }
By changing the value from 80 to 120 on line 32, the code no longer works correctly
when goldCustomer is true and bonusPoints is in the range 81..120. This combination
of input values has not been tested by equivalence partition or boundary value analysis.
2 If we do, it is by chance.
Figure 3.6: BVA and EP Test Results for giveDiscount() with Fault 3
The fault is not discovered by our tests. In a small example like this, depending
on the data values selected, the equivalence partition and boundary value analysis tests
might have found this combination of values that expose this fault. But in general, and
for larger examples, neither equivalence partition nor boundary value analysis provide a
systematic way to find and test all combinations.
Demonstrating Fault 3
$ check 100 true
FULLPRICE
Here, the wrong result is returned for the inputs (100,true). The correct result is DISCOUNT and not FULLPRICE as returned.
FT
(at the top of the lower range, and at the bottom of the upper range). Experienced
testers may decide to use what is referred to as 3-point boundary values (at the top of
a partition, one additional value is used, just below the upper boundary value. And, at
the bottom of a partition, one additional value is used, just above the lower boundary
value).
A
DR
For CS265/CS608 Students Personal Use Only
Chapter 4
Decision Tables
Equivalence partition and boundary value analysis testing do not consider combinations of input values, which can lead to undetected faults in the code. This chapter introduces the black-box technique of decision table (DT) testing.
4.2 Example
Testing of the method OnlineSales.giveDiscount(bonusPoints, goldCustomer) continues in this chapter. To summarise the specification, this method returns:
FULLPRICE if bonusPoints≤120 and not a goldCustomer
FULLPRICE if bonusPoints≤80 and a goldCustomer
DISCOUNT if bonusPoints>120
DISCOUNT if bonusPoints>80 and a goldCustomer
ERROR if any inputs are invalid (bonusPoints<1)
4.2.1 Analysis
The first step in the analysis is to restate the specification in terms of causes and effects.
These are boolean expressions relating to possible values, or ranges of values, of the
inputs and outputs associated with different types of processing.
These causes and effects are then used to generate a decision table which relates the
causes (inputs) to the effects (outputs) through rules. This provides a systematic way of
identifying all the combinations, which are the test coverage items for testing.
This example only considers normal inputs and not error inputs. If different
combinations of error inputs result in different outputs, then we use a separate table.
Causes
Non-error causes can be derived from the input equivalence partitions, identified from
the specification in Chapter 2, and shown below in Table 4.1.
The next step is to develop boolean expressions that define the non-error causes. We
develop these causes by working through the input parameters in order, and examining
the partitions for each from left to right (i.e. increasing values). This provides a
systematic approach and limits mistakes.
Non-error partitions for bonusPoints can be turned into the following causes:
• The partition Long.MIN VALUE..0 is an error partition, and is not used.
• The partition 1..80 is a normal partition, and can be identified by the boolean
expression bonusPoints<=80 being true (as error values less than 1 are not being
considered).
• The partition 81..120 is also a normal partition, and can be identified by the
previously derived expression bonusPoints<=80 being false, and the expression
bonusPoints<=120 being true.
• The partition 121..Long.MAX_VALUE is a normal partition, and can be identified by the previously derived expressions bonusPoints<=80 being false and bonusPoints<=120 being false.
Having fewer causes is better, as this reduces the size of the decision table. N partitions should lead to no more than ⌈log₂(N)⌉ expressions – for example, 10 partitions could realistically be turned into about 3 or 4 expressions. For the example above, a third expression is not needed: the two expressions are sufficient to distinguish the three non-error partitions.
• Use the positive form of an expression where possible: the expression goldCustomer being true is much easier to understand than the expression not goldCustomer being false.
• Handling boolean and enum types is generally very straightforward.
To summarise, the analysis has led to the following non-error causes:
• bonusPoints ≤ 80
• bonusPoints ≤ 120
• goldCustomer
We recommend following our approach, as it will lead to similar results every time. Other approaches may lead to a logically equivalent set of causes which may look quite different. It takes practice to develop causes that are easy to use and review.
Effects
The output equivalence partitions, identified from the specification in Chapter 2, are
repeated in Table 4.2 for convenience.
Working through the output partitions in order allows the non-error effects to be systematically developed:
• The partition FULLPRICE is a normal partition, and can be identified by the expression return value==FULLPRICE² being true.
2 The alternative syntax giveDiscount()==FULLPRICE can also be used.
In order to build the decision table, all the combinations of causes must be identified. Often, not all combinations of the causes are logically possible. In this example, it is not possible for bonusPoints ≤ 80 to be true and bonusPoints ≤ 120 to be false at the same time.
You can build the table in many ways, but our advice is to first identify all possible combinations of causes through an intermediary table. Here, all combinations can be listed; all those that are infeasible can then be removed. All remaining combinations can then be moved to a final table. With enough practice, the need for such an intermediary step will eventually fade.
Start with an empty table and list all the causes in the first column. It helps to order the causes based on the variables they address, e.g. the two causes about bonusPoints are next to each other. Next, create 2^N columns for the combinations, where there are N causes. In our example, we have 3 causes, and therefore create space for 8 combinations (see Table 4.3). It is good practice to document the non-error condition in the table title – here it is bonusPoints>0 – as a reminder that error causes are excluded.
Causes Combinations
bonusPoints ≤ 80
bonusPoints ≤ 120
goldCustomer
We complete the combination columns systematically, starting with all T (for true)
at the left-hand side of the table, and ending with all F (for false) at the right-hand side.
The sequence of T/F values to use for different numbers of causes is shown below³:
3 This is similar to a hardware truth table, where 1 and 0 are used in place of T and F.
Causes               Combinations
bonusPoints ≤ 80     T T T T F F F F
bonusPoints ≤ 120    T T F F T T F F
goldCustomer         T F T F T F T F
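The systematic T/F pattern can also be generated programmatically; a small sketch (the class name Combinations is hypothetical) that prints the same column ordering as the table above, for N = 3 causes, is:

    public class Combinations {

        public static void main(String[] args) {
            int n = 3;  // number of causes
            for (int row = 0; row < n; row++) {
                int runLength = 1 << (n - 1 - row);   // 4, 2, 1 for n = 3
                for (int col = 0; col < (1 << n); col++) {
                    // alternate runs of T and F, starting with T
                    System.out.print((col / runLength) % 2 == 0 ? "T " : "F ");
                }
                System.out.println();
            }
        }

    }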
Infeasible combinations can now be identified – these are greyed out in Table 4.5:
• The combination T,F is infeasible for the causes bonusPoints<=80 and bonusPoints<=120. If bonusPoints is greater than 120 it must also be greater than 80.
Causes               Combinations
bonusPoints ≤ 80     T T T T F F F F
bonusPoints ≤ 120    T T F F T T F F
goldCustomer         T F T F T F T F
(Table 4.5: the infeasible combinations are in columns 3 and 4.)
The columns that contain impossible combinations of causes can now be removed,
leaving only the feasible combinations of causes as shown in Table 4.6.
Causes Combinations
bonusPoints ≤ 80 T T F F F F
bonusPoints ≤ 120 T T T T F F
goldCustomer T F T F T F
In practice, only one table needs to be used, and developed step-by-step as shown in
Tables 4.3 to 4.6. The infeasible columns can be crossed out, instead of being removed,
as shown in Table 4.7.
Causes               Combinations
bonusPoints ≤ 80     T T T T F F F F
bonusPoints ≤ 120    T T F F T T F F
goldCustomer         T F T F T F T F
(Table 4.7: columns 3 and 4 are crossed out as infeasible.)
Decision Table
A software test decision table maps each combination of causes to its specified effect.
Using all feasible combinations of causes (Table 4.6) as a basis, a decision table is
initialised with all causes and effects. Columns are numbered for the rules. The effects
rows are initially left blank as shown in Table 4.8.
Table 4.8: Initial Decision Table for giveDiscount() where bonusPoints>0

                             Rules
                             1  2  3  4  5  6
Causes
bonusPoints ≤ 80             T  T  F  F  F  F
bonusPoints ≤ 120            T  T  T  T  F  F
goldCustomer                 T  F  T  F  T  F
Effects
return value == FULLPRICE
return value == DISCOUNT
The rules must be (a) complete, and (b) independent. One rule, and exactly one rule,
must be selected by any feasible combination of the input causes.
The effects for each rule can now be completed from the specification. Starting with
Rule 1, which applies when:
• bonusPoints ≤ 80 is true
• bonusPoints ≤ 120 is true
• goldCustomer is true
This means that bonusPoints is less than or equal to 80 (but greater than 0), and
goldCustomer is true. According to the specification, the expected results are the return
value FULLPRICE for these input values. We now include this as an effect, as shown in
Table 4.9.
Table 4.9: Decision Table for giveDiscount() with the effects for Rule 1

                             Rules
                             1  2  3  4  5  6
Causes
bonusPoints ≤ 80             T  T  F  F  F  F
bonusPoints ≤ 120            T  T  T  T  F  F
goldCustomer                 T  F  T  F  T  F
Effects
return value == FULLPRICE    T
return value == DISCOUNT     F
It is good practice to complete both the T and F values for the effects: T for the effect that is produced, and F for the effects that are not. We now continue in the same style and add the effects for all remaining rules (see Table 4.10).
Table 4.10: Completed Decision Table for giveDiscount()

                             Rules
                             1  2  3  4  5  6
Causes
bonusPoints ≤ 80             T  T  F  F  F  F
bonusPoints ≤ 120            T  T  T  T  F  F
goldCustomer                 T  F  T  F  T  F
Effects
return value == FULLPRICE    T  T  F  T  F  F
return value == DISCOUNT     F  F  T  F  T  T
The decision table is now verified against the specification. As a reminder, giveDiscount(bonusPoints,goldCustomer) returns:
FULLPRICE if bonusPoints≤120 and not a goldCustomer
FULLPRICE if bonusPoints≤80 and a goldCustomer
DISCOUNT if bonusPoints>120
DISCOUNT if bonusPoints>80 and a goldCustomer
• Rule 1 states that when bonusPoints is less than or equal to 80 (and greater than
0), and goldCustomer is true then the expected return value is FULLPRICE.
This is correct.
• Rule 2 states that when bonusPoints is less than or equal to 80 (and greater than
0), and goldCustomer is false then the expected return value is FULLPRICE.
This is correct.
• Rule 3 states that when bonusPoints is greater than 80 and less than or equal to
120, and goldCustomer is true, then the expected return value is DISCOUNT.
This is correct.
• Rule 4 states that when bonusPoints is greater than 80 and less than or equal to 120, and goldCustomer is false, then the expected return value is FULLPRICE.
This is correct.
• Rule 5 states that when bonusPoints is greater than 120, and goldCustomer is true,
then the expected return value is DISCOUNT.
This is correct.
• Rule 6 states that when bonusPoints is greater than 120, and goldCustomer is false, then the expected return value is DISCOUNT.
This is correct.
Each rule is a test coverage item, as shown below:

TCI    Rule    Test Case
DT1    1       To be completed later
DT2    2
DT3    3
DT4    4
DT5    5
DT6    6
Parameter      Equivalence Partition      Value
bonusPoints    121..Long.MAX_VALUE        200
goldCustomer   true                       true
               false                      false
Return Value   FULLPRICE / DISCOUNT / ERROR
The development of test data for the first test case is shown in Table 4.13. We initially consider the test cases as candidate test cases, as we expect to duplicate some test cases from equivalence partition testing. The test case IDs are temporary at this stage (using the prefix X, e.g. X3.1).
The completed test cases required to achieve full decision table coverage are shown
in Table 4.14.
Candidate test cases are then reviewed to identify new test cases that have equivalent test data to previously defined test cases. They can be removed. We are able to identify a number of such duplicate test cases:
• X3.1 duplicates test case T1.1
• X3.2 duplicates test case T2.2, even though the data values differ:
– The data values are different, as X3.2 is based on equivalence partition values
(40, false) and T2.2 is based on BVA values (80,false).
– However, T2.2 matches the same rule in the decision table:
Rule 2 (bonusPoints ≤ 80 and !goldCustomer).
– So X3.2 would be a duplicate test case for decision table testing.
• X3.4 duplicates test case T1.2
• X3.6 duplicates test case T1.3
By selecting the same values as before for each partition, we have made the task of
identifying duplicate tests much easier. Alternatively, testers may remove duplicates only
during the test implementation, or even refrain from removing them at all.
After removing duplicates, we can now add the decision tables tests to create the
complete set of test cases (see Table 4.15) with test case IDs assigned in our typical
notation (e.g. T3.1).
In this example, we can see from Table 4.16 that every TCI is covered, and from Table 4.15 that every test case covers a different TCI. Comparing the test data for the equivalence partition, boundary value analysis, and decision table tests (Tables 2.9, 3.3, and 4.15), we can see that there are no duplicate tests⁴.
4 As discussed previously, T2.2 covers DT2 even though the data values are different.
The automated test implementation follows the same pattern as Listing 2.3:

    // DT test data
    private static Object[][] testData1 = new Object[][] {
        // test, bonusPoints, goldCustomer, expected output
        ...
        { ..., false, ERROR    },
        { ..., true,  DISCOUNT },
        { ..., true,  DISCOUNT },
    };

    @DataProvider(name="testset1")
    public Object[][] getTestData() {
        return testData1;
    }

    @Test(dataProvider="testset1")
    public void test_giveDiscount( String id, long bonusPoints,
            boolean goldCustomer, Status expected) {
        assertEquals( OnlineSales.giveDiscount(bonusPoints, goldCustomer),
                expected );
    }
All the tests pass.
Figure 4.1: DT Test Results for giveDiscount()

4.4 Decision Tables in More Detail

4.4.1 Fault Model

The decision table fault model is where combinations of input values are not processed correctly, which leads to incorrect outputs. The faults tend to be associated with an incorrect algorithm, or missing code, which does not identify or process the different combinations properly. By testing every combination of values from the input partitions, decision table testing attempts to find these faults.
4.4.2 Description
The previous two techniques (equivalence partition and boundary value analysis) reduce
the number of tests by not considering combinations. Decision table testing provides
additional coverage by identifying as few tests as possible to cover all the possible
combinations that have an impact on the output.
The test objective is to achieve 100% coverage of the rules in the decision table
(representing combinations).
The causes and effects are described as logical statements (or predicates), based on the specification. These expressions specify the conditions required for a particular variable to cause a particular effect.
To identify a minimum subset of possible combinations that will test all the different
behaviours of the program, a decision table is created. The inputs (causes) and outputs
(effects) are specified as boolean expressions (using predicate logic). Combinations of the
causes are the inputs that will generate a particular response from the program. These
causes and effects are combined in a decision table that describes their relationship. Test
coverage items are then constructed that will cover all feasible combinations of cause
and effect. For N independent causes, there are 2^N different combinations. The decision table specifies how the software should behave for each combination. The decision table can be reduced by removing infeasible cause combinations.
Example A. Consider the method boolean isNegative(int x) as described previously. There is a single cause:
1. x<0
and a single effect:
1. return value
Notes:
• For mutually exclusive expressions, such as x<0 and x>=0, only one form of the expression is needed.
Example B. Consider the method boolean isZero(int x), which returns true if x is zero. There is a single cause:
1. x==0
Example C. Consider the method int largest(int x, int y), which returns the largest of
the two input values.
The causes are:
1. x>y
2. x<y
The effects are:
1. return value==x
2. return value==y
Notes:
• Where there are three possible situations (here, x<y, x==y, and x>y) you
need at least two expressions to cover them:
1. If x>y, then x<y is false.
2. If x<y, then x>y is false.
3. If x==y, then x>y and x<y are both false.
• Where the effect is for an output (here the return value) to take on one of
the input values, it is important to list all the possibilities, as they are not
mutually exclusive. In this case, when x is equal to y, the output is equal to
both x and y.
Example D. Consider the method boolean inRange(int x, int low, int high), which returns true when low ≤ x ≤ high. Note that the return value is undefined if high<low, and we cannot test this condition.
It is not optimal, but it would be possible to create 3 causes as follows:
1. x<low
2. low ≤ x ≤ high
3. x>high
However, mutually exclusive rules (where only one can be true) make for large
decision tables, which are difficult to handle. A preferred set of causes is as follows:
1. x<low
2. x≤high
These causes can be interpreted as shown in Table 4.17:
This both reduces the number of causes, and allows for better use of combinations
of causes (as they are no longer completely mutually exclusive).
There is only one effect for the boolean return value (which can be true or false):
1. return value
Notes:
• It is redundant to use the effect return value==true. The expression return
value is a boolean expression, and the T in the decision table indicates when
it is true or false.
• For a boolean return value, it is also redundant to use two effects:
1. return value
2. !(return value)
Only a single effect is required:
1. return value⁵
• An example of what happens if you do not follow these guidelines is shown in Table 4.25.
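To see why the two causes in Example D are sufficient, consider a sketch of inRange() written directly in terms of them (an illustration, not the book's implementation):

    // Returns true when low <= x <= high, expressed using the two causes
    // x<low and x<=high: only the combination !(x<low) && (x<=high) gives
    // true (compare Rule 2 of the decision table for inRange() shown
    // later); all other feasible rules give false.
    public static boolean inRange(int x, int low, int high) {
        return !(x < low) && (x <= high);
    }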
Example E. The method boolean condInRange(int x, int low, int high, boolean flag)
returns true if the flag is true and low ≤ x ≤ high. It returns false otherwise. As
before, if high<low, the output is undefined, and we cannot test for this.
The causes are:
1. flag
2. x<low
3. x≤high
There is a single effect:
1. return value
The highlighted combinations are infeasible, as x cannot be both less than low and
greater than high, and can be removed, as shown in Table 4.19.
5 An alternate notation is to use the name of the method in place of the expression return value.
• In both cases, where x<low, then x must be less than high, so x≤high must be true.
Decision Tables
A Decision Table⁶ is used to map the causes and effects through rules. Each rule states that under a particular combination of input causes, a particular set of output effects should result. To generate the decision table, each cause is listed in a separate row, and each combination of causes is listed in a separate column. Each column is referred to as a Rule in the table – each rule is a different test coverage item (and a different test case).
The decision tables for the three previous examples are shown below. The letter “T” is used as shorthand for true, and “F” for false.
6 In mathematics, these tables are referred to as Truth Tables, as they only contain the values true
and false, but the term decision tables is widely used in software testing as they define the decisions to
be made by the software. Cause-Effect graphs can also be used here.
Table 4.20 shows the decision table for isNegative():

               Rules
               1  2
Causes
x<0            T  F
Effects
return value   T  F
Table 4.21 shows the decision table for isZero():

               Rules
               1  2
Causes
x==0           T  F
Effects
return value   T  F
• Rule 1 states that if (x>y) and !(x<y), then the return value is x.
• Rule 2 states that if !(x>y) and (x<y), then the return value is y.
• Rule 3 states that if !(x>y) and !(x<y), implying that (x==y), then the return
value is equal to both x and y.
Table 4.22: Decision Table for largest()

                  Rules
                  1  2  3
Causes
x>y               T  F  F
x<y               F  T  F
Effects
return value==x   T  F  T
return value==y   F  T  T
• Rule 1 states that if (x<low) and (x≤high), then the return value is false.
• Rule 2 states that if !(x<low) and (x≤high), then the return value is true.
• Rule 3 states that if !(x<low) and !(x≤high), then the return value is false.
Table 4.23: Decision Table for inRange()

               Rules
               1  2  3
Causes
x<low          T  F  F
x≤high         T  T  F
Effects
return value   F  T  F
Table 4.24 shows the decision table for condInRange(). The rules should be interpreted as follows:
• Rule 1: if (flag) and (x<low) and (x≤high), then return value is false.
• Rule 2: if (flag) and !(x<low) and (x≤high), then the return value is true.
• Rule 3: if (flag) and !(x<low) and !(x≤high), then the return value is false.
• Rule 4: if !(flag) and (x<low) and (x≤high), then the return value is false.
• Rule 5: if !(flag) and !(x<low) and (x≤high), then the return value is false.
• Rule 6: if !(flag) and !(x<low) and !(x≤high), then the return value is false.
Table 4.24: Decision Table for condInRange()

               Rules
               1  2  3  4  5  6
Causes
flag           T  T  T  F  F  F
x<low          T  F  F  T  F  F
x≤high         T  T  F  T  T  F
Effects
return value   F  T  F  F  F  F
Table 4.25 shows, for comparison, a poor implementation of the decision table for condInRange():
• There is an unnecessary number of causes, leading to a larger table.
• The use of the effects return value==true and return value==false leads to hard-to-read tables.
                       Rules
                       1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
Causes
flag                   T  T  T  T  T  T  T  T  F  F  F  F  F  F  F  F
x<low                  T  T  T  T  F  F  F  F  T  T  T  T  F  F  F  F
low≤x≤high             T  T  F  F  T  T  F  F  T  T  F  F  T  T  F  F
x>high                 T  F  T  F  T  F  T  F  T  F  T  F  T  F  T  F
Effects
return value==true     F  T  F  F  F  F
return value==false    T  F  T  T  T  T
Even if you remove the invalid combinations, the table is not being used for its purpose, which is to generate the combinations, and it still contains unnecessary duplication. Note that Table 4.24 contains exactly the same information in a much more concise form.
Sub-Tables
Table 4.24 can be reduced in size, by splitting the problem into two separate tables: one
with flag always true, and one with flag always false. Each table will be half the size of
the original table – see Tables 4.26 and 4.27.
Table 4.26: Decision Table for condInRange() with flag true

               Rules
               1  2  3
Causes
x<low          T  F  F
x≤high         T  T  F
Effects
return value   F  T  F

Table 4.27: Decision Table for condInRange() with flag false

               Rules
               1  2  3
Causes
x<low          T  F  F
x≤high         T  T  F
Effects
return value   F  F  F
Recall that we have already used this technique to remove error rules. If there are
interesting combinations of causes that produce errors, then these can be presented in a
separate table (with error rules only).
7 The other standard technique of using don't-care conditions is not covered in this book.
Pairwise Testing
Large decision tables can be reduced by limiting the number of rules. Instead of including
every combination of causes, we can instead include just the combinations for every pair
of causes. The technique involves identifying every possible pair of combinations, and
then combining these pairs into as few rules as possible.
To demonstrate pairwise testing, we can reduce Table 4.24 into Table 4.28.
Table 4.28: Pairwise Decision Table for condInRange()

               Rules
               1  2  3  4  5
Causes
flag           T  T  T  F  F
x<low          T  F  F  T  F
x≤high         T  T  F  T  F
Effects
return value   F  T  F  F  F
All the possible pairs of causes are included at least once in the decision table.
Each row represents a cause. For each pair of causes, every feasible combination of {T,T},{T,F},{F,T},{F,F} occurs at least once. For example:
• row1 and row2 – Rule 1
• row1 and row3 – Rule 1
• row1 and row4 – Rule 1
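Pair coverage can also be checked mechanically. The sketch below (the class name PairwiseCheck is hypothetical) lists, for each pair of cause rows in Table 4.28, which of the four combinations appear; note that the infeasible combination (x<low true together with x≤high false) cannot appear for that pair of rows:

    public class PairwiseCheck {

        public static void main(String[] args) {
            String[] names = { "flag", "x<low", "x<=high" };
            boolean[][] rules = {                      // rows = causes, columns = rules 1..5
                { true,  true,  true,  false, false }, // flag
                { true,  false, false, true,  false }, // x<low
                { true,  true,  false, true,  false }, // x<=high
            };
            String[] combo = { "TT", "TF", "FT", "FF" };
            for (int r1 = 0; r1 < rules.length; r1++) {
                for (int r2 = r1 + 1; r2 < rules.length; r2++) {
                    boolean[] seen = new boolean[4];   // TT, TF, FT, FF
                    for (int col = 0; col < rules[0].length; col++)
                        seen[(rules[r1][col] ? 0 : 2) + (rules[r2][col] ? 0 : 1)] = true;
                    StringBuilder sb = new StringBuilder();
                    for (int i = 0; i < 4; i++)
                        if (seen[i]) sb.append(combo[i]).append(' ');
                    System.out.println(names[r1] + " / " + names[r2] + ": " + sb);
                }
            }
        }

    }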
Usually a decision table will not include error situations, due to the number of rules
required to describe all of these. Errors would only be included in a separate table for
clarity, and only when different combinations of causes result in different errors.
4.4.6 Pitfalls
Decision table testing can go wrong in a number of ways:
• Ensure that the causes are complete and do not overlap; otherwise the table may be incomplete or inconsistent.
• Use simple logical expressions for causes (i.e. with no boolean operators, such as the && or || operators). Otherwise it is difficult to ensure all the combinations are created by the table.
• The rules must be unique: ensure that no two rules can be true at the same time (i.e. every rule must have a different combination of causes). Otherwise the table is invalid and cannot be used to derive test cases.
• Ensure that there are no possible combinations of input values that cause no rules to be matched.
4.5 Evaluation
Decision table testing approaches black-box testing from a slightly different viewpoint
compared to equivalence partition and boundary value analysis testing. Instead of
considering each input individually, it considers combinations of input values that are
expected to have an effect on the processing or the output. This provides increased
coverage of decisions in the code relating to categorising the inputs, or deciding what
action to take based on a combination of input values.
This is important, as complex decisions are a frequent source of mistakes in the code.
These decisions generally reflect the correct identification of a complex situation, and if
not correctly implemented can result in the wrong processing taking place.
4.5.1 Limitations
The software has passed all the equivalence partition, boundary value analysis, and decision table tests – so is it correct? As discussed earlier, only exhaustive testing can provide a conclusive result, so after decision table testing faults may remain. Some benefits and limitations of decision table testing are now explored by injecting faults into the source code.
java.lang.AssertionError: expected [DISCOUNT] but found [FULLPRICE]
===============================================
Command line suite
Total tests run: 14, Passes: 13, Failures: 1, Skips: 0
===============================================
Figure 4.2: DT Test Results for giveDiscount() with Fault 3
Fault 4
A new fault is now inserted in the code to demonstrate a limitation of decision table
testing, as shown in Listing 4.2.
The results of running the equivalence partition, boundary value analysis, and decision table tests against this fault are shown in Figure 4.3.
As expected, the fault is not found – the inserted fault bears no relationship to the
specification, and therefore is unlikely to be found by any black-box testing technique.
$ check 43 true
DISCOUNT
Note that the wrong result is returned for the inputs (43,true). The correct result is
FULLPRICE.
4.5.2
Strengths
Strengths and Weaknesses
FT
Exercises combinations of input data values
Expected results are created as part of the analysis process
Weaknesses
A
The decision tables can sometimes be very large. Different approaches exist to
reduce the tables.
A more detailed specification may lead to increased causes and effects, and
consequently large and complex tables.
DR
Chapter 5
Statement Coverage
In this chapter, white-box testing is introduced, and its most basic form Statement
Coverage is presented.
1 Not all lines can be executed: blank lines, comments, {}’s, etc.
85
For CS265/CS608 Students Personal Use Only
5.3 Example
We continue testing the method OnlineSales.giveDiscount() with Fault 4 inserted as
shown in Listing 5.1. This fault was not found by any of the black-box test techniques.
As a reminder, the specification and source code are repeated next.
The specification can be summarised as follows – OnlineSales.giveDiscount() returns:
FT
FULLPRICE if bonusPoints≤80 and a goldCustomer
DISCOUNT if bonusPoints>120
DISCOUNT if bonusPoints>80 and a goldCustomer
ERROR if any inputs are invalid (bonusPoints<1)
The source code with Fault 4 is repeated here for reference in Listing 5.1.
A
Listing 5.1: Fault 4
22 Status rv = FULLPRICE;
23 long threshold=120;
24
DR
25 if (bonusPoints<=0)
26 rv = ERROR;
27
28 else {
29 if (goldCustomer)
30 threshold = 80;
31 if (bonusPoints>threshold)
32 rv=DISCOUNT;
33 }
34
35 if (bonusPoints==43) // fault 4
36 rv = DISCOUNT;
37
38 return rv;
39 }
40
41 }
2 In this book, the Java/JaCoCo tool is used to measure statement coverage during test execution –
FT
Figure 5.1: DT Test Coverage Summary for giveDiscount() with Fault 4
For statement coverage we are only interested in the Lines column and the Missed
column to its left in the report. We are also only concerned with the method we are
A
testing, shown in the row giveDiscount(long,boolean). Lines shows the number of lines
in the method: 11. Missed shows the number of missed (unexecuted) lines in the method:
1. The tests have missed 1 out of 11 lines in the method, and have not achieved full
statement coverage.
DR
The next step is to determine which lines have not been executed. We do this by
examining the source code coverage report as shown in Figure 5.2. Note that this shows
the full file for OnlineSales.java, including lines that have been omitted from the source
code listings elsewhere in the book (to make it easier to read the source code by focusing
on the essential elements).
The highlighting of individual lines of code is interpreted as follows:
• Lines that have been executed during testing are highlighted in light grey (green
on the screen) and medium grey (yellow on the screen).
• Lines which have not been executed are highlighted in dark grey (red on the screen).
We are only interested in the lines in the method giveDiscount() on lines 22-41. Other
highlighted lines in the report refer to source code in the class OnlineSales which are not
part of this method. The JavaDoc specification is also included as this is part of the
source file – these lines have been left out elsewhere for simplicity.
3 The report contains other pages with which we are not concerned here.
For CS265/CS608 Students Personal Use Only
A FT
DR
• Line 37 is medium grey indicating that it has been partially executed. This counts
as executed for statement coverage, and is explored in more detail when we discuss
branch coverage in the next chapter.
• Lines 24, 25, 27, 28, 31–34, and 40 are light grey indicating that they have been
fully executed.
For CS265/CS608 Students Personal Use Only
Remember, we are testing the method giveDiscount() and not the entire class – the
coverage shows that some of the compiler generated code for the class has not been
executed. This is of relevance when we consider object-oriented testing in Chapter 9.
By examining the source code and the coverage results, we can identify the following:
the unexecuted lines of code, the conditions that must be met to cause these lines to be
executed, and the required input values to meet these conditions.
In this example, from Figure 5.2, we can identify the unexecuted statements: line 38.
By a careful examination of the source code, we can identify the condition that will
cause line 38 to execute – bonusPoints==43 must be true on line 37, which means that
bonusPoints must have the value 43 at this point in the code execution.
As bonusPoints is an input parameter, it is easy to identify the required input:
bonusPoints must have the value 43. When the condition is the result of a calculation in
the code, more detailed analysis is required. These results are summarised in Table 5.1.
5.3.2
ID
1
Each unexecuted line of source code (or statement) is a test coverage item as shown
A
in Table 5.2. Lines which cannot be executed are ignored (for example, comments and
blank lines). This shows that only one extra test coverage item is required to provide full
source code coverage (when executed along with the full set of black-box tests). There
is little value in identifying all the previously executed statements and which test covers
DR
Notes:
ID
T4.1
TCI
SC1 43
FT
bonusPoints goldCustomer
false
return value
FULLPRICE
1. Every test coverage item is covered by at least one test case (this confirms that the
tests are complete).
2. Every new test case covers additional test coverage items (this confirms that there
are no unnecessary tests).
For CS265/CS608 Students Personal Use Only
3. There are no duplicate tests (taking the equivalence partition, boundary value
analysis, and decision tables tests into consideration).
In this example, we can see from Table 5.4 that every test coverage item is covered,
and from Table 5.3 that every statement coverage test case covers additional test coverage
items. Comparing the test cases for the equivalence partition, boundary value analysis,
decision tables, and statement coverage tests (Table 2.9, 3.3, 4.15, and Table 5.3) we
can see that there are no duplicate tests.
1
2
3
4
5
public class OnlineSalesTest { FT
Listing 5.2: OnlineSalesTest for SC Testing
30 }
31
32 }
FT
PASSED: test_giveDiscount("T2.4", 120, false, FULLPRICE)
PASSED: test_giveDiscount("T2.5", 121, false, DISCOUNT)
PASSED: test_giveDiscount("T2.6", 9223372036854775807, false, DISCOUNT)
PASSED: test_giveDiscount("T2.7", -9223372036854775808, false, ERROR)
PASSED: test_giveDiscount("T2.8", 0, false, ERROR)
PASSED: test_giveDiscount("T3.1", 100, true, DISCOUNT)
PASSED: test_giveDiscount("T3.2", 200, true, DISCOUNT)
FAILED: test_giveDiscount("T4.1", 43, true, FULLPRICE)
java.lang.AssertionError: expected [FULLPRICE] but found [DISCOUNT]
===============================================
A
Command line suite
Total tests run: 15, Passes: 14, Failures: 1, Skips: 0
===============================================
DR
The statement coverage test T4.1 fails indicating a fault in the code: Fault 4 has been
found by this test. There is no value in running these tests against any other versions of
the code, unless it has been verified that the extra test continues to provide 100% code
coverage. The test data has been specifically developed to ensure statement coverage
against the code version that contains Fault 4. The test coverage results are shown in
Figure 5.4.
Full statement coverage has been achieved – there are no missed lines. There is no
value in viewing the annotated source code once 100% coverage has been achieved.
For CS265/CS608 Students Personal Use Only
5.5.2 Description
Tests are developed to cause all statements in the source code to be executed. The level
of statement coverage achieved during testing is usually shown as a percentage – for
example, 100% statement coverage means that every statement has been executed, 50%
FT
means that only half the statements have been covered.
Statement coverage is the weakest form of white-box testing5 . However, 100%
statement coverage will probably not cover all the logic of a program, or cause all the
possible output values to be generated!
An additional benefit of statement coverage is that unreachable code is identified
during the analysis stage (this code should be reviewed, and probably deleted).
understanding the code flow at a more abstract level, but these are seldom required for
statement coverage. CFGs are explained in Chapter 7. Statement coverage tests may
also be developed before black-box tests, though this is not usual practice. In this case,
CFGs are traditionally used to assist in developing the tests, though the experienced
tester will probably not need to use them.
to execute. These data values can be identified during analysis, as shown in the example,
or identified during development of the test cases.
Finally, using the specification, and never the source code, work out the expected
results for each set of input values. It is worth repeating here that the expected results
must come from the specification. It is all too easy to make the mistake of reading the
expected results from the code – and the tester must be careful to avoid doing this.
5.6 Evaluation
Statement coverage has uncovered Fault 4 inserted into the method giveDiscount().
5.6.1 Limitations
This chapter has already shown how statement coverage detects faults related to
unexecuted statements, using Fault 4. Some benefits and limitations of Statement
Fault 5
FT
Coverage are now explored by inserting a new fault into the original code.
A new fault is now inserted on line 31 to demonstrate some limitations of testing with
statement coverage. This fault is produced by modifying the if statement on line 31,
incorrectly adding “&& bonusPoints!=93” . This creates an extra branch in the code,
one which is not taken with the existing test data (as the value 93 is never used for
A
bonusPoints). The result is that when bonusPoints is equal to 93, line 32 will not be
executed. This is shown in Listing 5.3.
FT
PASSED: test_giveDiscount("T2.7", -9223372036854775808, false, ERROR)
PASSED: test_giveDiscount("T2.8", 0, false, ERROR)
PASSED: test_giveDiscount("T3.1", 100, true, DISCOUNT)
PASSED: test_giveDiscount("T3.2", 200, true, DISCOUNT)
PASSED: test_giveDiscount("T4.1", 43, true, FULLPRICE)
===============================================
Command line suite
Total tests run: 15, Passes: 15, Failures: 0, Skips: 0
===============================================
A
Figure 5.5: SC Test Results for giveDiscount() with Fault 5
$ check 93 true
FULLPRICE
The wrong result is returned for the inputs (93,true). The correct result is
DISCOUNT. We will need more sophisticated methods to find such faults, which are
discussed in Chapter 6.
Weaknesses
FT
not been verified, and may well be faulty.
Statement coverage can generally be achieved using only a small number of extra
tests.
8 A decision is a boolean expression that is used to alter the flow of control in the code. These are used,
for example, in if and while statements. Each boolean sub-clause in a compound boolean expression
is referred to as a boolean condition. In this example, (a>1) and (b==0) are boolean conditions in
the decision ((a>1)||(b==0)). Simple decisions have a single boolean condition, as shown above where
(number<3) is both a decision and also the single boolean condition in that decision.
For CS265/CS608 Students Personal Use Only
FT
An experienced tester will perform black-box tests first and measure their coverage.
Then by reviewing the coverage results additional tests can be developed to ensure
full statement coverage. Working out the correct input values can be complex, and
the experienced tester will often use the debugger to assist in doing this. One useful
technique for complex decisions is to set a breakpoint at the line of code directly before
the first unexecuted line, and then examine the value of the relevant variables. An
experienced tester will probably develop the statement coverage test cases directly from
the coverage results, without documenting the analysis or test coverage items. Unlike
A
in black-box testing, the test design work can be effectively reviewed by examining the
coverage statistics generated by the test, without access to this documentation.
DR
For CS265/CS608 Students Personal Use Only
Chapter 6
Branch Coverage
In this chapter, the next strongest form of white-box testing, branch coverage (BC),
is introduced.
As for statement coverage, branch coverage can also be measured automatically for Java
programs. The JaCoCo tool, as previously used for statement coverage, can also be used
to measure branch coverage. Note that different tools may measure branches in different
ways – in this chapter we use the JaCoCo branch report as the basis for testing1 .
6.2 Example
Testing of the OnlineSales.giveDiscount() method with Fault 5, shown in Listing 6.1, is
used in this chapter. As a reminder, the specification and code are shown again next.
To summarise the specification, giveDiscount(bonusPoints,goldCustomer) returns:
1 JaCoCo measures the outcome of each boolean condition in a decision as a separate branch. Other
tools may take a different approach, seen in many textbooks, of measuring just the outcomes of the
decision itself as branches, and not of the individual boolean conditions within the decision.
98
For CS265/CS608 Students Personal Use Only
The source code with Fault 5 is repeated for easy reference in Listing 6.1.
return rv;
A FT
6.2.1 Analysis: Identifying Untaken Branches
For code that has already been tested with black-box and/or statement coverage, tests to
develop full branch coverage can be developed by examining the branch coverage results
DR
of these tests.
For Fault 5, running the existing tests (equivalence partition, boundary value analysis,
decision tables, statement coverage) with code coverage measurement enabled produces
a coverage summary report, and highlighted source code. These are the same reports we
saw in the previous chapter (Section 5.3.1) for statement coverage, except that now we
will be examining different figures in the report.
The coverage summary for these tests is shown in Figure 6.1
It is important to note that the statement coverage tests were developed to provide
full statement coverage of giveDiscount() with Fault 4, and not with Fault 5. The tests
are therefore not guaranteed to provide statement coverage for the version with Fault 5.
For CS265/CS608 Students Personal Use Only
However, in this example, we can see from the coverage summary that 100% statement
coverage has been achieved (Lines Missed is 0).
Examining the Missed Branches reveals that we only have 87% coverage – this means
that not all of the branches in the method have been taken. In order to identify which
branches have not been taken, we now examine the source code coverage report (see
Figure 6.2).
A FT
DR
Line 31 is marked in medium grey (or yellow when viewed on the screen) indicating
that it has not been fully executed. That means that some of the branches inside this
line of source code have not been taken.
For CS265/CS608 Students Personal Use Only
Hovering the cursor over the diamond on line 31 causes a detailed report of the branch
coverage to appear in a popup box below the line. This is shown in Figure 6.3.
The report 1 of 4 branches missed. indicates that the line contains 4 branches, and
that 1 of these branches has not been taken. Recall that JaCoCo is typical of Java branch
coverage tools in that it counts branches for the outcome of each boolean condition2 and
not for the decision itself.
FT
We can now examine the branch coverage report in more detail. There is one decision
on line 31 – this is a compound decision:
• Decision 1: goldCustomer && bonusPoints!=93
There are two boolean conditions in the decision on line 31:
• Condition 1: goldCustomer
A
• Condition 2: bonusPoints!=93
There are 4 branches in the code, two for each boolean condition: one for the true
outcome, and one for the false outcome:
DR
2 The word condition has multiple meanings in software testing, so for clarity we use test condition
to mean an IEEE Test Condition, and boolean condition to refer to a sub-clause in a complex decision.
For CS265/CS608 Students Personal Use Only
Test cases are developed to ensure that all the previously untaken branches are taken
A
during execution of the branch coverage tests. Experience allows a minimum set of tests
to be developed – but the emphasis is on ensuring full branch coverage.
The analysis has identified the conditions required for each untaken branch to be
taken, and this assists in developing the test data.
DR
In this example as we have already identified, to execute the untaken branch the
parameter goldCustomer must be true and the parameter bonusPoints must have the
value 93. The expected results are determined from the specification. The input values
(93,true) should produce DISCOUNT.
6.3
6.3.1
Test Implementation and Results
Implementation FT
The full test implementation, adding the branch coverage tests to the previously
developed equivalence partition, boundary value analysis, decision tables and statement
coverage tests, is shown in Figure 6.2. This implementation represents the full set of
tests required to effect branch coverage3 .
A
Listing 6.2: OnlineSalesTest for BC Testing
1 public class OnlineSalesTest {
2
DR
3 Techniques for grouping individual tests into sets of tests/test suites, which can then be run
23 @DataProvider(name="testset1")
24 public Object[][] getTestData() {
25 return testData1;
26 }
27
28 @Test(dataProvider="testset1")
29 public void test_giveDiscount( String id, long bonusPoints, boolean
goldCustomer, Status expected) {
30 assertEquals( OnlineSales.giveDiscount(bonusPoints,
goldCustomer), expected );
31 }
32
33 }
FT
The results of running the full set of tests (equivalence partition, boundary value analysis,
decision tables, statement coverage and branch coverage) against Fault 5 is shown in
Figure 6.4.
Test T5.1 fails indicating that the fault in the code (Fault 5) has been found by the
full branch coverage tests. The test coverage results are shown in Figure 6.5.
For CS265/CS608 Students Personal Use Only
Full branch coverage has been achieved – there are no missed branches. The source
code report identifies that all the branches have been taken, as shown in Figure 6.6 where
all the fully executed lines are marked in light-grey (green). In the following chapters we
will not show the detailed source code coverage report when the summary report shows
full coverage.
A FT
DR
For CS265/CS608 Students Personal Use Only
A FT
DR
6.4.2 Description
As with many forms of testing, there is more than one approach. In this book we use a
tool to measure the branch coverage of the previously developed tests, and only develop
new tests to complete the branch coverage. This is the approach most usually used in
practice.
However, there are alternative approaches:
• As we have seen, JaCoCo counts the outcome of each boolean condition as a branch.
The tester might select a different tool which counts the outcomes of each decision
as branches instead. This would reduce the number of branches, which leads to
slightly reduced test effectiveness. However, the tester must work with the tools
that are available to him. See the discussion on Decision Coverage and Condition
Coverage in Sections 8.3.2 and 8.3.3.
FT
• The branches in the code may be identified from scratch without considering any
of the previously written tests. This involves developing a control flow graph4 of
the program, and then identifying all the edges in the graph as branches. In this
approach, the decisions themselves (and not the boolean expressions) are invariably
used. Many books demonstrate this approach, but it is seldom used in practice for
two reasons. Firstly, it is very time consuming to develop the control flow graph
for code of any significant length. And secondly, if the code is changed, either to
A
fix a fault or add new features, the control flow graph will have to be reviewed and
possibly re-done. And the associated test implementation redeveloped. The rapid
change of code in a modern, Agile development environment makes this approach
less realistic than the approach shown in this book.
DR
6.4.3 Goal
The goal of branch coverage is to make sure that every branch in the source code has
been taken during testing. The ideal test completion criteria is 100% branch coverage.
Note that a branch is based on the evaluation of a boolean expression, which can
evaluate to true or false. A decision may be simple or compound. A simple decision
contains a single boolean expression, or boolean condition, with no boolean operators. A
compound decision contains multiple boolean conditions connected by boolean operators.
An example of each is shown in Snippet 6.1.
FT
• From line 3 to line 4 when x is less than 0
• From line 3 to line 5 when x is not less than 0
evaluation5 )
• The false outcome of (x>100): branch to the next boolean condition (x>50)
• The true outcome of (x>50): branch to the next boolean condition (special)
• The false outcome of (x>50): branch from line 5 to line 7 (short-circuit evaluation)
• The true outcome of (special): branch from line 5 to line 6
• The false outcome of (special): branch from line 5 to line 7
5 Short-circuit or lazy evaluation occurs when the evaluation of one boolean condition means that
subsequent boolean conditions do not need to be evaluated – the result can be short-circuited.
For CS265/CS608 Students Personal Use Only
6.5 Evaluation
Branch coverage has uncovered Fault 5 inserted into the method giveDiscount().
6.5.1 Limitations
Some benefits and limitations of branch coverage are now explored by making changes
to inject faults into the original (correct) source code.
Fault 6
FT
A new fault is now inserted to demonstrate some limitations of branch coverage testing.
The entire processing of the method is rewritten, as shown in Lines 24 to 38 in Listing 6.3).
This creates a path through the code that is not taken with any of the existing branch
coverage test data.
A
Listing 6.3: Fault 6
22 public static Status giveDiscount(long bonusPoints, boolean
goldCustomer)
23 {
DR
24 Status rv = ERROR;
25 long threshold=goldCustomer?80:120;
26 long thresholdJump=goldCustomer?20:30;
27
28 if (bonusPoints>0) {
29 if (bonusPoints<thresholdJump)
30 bonusPoints -= threshold;
31 if (bonusPoints>thresholdJump)
32 bonusPoints -= threshold;
33 bonusPoints += 4*(thresholdJump);
34 if (bonusPoints>threshold)
35 rv = DISCOUNT;
36 else
37 rv = FULLPRICE;
38 }
39
40 return rv;
41 }
For CS265/CS608 Students Personal Use Only
FT
PASSED: test_giveDiscount("T3.1", 100, true, DISCOUNT)
PASSED: test_giveDiscount("T3.2", 200, true, DISCOUNT)
PASSED: test_giveDiscount("T4.1", 43, true, FULLPRICE)
PASSED: test_giveDiscount("T5.1", 93, true, DISCOUNT)
===============================================
Command line suite
Total tests run: 16, Passes: 16, Failures: 0, Skips: 0
===============================================
A
Figure 6.7: BC Test Results for giveDiscount() with Fault6
All the tests have passed – the fault has not been found. This is to be expected as (a)
the inserted fault bears no relationship to the specification and is unlikely to be found
DR
by any black-box test technique, and (b) the fault is not revealed by achieving either
statement coverage or branch coverage of the code.
The tests provide full statement and branch coverage, as shown in Figure 6.8.
Tests T4.1 and T5.1 are in fact redundant for this version of the code – the code
has been changed, and these tests are no longer required to achieve statement or branch
coverage.
$ check 20 true
DISCOUNT
$ check 30 false
DISCOUNT
Note that the wrong result is returned for both the inputs (20,true) and (30,false).
The correct result is FULLPRICE in both cases.
Strengths
FT
not exercise either all the reasons for taking each branch, or combinations of different
branches taken.
Resolves the NULL else problem – branch coverage testing makes sure that both
the true and false outcomes are covered in testing.
A
Weaknesses
Can be difficult to determine the required input parameter values.
If the tool only counts decisions as branches, or if a control flow graph has been
manually developed, then it is undemanding of compound decisions. In these
DR
cases it does not explore all the different reasons (i.e. the boolean conditions)
for the decision evaluating as true or false.
determine the values needed to cause each branch to execute, will usually be done in the
testers mind. Perhaps a comment will also be added to the test code to identify branches
that cannot be taken. As discussed previously, if an audit trail is not maintained by the
tester, then the design process cannot be easily reviewed.
In advanced testing, exception coverage can be regarded as a form of branch coverage.
Each exception raised and caught is regarded as a branch.
As with all white-box testing, tests that achieve full branch coverage will often become
outdated by changes to the code. The experienced tester may, however, leave these tests
in the code as extra tests.
Developing tests to achieve full branch coverage is practically impossible in code of
any significant size. The experienced tester will focus these tests just on the critical code.
A FT
DR
For CS265/CS608 Students Personal Use Only
Chapter 7
This chapter introduces all paths coverage testing (AP), which is the strongest form
FT
of white-box testing based on the program structure1 . All the paths, from entry to exit
of a block of code (usually a method; the test item), are executed during testing.
Developing all paths tests is complex and time consuming. In practice it is seldom
used, it would only be considered for critical software. However, as the strongest form of
testing based on program-structure it is of theoretical importance, and is an important
element in the body of knowledge for software testing.
In statement and branch testing, Control Flow Graphs (CFGs) are seldom used in
practice. Tests can be more efficiently developed to supplement black-box tests as shown
A
in the previous chapters, using automated tools to measure the coverage. However, in
all paths testing, CFGs are essential. This chapter provides an understanding of how to
develop and use them.
DR
1 If all paths coverage has been achieved, then coverage has also been achieved for all other forms,
including statement and branch coverage. Both are discussed in previous chapters.
113
For CS265/CS608 Students Personal Use Only
7.2 Example
The source code for giveDiscount with Fault 6, which was not found by the previous test
techniques, is repeated for reference in Listing 7.1.
rv = DISCOUNT;
else
rv = FULLPRICE;
FT
if (bonusPoints>thresholdJump)
bonusPoints += 4*(thresholdJump);
if (bonusPoints>threshold)
39
40 return rv;
41 }
DR
The method contains paths through the code not taken by the tests to date. One of
these paths is faulty, and it takes a systematic approach to identify and test every path.
1. The CFG starts with the entry point (line 22). From here, the code always executes
to line 28. Lines 22 to 28 form a block of indivisible statements2 , so this is
represented by a single node, labelled (22..28). See Figure 7.1. It is recommended
that you label the nodes with the line numbers they contain – it is much more
informative than just numbering the nodes 1, 2, 3, · · · , etc. Remember that each
node must represent an indivisible block of code (containing no branches).
FT
Figure 7.1: CFG Stage 0
2. On line 28, if the decision (bonusPoints>0) is true, then control branches to line
29. If not, the control branches to line 38. This is shown in Figure 7.2. A quick
inspection of the code shows that there are no jumps between lines 38 and 41, so
this node is labelled 38..41. The expression bonusPoints>0 shows the expression
that must be true to take the right-hand branch.
A
DR
A FT
Figure 7.5: CFG Stage 4
6. After line 37, control always passes to line 38. This is also the case for line 35. See
DR
FT
8. Path 8, nodes (22..28)–(29)–(30)–(31)–(32)–(33..34)–(36..37)–(38..41)
not possible
9. Path 9, nodes (22..28)–(29)–(30)–(31)–(32)–(33..34)–(35)–(38..41)
not possible
3. Where no previous test causes a path to be taken, criteria or values for the new
test data
These results are summarised in Table 7.1.
ID
T6.1
7.2.4
Test Cases Covered
AP1 20
2
3 private static Object[][] testData1 = new Object[][] {
4 // test, bonusPoints, goldCustomer, expected output
5 { "T1.1", 40L, true, FULLPRICE },
6 { "T1.2", 100L, false, FULLPRICE },
7 { "T1.3", 200L, false, DISCOUNT },
8 { "T1.4", -100L, false, ERROR },
9 { "T2.1", 1L, true, FULLPRICE },
10 { "T2.2", 80L, false, FULLPRICE },
11 { "T2.3", 81L, false, FULLPRICE },
12 { "T2.4", 120L, false, FULLPRICE },
13 { "T2.5", 121L, false, DISCOUNT },
14 { "T2.6", Long.MAX_VALUE, false, DISCOUNT },
15 { "T2.7", Long.MIN_VALUE, false, ERROR },
16 { "T2.8", 0L, false, ERROR },
17 { "T3.1", 100L, true, DISCOUNT },
18 { "T3.2", 200L, true, DISCOUNT },
19
20
21
22
23
24
25
26
};
{ "T4.1",
{ "T5.1",
{ "T6.1",
43L,
93L,
20L,
true,
true,
true,
@DataProvider(name="testset1")
public Object[][] getTestData() {
return testData1;
A FT
FULLPRICE },
DISCOUNT },
FULLPRICE },
27 }
28
29 @Test(dataProvider="testset1")
30 public void test_giveDiscount( String id, long bonusPoints, boolean
goldCustomer, Status expected) {
DR
31 assertEquals( OnlineSales.giveDiscount(bonusPoints,
goldCustomer), expected );
32 }
33
34 }
FT
Total tests run: 17, Passes: 16, Failures: 1, Skips: 0
===============================================
The fault is found by the new test: T6.1. Note that the previous white-box tests
(T4.1, T5.1) have been invalidated by changing the code – there is no guarantee that the
A
statement coverage and branch coverage tests developed for Fault 5 provide full coverage
for Fault 6. However, the coverage tool confirms that full statement coverage and branch
coverage have still been achieved by these tests. Interestingly, tests T4.1 and T5.1 are
redundant, as the equivalence partition, boundary value analysis, and decision tables
DR
tests also happen to achieve full statement coverage and branch coverage coverage for
the version of the code with Fault 6.
FT
Total tests run: 17, Passes: 17, Failures: 0, Skips: 0
===============================================
All the tests have passed. The coverage results are shown in Figure 7.10.
A
DR
These confirm that full statement coverage and branch coverage coverage has been
achieved. A manual analysis would be required to confirm that the four possible end-to-
end paths have also been executed, providing 100% all-paths coverage3 .
7.4.2 Description
All paths testing causes every possible path from entry to exit of the program to be
taken during test execution. By inspection of the code, input data is selected to cause
execution of every path. Expected results are derived from the specification.
Sequence
A FT
DR
In Figure 7.11, the code in lines 1–7 is always executed as a sequence, as there are no
branches, so this sequence of code is represented as a single node. The node is referenced
by its identifier (1..7).
There is no standard convention for including braces “{” and “}” and other non-
executable source code notation, except to be consistent. The benefit of always including
the braces is that it is much easier to review CFGs and ensure that no lines have been
left out.
4 Most text books recommend numbering the nodes sequentially and placing the line numbers outside
the node. This makes using the CFGs more difficult, requiring frequent references between different
diagrams. In this book, we use the line number(s) as the node identifier.
For CS265/CS608 Students Personal Use Only
Selection/If-Then
In Figure 7.12, considering the flow of execution in the code, lines 1 to 4 are executed
as a sequence. If the decision on line 4 is true, then line 5 is executed next. After line
execute.
FT
5, lines 6 and then 7 execute. If the decision on line 4 is false, then lines 6 and then 7
Considering the CFG, node (1..4) executes first. If the decision (a>10) at the end
of node (1..4) is true, then control moves to node (5), and then to node (6..7). If the
decision at the end of node (1..4) is false, then control moves to node (6..7).
This demonstrates how each node in the CFG represents a sequence of one or more
lines of code, and how the edges in the CFG reflect the decisions and subsequent branches
A
in the code. Be consistent: always keep the false branch on the left, and the true branch
on the right.
If the decisions are included next to the true branch, then the CFGs can be used for
subsequent analysis without the need to refer back to the source code.
DR
The edges may be annotated, using different symbols to prevent any confusion with
either the node identifiers or the decisions. Here the greek characters α, β, and γ are
used. The edges are not used for any of the techniques included in this book, and will
be omitted from future CFG diagrams.
The edge β from node (1..4) to node (6..7) is referred to as a qEmphasisnull-else
statement. This branch is taken if the decision is false, but contains no executable code
(hence null).
For CS265/CS608 Students Personal Use Only
Selection/If-Then-Else
node (1..4).
If the decision (a>10) in node (1..4) evaluates to true, then node (5) is executed, and
If the decision in node (1..4) evaluates to false, then node (6..7) is executed, and
control moves to node (8..9).
A
DR
Selection/Switch
In Figure 7.14, a previous block of code ending on line 1 is executed as a sequence, node
(1). Depending on the value of a, control then jumps to either node (2..4), node (5..7),
or node (8..10).
The lines in node (2..4) execute as a sequence, and control moves to node (11).
The lines in node (5..7) execute as a sequence, and control moves to node (11).
For CS265/CS608 Students Personal Use Only
The lines in node (8..10) execute as a sequence, and control moves to node (11).
Iteration/While
FT
In Figure 7.15, the code in lines 1–3 is always executed as a sequence – node (1..3). The
while statement must be in its own node as the decision is evaluated every time through
the loop; compare this with the if statement in node (1..4) in Figure 7.12.
If the decision (x>10) in node (4) evaluates to true, then node (5) is executed, and
then control returns to node (4), where the decision is re-evaluated.
If the decision in node (4) evaluates to false, on the first or any subsequent iteration,
then the loop exits, and control jumps to node (6..7).
A
Iteration/Do-While
DR
In Figure 7.16, the code in lines 1–3 is always executed as a sequence, node (1..3).
The body of the do-while loop is then executed, node (4..5).
If the decision (x> 10) in node (6) evaluates to true, then control jumps back to node
(4..5).
If the decision in node (6) evaluates to false, then control jumps to node (7..8).
Note the difference in structure from the previous example. The decision is evaluated
after each execution of the loop, not before.
For CS265/CS608 Students Personal Use Only
Iteration/For
In Figure 7.17, line 4 contains three separate parts, and must be divided into three
sub-lines as follows:
4a int i=0;
4b i¡a
4c i++
FT
The code in lines 1–3, followed by the first part of the for loop (line 4a), is always
executed as a sequence, node (1..4a).
The decision (i<a) in the for-loop, in node (4b), is then evaluated. If true, then the
A
body of the for-loop, on node (5), is then executed, followed by the for-loop increment
in node (4c). Control then returns to re-evaluating the decision in node (4a).
When the decision in node (4b) evaluates to false, then node (6) is executed next.
DR
In Figure 7.18, the code on lines 1 to 4 are always executed as a sequence. This
means that the multiple statements on line 3 can be ignored from a CFG viewpoint, as
shown in the CFG which represents this sequence as a single node (1..4).
For CS265/CS608 Students Personal Use Only
In Figure 7.19, the code on line 4 contains two statements which do not form an
indivisible sequence, and must be divided into two sub-lines as follows:
4a if (a¿10)
4b x=a;
(6..7).
FT
Lines 1–4a are executed as a sequence, node (1..4a). Then if the decision (a>10) on
line 4a/node (4a) is true, line 4b/node (4b) is executed, followed by lines 6 and 7, node
Then if the decision on line 4a/node (4a) is false, then control passes directly to lines
6 and 7, node (6..7).
A
7.4.4 Analysis: Identifying End-to-End Paths
Instead of considering a program as being constructed from the individual statements
(shown connected by branches in a CFG), a program can also be considered as being
DR
There are a very large number of possible end-to-end paths through this code:
1. Nodes: (1..3)→(4)→(6..7)
2. Nodes: (1..3)→(4)→(5)→(4)→(6..7)
3. Nodes: (1..3)→(4)→(5)→(4)→(5)→(4)→(6..7)
4. ···
To reduce the large number of different paths where loops are involved, equivalence
classes of paths are selected. For the purposes of testing, two paths are considered
For CS265/CS608 Students Personal Use Only
equivalent if they differ only in the number of loop iterations. This gives two classes of
loops:
• ones with 0 iterations.
• ones with n iterations (n > 0).
The number of start-to-finish paths, and the nodes on each path, can be calculated
using the technique of Basis Paths as described in the following section.
Basis Paths
The flow of control through a CFG can be represented by a regular expression. The
expression is then evaluated to give the number of end-to-end paths. Loops can be
executed an indefinite number of times: this is resolved by treating them as code segments
that can be executed exactly zero or one times.
There are three operators defined for the regular expression:
. concatenation – this represents a sequence of nodes in the graph
A
1
2
i=0;
while (i<list.length) {
FT
+ selection – this represents a decision in the graph (e.g. the if statement).
()∗ iteration – this represents a repetition in the graph (e.g. the while statement).
Consider the following example:
3 if (list[i]==target)
4 match++;
5 else
6 mismatch++;
DR
7 i++;
8 }
This is represented by the CFG in Figure 7.22.
This CFG can be represented by the following regular expression, where the numbers
are the nodes shown in Figure 7.22, using the operators described before:
For CS265/CS608 Students Personal Use Only
1 · 2 · (3 · (4 + (5..6)) · 7 · 2)∗ · 8
The loops are simplified by replacing (expression)∗ with (expression+0) representing
the two equivalence paths, one with a single iteration through the loop expression, and
one with no iterations, where 0 represents the null statement. This gives:
1 · 2 · ((3 · (4 + (5..6)) · 7 · 2) + 0) · 8
This can be described as node 1 followed by 2 followed by either 3 followed by either
4 or 5..6 followed by 7 followed by 2, or null, followed by 8. Expanding this, where the
+ symbol represents alternatives, gives the following paths:
Alternative 1. 1–2–8
Alternative 2. 1–2–3–4–7–2–8
Alternative 3. 1–2–3–(5..6)–7–2–8
FT
By replacing each of the node numbers (and nulls) by the value 1, and then evaluating
the expression as a mathematical formula (using + for addition, and · for multiplication),
the number of paths can be calculated as follows:
paths = 1 · 1 · ((1 · (1 + 1) · 1 · 1) + 1) · 1 = 3
In practice the paths can often be identified by hand without using this technique.
Note: for null-else statements, where there is an if with no else, the expression (n+0)
A
is used. The node n represents the true decision, and the node 0 represents the null else
decision.
The possible paths are identified by mentally executing each path in the final CFG
(Figure 7.6), following the code in parallel to understand the processing, and identifying
the input data constraints at each decision point required to follow the selected path.
If conflicting constraints are found, then that path is impossible. Otherwise, input
data can be selected that matches the constraints. If possible, data values, and
combinations of values, from previous tests are used (refer to Listing 3.1), so that
duplicate tests can be easily identified and eliminated.
• Path 1
Nodes: (22..28)→(38..41): note these must be in the correct sequence
– Branch (22..29)→(38..41): requires (bonusPoints>0) to be false
Therefore, bonusPoints must be zero or negative.
Test T1.4 meets this condition with the input values (-100, false)
• Path 2
Nodes: (22..28)→(29)→(31)→(33..34)→(36..37)→(38..41)
– Branch (22..29)→(29): requires (bonusPoints>0)
– Branch (29)→(31): requires (bonusPoints<thresholdJump) to be false
For CS265/CS608 Students Personal Use Only
The input bonusPoints must be greater than or equal to zero on entry. At node
(29) it must be greater than or equal to thresholdJump (20 or 30, depending on
goldCustomer).
At node (31) it must be less than or equal to threholdJump. This implies that
bonusPoints must be equal to threholdJump.
On node (33) the value is increased by 4*thresholdJump, that is by 80 or 120.
So if the inputs are goldCustomer is true and bonusPoints equals 20, then on node
(34) bonusPoints is modified to 100, which is greater than the threshold (80).
And if the inputs are goldCustomer is false and bonusPoints equals 30, then on node
(34) bonusPoints is modified to 150, which is greater than the threshold (120).
In both cases, the decision at node (34) will be true. But the required value for
–
–
–
A
Branch
Branch
Branch
Branch
FT
this path is false. Thus, the path is impossible.
• Path 3 Nodes: (22..28)→(29)→(31)→(33..34)→(35)→(38..41)
– (22..29)→(29): requires (bonusPoints>0)
(29)→(31): requires (bonusPoints<thresholdJump) to be false
(31)→(33..34): requires (bonusPoints>thresholdJump) to be false
(33..34)→(35): requires (bonusPoints>threshold)
The input bonusPoints must be greater than or equal to zero on entry. At node
(29) it must be greater than or equal to thresholdJump (20 or 30, depending on
goldCustomer).
DR
At node (31) it must be less than or equal to threholdJump. This implies that
bonusPoints must be equal to threholdJump.
On node (33) the value is increased by 4*thresholdJump, that is by 80 or 120.
So if the inputs are goldCustomer is true and bonusPoints equals 20, then on node
(34) bonusPoints is modified to 100, which is greater than the threshold (80).
And if the inputs are goldCustomer is false and bonusPoints equals 30, then on node
(34) bonusPoints is modified to 150, which is greater than the threshold (120).
In both cases, the decision at node (34) will be true as required. None of the
existsing tests have either 20 or 30 as input values for bonusPoints, so a new test
is required. The input data may be either (20,true) or (30,false).
• Path 4 Nodes: (22..28)→(29)→(31)→(32)→(33..34)→(36..37)→(38..41)
The input bonusPoints must be greater than or equal to zero on entry. At node
(29) it must be greater than or equal to thresholdJump (20 or 30, depending on
goldCustomer).
At node (31), bonusPoints must be greater than thresholdJump (20 or 30).
Therefore, bonusPoints must be greater than 0, greater or equal to than threshold-
Jump (20 or 30) at node(29), and greater than thresholdJump at node(31).
At node (32), the value of bonusPoints is reduced by thresholdJump, and at node
(33..34) this value is increased by 4*thresholdJump, that is by 80 or 120.
So if the inputs are goldCustomer is true and bonusPoints is greater than 20, then
on node (30), bonuspoints is reduced by 20, and at (33..34) bonusPoints increased
by 80, giving an overall change of +60 to the value. The value must be less than
or equal to 80 here, so the minimum value of bonusPoints is 1, and the maximum
is 20.
And if the inputs are goldCustomer is false and bonusPoints equals 30, then on
30. FT
node (30) bonusPoints is reduced by 30, and at (33..34) bonuspoints increased by
120, giving an overall change of +90 to the value. The value must be less than or
equal to 120 here, so the minimum value of bonusPoints is 1, and the maximum is
Test T2.1 matches these constraints, with the input values (1,true).
greater 120 here, so the minimum value of bonusPoints is 31, and the maximum
again large (Long.MAX VALUE-90).
Test T1.1 matches these constraints, with the input values (40,true).
• Path 6 Nodes: (22..28)→(29)→(30)→(31)→(33..34)→(36..37)→(38..41)
– Branch (22..28)→(29): requires (bonusPoints>0)
– Branch (29)→(30): requires (bonusPoints<thresholdJump) – note node (30)
modifies bonusPoints
– Branch (31)→(33..34): requires (modified bonusPoints>thresholdJump) to be
false
– Branch (33..34)→(36..37): requires (modified bonusPoints>threshold) to be
false
The input bonusPoints must be greater than or equal to zero on entry. At node
(29) it must be less than thresholdJump (20 or 30, depending on goldCustomer).
FT
At node (31), bonusPoints must be less than or equal to thresholdJump (20 or 30).
Therefore, bonusPoints must be greater than 0, less thresholdJump (20 or 30) at
node(29), and less than or equal to thresholdJump at node(31).
At node (30), the value of bonusPoints is reduced by thresholdJump, and at node
(33..34) this value is increased by 4*thresholdJump, that is by 80 or 120.
So if the inputs are goldCustomer is true and bonusPoints less than 20, then on
node (30), bonuspoints is reduced by 20, and at (34) bonusPoints increased by
A
80, giving an overall change of +60 to the value at node (35). The value must be
greater than 80 here, but the maximum value it can have is 59, so this is impossible.
So if the inputs are goldCustomer is false and bonusPoints less than 30, then on
DR
node (30), bonuspoints is reduced by 30, and at (34) bonusPoints increased by 120,
giving an overall change of +90 to the value at node (35). The value must be
greater than 120 here, but the maximum value it can have is 119, so this is also
impossible.
• Path 7 Nodes: (22..28)→(29)→(30)→(31)→(33..34)→(35)→(38..41)
– Branch (22..28)→(29): requires (bonusPoints>0)
– Branch (29)→(30): requires (bonusPoints<thresholdJump) – note node (30)
modifies bonusPoints
– Branch (31)→(33..34): requires (modified bonusPoints>thresholdJump) to be
false
– Branch (33..34)→ (35): requires (modified bonusPoints>threshold)
For any value of bonusPoints, if it is less than thresholdJump (20 or 30) at node(29),
then it will always be less than threshold (80 or 120) at node(33..34), so this path
is impossible.
• Path 8 Nodes: (22..28)→(29)→(30)→(31)→(32)→(33..34)→(36..37)→(38..41)
– Branch (22..28)→(29): requires (bonusPoints>0)
– Branch (29)→(30): requires (bonusPoints<thresholdJump) – note node (30)
modifies bonusPoints
For CS265/CS608 Students Personal Use Only
7.4.6
FT
As for Path 8, for any value of bonusPoints, it is not possible to have bonusPoints
less than thresholdJump at node 30, followed by the modified (reduced) value of
bonusPoints greater than threasholdJump at node 31. So this path is impossible.
7.5 Evaluation
Achieving full all paths coverage can be a very tedious exercise. Note that even though
every path has been executed, every reason for taking every decision has not.
For CS265/CS608 Students Personal Use Only
7.5.1 Limitations
The source code shown in Listing 7.4 contains three types of fault not found (except
by chance) by the standard black-box and white-box unit test techniques. The basic
algorithm is quite different to that used previously: it uses a lookup table, and searches
for a matching entry to determine the return value.
1024L,
Status rv = ERROR;
FT
120L, false, FULLPRICE },
121L, Long.MAX_VALUE, false, DISCOUNT },
1024L, true, FULLPRICE },//Fault 7
)))
43 rv = (Status)row[3];
44
45 bonusPoints = 1/(bonusPoints-55); // Fault 9
46
47 return rv;
48 }
It is much more difficult to insert faults that can evade all paths testing, and the
code has been completely rewritten to achieve these. The inserted faults are explained
as follows:
• Fault 7: lines 26-33 contain a faulty lookup table. Line 31 will cause a failure for
the input value 1024.
The performance of many algorithms can be improved by the use of lookup tables,
but the risk is that the tables are incorrect.
• Fault 8: line 37 shows a faulty bitwise manipulation. Note that the constant
0xFFFFFFFFFFFFFEFFL is a long hexadecimal value, with an E where there
should be an F three hex digits from the end. This mistake results in a single bit
For CS265/CS608 Students Personal Use Only
being 0. This will cause a failure for the input value 256, as clearing this bit will
change the value to zero. There is no reason to do a bit manipulation in this code,
but networking and graphics code are examples where this may be required.
• Fault 9: line 45 shows a divide by zero fault. This will cause an (unhandled) Java
exception to be raised for the input value 55. As it is unhandled, it will cause the
program to crash, and by default Java produces a stack dump. Dead code like
this, that manipulates local variables which are never used again, can sometimes
be identified and removed by the compiler – but in this case the code executes and
causes a crash with the input value 55.
Division is frequently used in algorithms, and if divide-by-zero exceptions are not
handled correctly, they will cause software failures.
The results of running the all-paths tests against these faults in shown in Figure 7.23.
FT
PASSED: test_giveDiscount("T1.2", 100, false, FULLPRICE)
PASSED: test_giveDiscount("T1.3", 200, false, DISCOUNT)
PASSED: test_giveDiscount("T1.4", -100, false, ERROR)
PASSED: test_giveDiscount("T2.1", 1, true, FULLPRICE)
PASSED: test_giveDiscount("T2.2", 80, false, FULLPRICE)
PASSED: test_giveDiscount("T2.3", 81, false, FULLPRICE)
PASSED: test_giveDiscount("T2.4", 120, false, FULLPRICE)
PASSED: test_giveDiscount("T2.5", 121, false, DISCOUNT)
PASSED: test_giveDiscount("T2.6", 9223372036854775807, false, DISCOUNT)
A
PASSED: test_giveDiscount("T2.7", -9223372036854775808, false, ERROR)
PASSED: test_giveDiscount("T2.8", 0, false, ERROR)
PASSED: test_giveDiscount("T3.1", 100, true, DISCOUNT)
PASSED: test_giveDiscount("T3.2", 200, true, DISCOUNT)
PASSED: test_giveDiscount("T4.1", 43, true, FULLPRICE)
DR
The tests all pass – they do not find any of these faults, even though all-paths is
regarded as the strongest form of white-box testing (based on the program structure6 ).
6 Suggested reading for other forms of white-box testing is provided in Chapter 14.
For CS265/CS608 Students Personal Use Only
Note that the wrong result is returned for all the inputs.
• The input (256,true) should return DISCOUNT.
• The input (1024,true) should also return DISCOUNT.
• The input (55,true) should return FULLPRICE, and not raise an exception
that data.
All paths testing matches the flow of control through a program, but can be difficult to
A
realize for complex programs, in particular as each path has to be carefully analysed to
determine if it is executable, and if so to develop the criteria that cause its execution.
Strengths
DR
Covers all possible paths, which may have not been exercised using other methods.
All-paths testing is guaranteed to achieve statement coverage and branch coverage
coverage.
Weaknesses
Developing the CFG and identifying all the paths in complex code can be difficult
and time consuming.
When code contains loops, decisions must be made as to how to limit their
execution. This is necessary to limit the number of paths to a reasonable
number, but as a result weakens the testing.
All-paths does not explicitly evaluate the boolean conditions in each decision
(the individual boolean sub-expressions that make up a compund boolen
expression).
Does not explore faults related to incorrect data processing (e.g. bitwise
manipulation or arithmetic errors).
Does not explore non-code faults (for example, faults in a lookup table).
For CS265/CS608 Students Personal Use Only
FT
that no paths have been omitted. Exploratory programming and the debugger are often
used to try and find data values that will follow a particular exit from a node.
An experienced tester will often be able to develop the CFG without detailing all
the steps as shown for the example in this chapter. They will also probably be able
to identify the conditions required for each path without laboriously stepping through
the code. However, even for the experienced tester, developing all-paths tests is a time
consuming and difficult task.
A
DR
7 There are other forms of path testing, apart from end-to-end paths, which are less strong, but may
be easier to design tests for. However all paths testing is the strongest form of path testing: it is a
superset of all the other forms of path testing.
For CS265/CS608 Students Personal Use Only
Chapter 8
FT
Dynamic testing confirms the correct operation of a program, which is referred to as
the test item, by executing it. As shown in Figure 8.1, the test process can be modelled
as the comparison of the outputs of a real system with those of an ideal system. The
ideal system represents the specification. The real system represents the software being
tested.
A
DR
Test conditions are developed, according to the test design technique selected, that
guide the selection of suitable test input data. The test input data is then used as an
input to the ideal system in order to determine the correct output (referred to as the
expected results). The test input data is provided as inputs to the real system. The
real system executes and the actual results collected and compared with the expected
results. If they are identical, or equivalent, then the test result is a pass. Otherwise the
test result is a fail. As we have seen in the previous chapters, the test is usually executed
by an automated test tool.
The test data is derived from the set of test conditions, each of which has been chosen
to validate particular features or aspects of the system. The result of the test is a pass
or fail. However, for a test to be useful it does not have to produce a pass result. Even
a failed test gives us new knowledge about the system: it tells us for which inputs the
software is not working as specified.
140
For CS265/CS608 Students Personal Use Only
Key Point: black-box test cases and test data are derived solely from the
functional specifications.
FT
White-box testing (also referred to as Implementation-Based or Structural Testing)
uses the implementation of the software to derive tests. The tests are designed to exercise
a particular aspect of the program code, such as the statements or the decisions it
contains. By examining how the program works, and selecting test data to cause specific
components of the program to be executed, the tests can expose errors in the program
structure or logic.
White-box testing techniques are characterised by their Coverage Criteria. This refers
to the percentage of the components that have been exercised by the test. For example,
A
100% statement coverage means that all the lines of source code in a program have been
executed at least once during the test.
Key Point: white-box test cases are derived from the code; the test data
DR
Table 8.1 compares some of the key characteristics of black-box and white-box Testing.
For CS265/CS608 Students Personal Use Only
discussed in Section 1.6, exhaustive testing is seldom feasible, and so a subset of possible
inputs must be selected to cover key mappings between the input and output domains.
Measurement
It is difficult to automatically measure the degree of coverage of the specification that
black-box testing has achieved. Correct implementation of black-box tests therefore relies
heavily on the quality of the tester’s work.
FT
They all reflect the idea of testing the correct operation of the software using the structure
of the implementation as a source of test cases.
Regarding software as a set of components that create output values from input
values, the purpose of white-box testing is to ensure that executing the components (e.g.
statements, branches, etc.) always results in the correct output values (see Figure 8.3).
The specification is still needed to ensure that the output values are correct.
A
DR
This testing is ideally achieved by exercising all the components (including combi-
nations and sequences of components) in the implementation. As discussed already,
exhaustive testing is seldom feasible, and so a subset of possible inputs must be selected
to exercise key sets of components.
Measurement
Many programming languages and test environments provide tools that can measure what
was executed in the code during testing. They do this by recording which instructions
have been executed. This is referred to as instrumenting the code. Using these recordings,
tools can calculate simple coverage figures, such as percentage of lines executed, or
percentage of branches taken during the tests. We have seen some examples of this in
Chapters 5 and 6. Achieving 100% coverage of these simple criteria does not necessarily
indicate complete testing – there may be complex end-to-end paths as discussed in
Chapter 7. However, achieving less than 100% coverage clearly indicates that there
is untested code or untested branches.
These measurement tools are often used to determine the coverage provided by black-box testing. Following this, white-box testing can then be used to augment the black-box
testing and improve the test coverage.
It can be argued that setting any goal less than 100% coverage does not assure
quality. However, a lot of effort may be expended to achieve this. The same effort might
be better spent using a different test approach (for example: static testing instead of
dynamic testing). Using coverage-based white-box techniques, such as statement testing
or branch testing, a typical goal is to attain 80%-90% coverage or more before releasing1,2.
However, these recommendations are not absolute and safety-critical software may
require more testing. One strategy that usually increases coverage quickly is to first
attain a base level coverage throughout the entire test program before striving for higher
coverage in critical code.
Both black-box and white-box testing approaches have weaknesses:
• White-box testing does not typically find faults relating to missing functionality
(referred to as Errors of Omission): there is no implementation for these as a basis
to develop tests.
• Black-box testing does not typically find faults relating to extra functionality
(referred to as Errors of Commission): there is no specification for these as a
basis from which to develop tests.
Consider the example boolean isZero(int x), that should return true if x is zero, and false otherwise.
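The book's listing for this example is not reproduced in this extract; a minimal sketch of the kind of faulty implementation described (hypothetical, but consistent with the fix to line 3 discussed below) is:
1 public static boolean isZero(int x) {
2     boolean result = false;
3     if (x == 1)    // fault: the decision should be (x == 0)
4         result = true;
5     return result;
6 }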
1 Practical Software Metrics for Project Management and Process improvement (Grady)
2 Experience-Driven Process Improvement Boosts Software Quality (Vinter and Poulsen)
This is a simple example, and it is easy to see that the code is not correct with respect
to the specification. In longer or more complex code, it is generally much more difficult
to identify these faults. In this case, what appears to be a single mistake by the developer
has led to both types of error: changing the decision in line 3 to (x == 0) would fix both
faults. Often errors of omission lead to matching errors of commission, and vice-versa.
In general, error handling is often a source of omission faults in complex code, where
incorrect input data is not identified correctly. Misunderstanding the specification is
often a source of faults of commission, where extra input conditions are identified by the
programmer which were not in the specification.
Another way of looking at this is from the coverage viewpoint:
Black-box testing provides for coverage of the specification, but not full coverage of the
implementation. That is, there may be code in the implementation that incorrectly
produces results not stated in the specification.
White-box testing provides for coverage of the implementation, but not of the
specification. That is, there may be behaviour stated in the specification for which
there is no code in the implementation.
8.1.4 Usage
It is for these reasons that black-box testing is done first, to make sure the code is
complete. This is then augmented by white-box testing, to make sure the code does not
contain extra, unspecified functionality.
Black-box and white-box testing techniques are generally used in succession to maximise
coverage. The degree of coverage is usually based on the quality requirements of the
test item. For example, software that decides whether to sound an alarm for a hospital
patient will have higher quality requirements than software that recommends movies to
watch. Based on these quality requirements, the tester can decide whether to stop testing
early, as discussed in Section 1.8.
Black-box testing is used initially to verify that the software satisfies the specification:
• Use equivalence partitions to verify the basic operation of the software by ensuring
that one representative of each partition is executed (see Chapter 2).
• If the specification contains boundary values, use boundary value analysis to verify
correct operation at the edges (see Chapter 3).
• If the specification states different processing for different combinations of inputs,
use decision tables to verify correct behaviour for each combination (see Chapter 4).
These black-box test techniques can be further augmented:
• If the specification contains state-based behaviour, or different behaviour for
different sequences of inputs, then use state-based/sequential testing to verify this
behaviour (see Section 8.2.7 in this chapter).
• If there are reasons to suspect that there are faults in the code, perhaps based on
past experience, then use error guessing/expert opinion to try and expose them
(see Section 14.3.3 in Chapter 14).
• If the typical usage of the software is known, or to achieve a large number of tests,
then use random test data to verify the correct operation under these usage patterns
(see Chapter 12).
We measure the statement and branch coverage for each of these black-box tests. If
the required coverage has not been achieved, we can add white-box techniques to further
enhance the quality of the testing:
• Use statement coverage to ensure that 100% of the statements have been executed.
Investigate the uncovered statements and identify test coverage items and test cases
specifically for these (see Chapter 5).
• Use branch coverage to ensure that 100% of the branches have been taken.
Investigate the branches that were missed and develop test coverage items and
test cases specifically for these (see Chapter 6).
• If the code contains complex end-to-end paths, then use all paths testing to ensure
coverage of these (see Chapter 7).
• If the code contains complex data usage patterns, then use du pair testing to ensure
coverage of these (see Section 8.3.1 in this chapter).
• If the code contains complex decisions, then use CC/DCC/MCC/MCDC testing to
ensure coverage of these (see Sections 8.3.2, 8.3.4, 8.3.5, and 8.3.6 in this chapter).
In all cases, the decision to proceed with further tests is a judgement call and a cost-benefit trade-off: the extra time and work required to do the extra tests must be balanced
against the extra confidence they will provide in the software quality. We discussed this
in detail in Chapter 1.
Some factors to consider when deciding what test techniques to use, and when to use
them in the development process:
• Black-box tests can be written before, or in parallel with, the code (as they are derived from the specification rather than the implementation).
Error Cases
Generally there are a number of categories of error cases, illustrated in the sketch after this list:
• A null reference.
• A string or array of length 0.
• A string or array which is too long, exceeding its specified maximum length (e.g.
13 digits for a modern ISBN number)4.
• Invalid data. For example, a String may be specified to only include printable
characters; an integer array may be specified to contain only positive values.
• Invalid relationships. For example, a String may be specified to only contain
characters in alphabetical order; an integer array may be specified to be in
descending order.
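As a sketch, these categories might be instantiated for a hypothetical method that takes a String of at most 13 printable characters in alphabetical order:

String[] errorCases = {
    null,                 // null reference
    "",                   // length 0
    "abcdefghijklmn",     // 14 characters: exceeds the specified maximum length
    "ab\u0007c",          // invalid data: contains a non-printable character
    "cba",                // invalid relationship: not in alphabetical order
};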
Normal Cases
Normal cases are more difficult to categorise. Rather than selecting just one value from
each equivalence partition, a number of different values may be selected.
For boundary values, both the length of the data structure and of the data it contains
should be considered.
And for combinations, a number of tests may be generated from each, again selecting
a number of values to represent each cause rather than just one.
The complexity of the data selected will depend on the complexity of the specification.
It may be useful to select one data set that includes the same value in each location in
the String or array, if this is allowed. For Strings representing contact information (such
as names, addresses, or phone numbers) one might use a telephone directory to discover
very short, very long, and a number of typical values. For arrays representing ints, such
as a list of numbers to be subject to statistical tests, one might select a few small data
sets and one large data set with known characteristics. And as always, it is important
to ensure that the output cases are all covered – these often provide additional guidance
on selecting input values.
4 For example, the maximum array length in Java is 2^31, but whether this can be allocated depends on the memory available.
The best way to handle discontinuous ranges for input parameters is to separate out
the contiguous ranges of values. So, in this example, x should be treated as having the
following three equivalence partitions:
x.EP 1 Integer.MIN_VALUE..-1
x.EP 2 0
x.EP 3 1..Integer.MAX_VALUE
And the associated boundary values:
x.BVA 1 Integer.MIN_VALUE
x.BVA 2 -1
x.BVA 3 0
x.BVA 4 1
x.BVA 5 Integer.MAX_VALUE
The reasoning behind this is that the values -1 and +1 are special: they are immediate
predecessors and successors to the value 0 which is treated differently. The software
must correctly identify these as boundary values. They are values which are likely to be
associated with faults in the software.
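As a hypothetical illustration, a method with this shape of input domain treats 0 differently from the negative and positive ranges:

public static String describe(int x) {
    if (x == 0)             // the single-value partition x.EP 2
        return "zero";
    else if (x < 0)         // x.EP 1: Integer.MIN_VALUE..-1
        return "negative";
    return "positive";      // x.EP 3: 1..Integer.MAX_VALUE
}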
Consider a method that computes the tax on an amount: either a fixed tax of 56, or a variable amount calculated at 1% (or x/100). The return value is one of the following:
• -1 indicating an error, if amount < 0.
• 56, if fixed is true (fixed tax).
• x/100, which is 1% of the amount if fixed is false (variable tax).
Considering the non-error case, the variable tax can therefore have a value from 0
to Integer.MAX_VALUE/100. And the fixed tax can only have a single value, 56. This
gives the return value two partitions, with overlapping values:
1. A partition containing all the values from 0 to Integer.MAX_VALUE/100.
2. The value 56, which is in a single-valued partition of its own.
The value lines for these partitions are shown in Figure 8.5. The overlap is indicated
by the dashed lines in Partition 1.
There is a third, error partition not shown in the figure, with the single value -1.
The values from Integer.MIN_VALUE to -2, and above Integer.MAX_VALUE/100, are
not possible return values, so need not be considered – there are no inputs which should
cause any of these values to be returned.
The best way to handle these overlapping ranges for outputs is to ignore the overlap,
and treat each range of values separately. So, in this example, the return value should be
treated as having the following equivalence partitions:
• -1 (the error value)
• 0..Integer.MAX_VALUE/100 (variable tax)
• 56 (fixed tax)
This leads to two non-error cases, and one error case, for the return value. Test
data for the input error case (low<high) will also cover the output error case (return
value==false). Test data for the input non-error cases will also cover the output non-
error cases (return value is true) and (return value is false).
A similar approach would be taken for integers, or other in-band error reporting
mechanisms.
As part of Design for Testability, improved error handling can be provided by out-
of-band reporting. In Java, this might be achieved by throwing an exception. However,
a tester must be prepared to test code which uses either approach.
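As a minimal sketch (the method and its specification are hypothetical), the two reporting styles look like this:

// In-band error reporting: the error shares the output domain with valid results.
public static int inBandSqrt(int x) {
    if (x < 0)
        return -1;                  // error value
    return (int) Math.sqrt(x);
}

// Out-of-band error reporting: the error is raised as an exception.
public static int outOfBandSqrt(int x) {
    if (x < 0)
        throw new IllegalArgumentException("x must be >= 0");
    return (int) Math.sqrt(x);
}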
Working along the value line from Integer.MIN_VALUE, all the values up to y-1 receive the same processing, and form the first partition.
The next partition has the single value y. When we move to y+1 the processing
changes, as now x is greater than y.
Finally, working along the value line from y+1 we reach Integer.MAX_VALUE with
the same processing.
The value lines for the partitions for input x are shown in Figure 8.6 – these are
relative partitions, as they are relative to the value of y.
There are three corresponding cases for y, where we treat x as though it had a fixed
value:
1. y < x
2. y == x
3. y > x
The partitions for x, relative to y, are:
• Integer.MIN_VALUE..y-1
• y
• y+1..Integer.MAX_VALUE
and the corresponding partitions for y, relative to x, are:
• Integer.MIN_VALUE..x-1
• x
• x+1..Integer.MAX_VALUE
with the associated boundary values for y including:
• x
• x+1
• Integer.MAX_VALUE
These partitions are subject to the interpretation discussed before for empty partitions
and single-value partitions.
Using the principles of overlapping partitions we have considered, this gives the
following partitions for x:
x.EP 2 1..y-1
x.EP 3 y
x.EP 4 y+1..y+z-1
x.EP 5 1..z-1
x.EP 6 z
x.EP 7 z+1..y+z-1
x.EP 8 y+z..Integer.MAX_VALUE
8.2.7 State-Based Testing
Many software systems are state based – their behaviour is based not only on the current
inputs, but also on their current state (essentially defined by the history of previous
inputs). When the system reaches a particular state, this is based on the sequence of
inputs to date.
A simple example would be software for a touch-controlled lamp. When it is off, a
touch toggles the lamp to on; when it is on, a touch toggles the lamp to off. The system
has two states ON and OFF. Which state the system is in is based on how many times
it has been toggled. And the behaviour of a toggle depends on which state the system is
in.
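A minimal sketch of such a lamp (a hypothetical class, assuming only the two states and the single toggle() event described) might be:

public class Lamp {
    private enum State { OFF, ON }
    private State state = State.OFF;    // initial state

    public void toggle() {              // the toggle event: OFF -> ON, ON -> OFF
        state = (state == State.OFF) ? State.ON : State.OFF;
    }

    public boolean isOn() {             // hypothetical accessor, useful for testing
        return state == State.ON;
    }
}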
The fault model of state-based testing is to find faults in the software associated
with these different states. Typically this will mean that the software fails to reach a
particular state correctly. Or that when the software is in that state, it fails to function
correctly.
State behaviour can be defined using a State Diagram6 . This shows all the states,
the transitions between the states, the input events that cause these transitions, and
the actions to be taken on each state transition. Figure 8.8 shows the state diagram for
the touch-controlled lamp. The boxes represent states. The arrows represent transitions
between the states. The initial arrow to the OFF state indicates that the system starts
6 A State Table may also be used, providing the same information but in tabular form.
in that state. And the label toggle() indicates an event, or method call, that should cause
the transition to take place7 . For simplicity, no actions are shown here.
State-based testing raises the events that should cause the software to transition
between different states by calling the methods on the state diagram. The tests can
check that the correct state transitions have taken place, and that the correct actions
have occurred.
Well-established test design strategies for selecting test coverage criteria for state-based testing are:
• Piecewise (test some transitions).
• All Transitions (test every transition at least once).
• All Round-trip Paths (test all paths starting and ending at the same state).
• M-length Signatures (often it is impossible to directly access the state in order to
verify that a transition has taken place correctly. In this case sequences of events
which produce a unique set of outputs are used to verify that the software is in
the expected state. These are called signatures. The outputs will be return values
either from the methods which cause state transitions, or from other methods, not
on the state diagram, which do not. This strategy tests all signatures of up to M
transitions, for some value of M. The disadvantage of using signatures is that they
change the state that the software is in, and the test must take this into account).
• All n-event transition sequences (test all sequences of transitions for a particular
value of n).
• Exhaustive Testing (testing all possible sequences of transitions).
7 A sensor would detect that the lamp has been touched, and the software would then call the toggle() method.
8.2.8 Floating Point Numbers
Java uses the IEEE 754 floating point representation – this is also supported by most modern hardware. Unlike integers, there
is a sign bit, and also the values do not wrap around. Floating point is best regarded
as an approximation: if a small value is added to a large value it may have no impact,
and the difference between two large values may be returned as 0.0 even if they are not
equal. Many fractions and decimals (such as 1/3, or 0.1) cannot be represented exactly
in floating point. For example, the Java statement System.out.printf("%19.17f", 0.1f)
produces the output 0.10000000149011612 when executed, and not 0.10000000000000000
as you might expect!
The following list discusses a few problems that the software tester may experience
with floating point, and suggests remedies for these.
• Handling Cumulative Errors. Because not every number can be exactly represented,
and values can get lost when adding a very small number to, or subtracting it from,
a very large one, errors tend to accumulate in software that uses floating point. To
handle this, the maximum allowable error needs to be specified for such software;
then, as for comparing to a constant, this provides the upper and lower bounds on
a correct output.
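For example, using TestNG's assertEquals with a delta – the calculation and the allowable error of 1e-9 are assumptions for illustration:

import static org.testng.Assert.assertEquals;
import org.testng.annotations.Test;

public class FloatingPointTest {
    @Test
    public void sumTest() {
        double actual = 0.1 + 0.2;  // accumulates representation error (0.30000000000000004)
        double maxError = 1e-9;     // assumed maximum allowable error from the specification
        // any actual result within expected ± maxError is treated as correct
        assertEquals(actual, 0.3, maxError);
    }
}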
The software tester using floating point is advised to read one of the standard works
on writing code that uses floating point.
The examples in the book focus on logic processing, which does not produce a calculated
result. This section considers numeric processing, which returns the result of a numeric
calculation. For equivalence partition and boundary value analysis testing, the input
partitions should be handled the same way, but the output partitions may require a bit
more analysis.
Consider the method int add10(int x), which adds 10 to the input value if it is between
0 and 90, and raises an IllegalArgumentException otherwise.
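A minimal sketch of add10() matching this specification (the book's actual implementation is not shown here):

public static int add10(int x) {
    if (x < 0 || x > 90)
        throw new IllegalArgumentException("x must be in the range 0..90");
    return x + 10;    // non-error outputs are therefore in the range 10..100
}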
The input partitions for x are easy to identify as shown in Figure 8.9.
(Figure 8.9 shows the value line for x, with boundaries at Integer.MIN_VALUE, -1, 0, 90, 91, and Integer.MAX_VALUE.)
The output partitions for the return value of add10() are derived using the same logic
as used in Chapter 2, but they require some calculation by the tester, as
shown in Figure 8.10.
The return values may produce overlapping partitions, as discussed in Section 8.2.3.
Floating point return values can also require significant analysis – see Section 8.2.8. In
some cases it may not be possible to identify all of the output partition boundaries.
8.3.1 DU Pair Testing
Some test environments provide tools to measure their coverage. But, as with other white-box techniques, the tests
are invalidated every time a change is made, and so they are generally reserved for use
on software which has passed all its previous tests, and which has a requirement for a
particularly high level of quality (such as in the aerospace or medical devices industries).
Every possible du pair is a test coverage item. Some du pairs will be impossible
to execute and do not form test coverage items. Test cases are then developed. This
requires the tester to review the code carefully and select input parameter values that
cause du paths to be followed. As for other forms of white-box testing, each test case is
likely to cover a number of test coverage items. The expected results are derived from
the specification. It is easy to make the mistake of reading the expected results directly
from the code.
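As a minimal illustration (a hypothetical method), consider the definitions and uses of the variable result:

public static int max(int a, int b) {
    int result = a;    // definition d1 of result
    if (b > a)
        result = b;    // definition d2 of result
    return result;     // use u1 of result
}
// The du pairs for result are (d1,u1), exercised when b <= a,
// and (d2,u1), exercised when b > a.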
DU pair testing provides comprehensive testing of the dataflow in a program, but
generating the test data is a time consuming exercise.
Strengths
This is a strong form of testing.
It generates test data following the pattern in which data is manipulated in the
program, rather than following abstract branches.
Weaknesses
The number of test cases can be very large, up to $\sum_{i=1}^{N} d_i \cdot u_i$, where $d_i$ is the number
of definitions for variable $i$, $u_i$ is the number of uses, and $N$ is the number of
variables (this includes arguments, local variables, and class attributes).
8.3.2 Condition Coverage
The fault model for condition coverage is that some boolean conditions are not handled correctly.
The goal of condition coverage is for every boolean condition in every decision to take
on the values true and false. Even though in practice some languages may not in fact
evaluate later conditions if earlier ones determine the value of the decision (short circuit
evaluation), this is ignored in condition coverage. Each condition in each decision in the
source code has two test coverage items – for true and false outcomes.
Test cases are developed, with input data selected to ensure that every boolean
condition in every decision takes on the value true and false. This selection requires the
tester to review the complex decisions and the boolean conditions in the code, and select
input parameter values that cause the necessary outcomes from each boolean condition.
The expected results come from the specification.
Comment
Some languages, such as Java, support left-to-right evaluation and short-circuit evalua-
tion. This means that subsequent conditions in a decision may not actually be evaluated,
depending on the results of previous ones.
For example, in the decision (a || (b && c)): if a evaluates to true, then (b && c) is not evaluated at all; and if a is false and b is false, then c is not evaluated.
Strengths
Focuses on boolean condition outcomes.
Weaknesses
Can be difficult to determine the required input parameter values.
Does not always achieve decision coverage.
8.3.4 Decision Condition Coverage
Test cases are developed, with input data selected to ensure that every decision and every
boolean condition takes on the values true and false. This requires the tester to review
the decisions and the boolean conditions in the code, and select input parameter values that cause the
necessary outcomes from each decision and boolean condition. Even though in practice
some languages may not in fact evaluate later boolean conditions if earlier ones determine
the value of the decision, this is ignored.
Strengths
Stronger coverage than just Condition Coverage or Decision Coverage.
Weaknesses
Even though every decision is tested, and every boolean condition is tested, not
every possible combination of boolean conditions is tested.
Can be difficult to determine the required input parameter values.
8.3.5 Multiple Condition Coverage
Comment
Not all combinations of boolean conditions are always possible. Even though multiple
condition testing covers every possible combination of boolean conditions in each decision,
it does not cause every possible combination of decisions in the program to be taken.
Strengths
Tests all possible combinations of boolean conditions in every decision.
Weaknesses
Can be expensive: n boolean conditions in a decision give 2^n test coverage items.
Can be difficult to determine the required input parameter values.
8.3.6 Modified Condition/Decision Coverage
Multiple condition coverage can generate a very large number of tests. This can be
reduced by only considering those combinations of boolean
conditions that cause a discernible effect on the output of the software. The test
conditions for Modified Condition/Decision Coverage8 (often abbreviated to MC/DC) are
based on decision condition coverage, with additional conditions to verify the independent
effect of each boolean condition on the output.
Each decision has two test coverage items (for true and false outcomes) and every
boolean condition in each decision has two test coverage items (for true and false
outcomes). In addition, test coverage items must be created that show the effect on
the output of changing each of the boolean conditions independently.
Consider the method func(a,b) shown in Snippet 8.2. Test cases must be created to
show the independent effect of changing the value of the boolean conditions (a>10) and
of (b==0) on the returned value.
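Snippet 8.2 itself is not reproduced in this extract; a hypothetical reconstruction that is consistent with the test values and expected outputs listed below is:

public class Func {
    public static int func(int a, int b) {
        if (a > 10 || b == 0) {    // decision with the conditions (a>10) and (b==0)
            if (b == 0)
                return 20;
            return 2;
        }
        return 100;
    }
}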
The tester must review the complex decisions and boolean conditions in the code,
and select input parameter values for the test data to ensure:
• That every decision takes on the value true and false.
• That every boolean condition in every decision takes on the value true and false.
• That the effect on the output value of changing the value of every boolean condition
is shown.
For example, using the method func(int a, int b) as defined above:
• The input data (a=50,b=1) should result in the output value: 2.
8 A Practical Tutorial on Modified Condition/Decision Coverage (Hayhurst)
• Adding the input data (a=5,b=1) should result in the output value 100. This shows
the independent effect of changing the boolean condition (a>10).
• And adding the input data (a=5,b=0) should result in the output value 20. This
shows the independent effect of changing the boolean condition (b==0).
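These three test cases might be implemented as a TestNG parameterised test, in the same style as the book's other test code (assuming the hypothetical Func class sketched above):

import static org.testng.Assert.assertEquals;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

public class FuncMCDCTest {
    @DataProvider(name = "mcdcData")
    public Object[][] getMcdcData() {
        return new Object[][] {
            { "T1", 50, 1, 2 },    // baseline: (a>10)=true, (b==0)=false
            { "T2",  5, 1, 100 },  // flips (a>10) only: the output changes
            { "T3",  5, 0, 20 },   // flips (b==0) only (relative to T2): the output changes
        };
    }

    @Test(dataProvider = "mcdcData")
    public void testFunc(String tid, int a, int b, int expected) {
        assertEquals(Func.func(a, b), expected);
    }
}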
Strengths
Stronger coverage than Decision Condition Coverage, but without the large
number of tests produced by Multiple Condition Coverage. MCC will produce
2^n tests, for n boolean conditions. MCDC will produce in the order of n tests
(with a minimum of n + 1).
This technique has been reported as being very successful in code where complex
decisions are made (for example the aerospace industry), and in particular
where decisions with very large numbers of boolean conditions are used.
Event-based software is likely to require complex decisions, and thus the technique
may be applicable to GUI and web-based software.
Weaknesses
Not as thorough as MCC.
8.3.7 Test Ranking
Not all testing techniques can be directly compared to each other in terms of their
effectiveness, but a number of the white-box techniques can.
The standard comparison ranks the techniques as shown in Figure 8.11. The
techniques labelled nearer the top of the figure are described as being stronger than
those lower down. This means that they guarantee to provide at least the same level
of coverage as the techniques lower down. However, the stronger techniques invariably
take more time and effort to implement. The stronger techniques are said to subsume
the weaker ones, as they provide at least the same level of coverage. The arrow labelled
stronger in Figure 8.11 represents the subsumes relationship.
In order to broaden the value of finding a fault, it is useful to abstract the fault as far as
possible. Then additional black-box tests can be developed that attempt to find other
places in the code where the same mistake may have been made by the programmer,
leading to similar faults. If the fault has a characteristic signature (often caused by
8.4.4 Example
Consider a program that at some point has to search a list of data for a particular entry.
The data is kept in a linked-list, and due to a mistake by the programmer the code never
finds penultimate entries in the list (i.e. entries which are second-from-last). This fault
eventually causes a failure, which is reported by a customer. The fault is then located
by replicating the customer’s conditions and debugging the code. Subsequently the fault
is repaired by rewriting a line of code in the method that handles traversing the list.
To verify the repair, a white-box unit test is developed that exercises the new line of
code. This verifies that the method now produces the correct result for the data that
previously caused a failure. A black-box unit test is then developed that ensures that,
for different sets of data, the second last entry can be found in this list. A system test
may be developed to verify that the entire system now behaves correctly in situations
when the second last entry is important.
Finally, additional unit tests or system tests are developed that check that, for any
other lists or collections of data in the program, the second-to-last entry is located
correctly.
Chapter 9
Testing Object-Oriented
Software
The black-box and white-box unit test techniques described in Chapters 2 to 7 have
been applied to a single static method. This allows key concepts of the test design
techniques to be introduced without the additional complexities introduced by classes
and objects. The same techniques can be applied to testing the instance methods1 of a
class – this is unit testing where the unit is a class.
9.1 Testing in Class Context
In general, methods interact with each other and cannot be tested independently: they
need to be tested in the context of their class hierarchy. Methods interact with other
methods via class attributes. Testing in class context refers to testing methods not
individually, but in the wider context of their class.
Typically, setter methods must be called first to initialise any attributes used by
the method being tested. Then the method itself is called, passing required inputs as
parameters. The method return value must be verified. And then any changes the
method has made to the attributes must be verified by calling getter methods. Thus the
interaction of a number of methods must be tested, not the operation of a single method.
Whether setters and getters are good object-oriented (OO) design is an open question,
but the software tester needs to know how to access the attributes.
In this chapter, the essential topic of designing tests for class context is addressed.
The test data may be derived using any of the previously covered black-box or white-box
techniques. The topic is introduced through an example, followed up by a more detailed
description of OO testing in Section 9.4.
9.2 Example
The class SpaceOrder supports processing orders to book space in a warehouse. The key
method is acceptOrder() which decides whether an order for space can be accepted or
not. In general, all orders must fall within a specified minimum and maximum amount
1 Methods which are not static.
of space. However, for special customers, orders for space less than the minimum will
also be accepted.
We demonstrate the technique using equivalence partition testing. Tests for other
black-box and white-box techniques would be developed in exactly the same way, but
then the specific data values produced by these other black-box and white-box techniques
would be used instead.
A UML2 Class diagram is used to define the attributes and methods in the class (see
Figure 9.1).
SpaceOrder
special: bool
accept: bool = false
+SpaceOrder(bool) «constructor»
+getSpecial(): bool
+getAccept(): bool
+acceptOrder(int): bool
It is assumed that the reader is familiar with the UML notation, but as a reminder
the class diagram is interpreted as follows:
• The top compartment gives the class name.
• The middle compartment lists the attributes with their types.
• The bottom compartment lists the constructor and the methods, with their parameter and return types; the + prefix indicates public visibility.
The implementation of the class is as follows:
1 public class SpaceOrder {
2
3 protected boolean special;
4 protected boolean accept=false;
5
6 public SpaceOrder(boolean isSpecial) {
7 special = isSpecial;
8 }
9
10 public boolean getSpecial() {
11 return this.special;
12 }
13
14 public boolean acceptOrder(int space) {
15 boolean status=true;
16 this.accept = false;
17 if (space<=0)
18 status=false;
19     else if (space<=1024 && (space>=16 || this.special))
20         this.accept = true;
21     return status;
22 }
23
24 public boolean getAccept() {
25     return this.accept;
26 }
27
28 }
• boolean example.SpaceOrder.acceptOrder(int space)
If invalid input data (value of space), set accept false.
An order can be accepted as follows. For all orders, the space must not be
greater than the maximum space of 1024 m². For a standard order, the space
must be at least the minimum space of 16 m² (a special order has no lower
limit).
Parameters:
space – space in m² requested in the order (must be >0)
Returns:
true if valid input data, and the attribute accept has been set (to either
true or false). Otherwise false
• boolean example.SpaceOrder.getAccept()
Returns:
the value of accept (whether the SpaceOrder has been accepted or not)
9.2.1 Analysis
During unit testing in the previous chapters, there was a single static method to test (the
method giveDiscount() – see Chapters 2 to 7). In contrast, a class may contain many
methods, and a decision must be made as to which of them should be tested.
1. Static methods – none in this example (if present, they can be tested using the
techniques shown in Chapters 2 to 7).
2. Constructors – include this constructor in the test to ensure that the attributes
are correctly initialised:
• SpaceOrder(boolean)
3. Accessor methods (getters and setters) – the general rule is to only test these
if they are manually written or contain more than a single assignment (setter) or
return statement (getter). However, in this example the class is small and the
getters and setters can easily be verified through a code review3 . Therefore we will
not test these:
• getAccept()
• getSpecial()
4. Methods with no class interaction (do not read or write any of the class
attributes) – none (if present, these can be tested individually using non-OO
approaches as shown in Chapters 2 to 7).
5. Methods with class interaction (read or write the class attributes) – these must
be tested in class context:
• acceptOrder(int)
We must now decide which test technique to use to select test data. For simplicity, we will
demonstrate equivalence partition testing in this example. Exactly the same approach
would be used for boundary value analysis and decision table testing. And the approach
is also similar for statement coverage and branch coverage testing.
Only the results of the analysis are shown, not all the intermediate results. Refer
to Chapter 2 for details of deriving the natural ranges, value lines, and equivalence
partitions.
3 There is a small risk associated with this approach, as even a single assignment or return statement may contain a fault.
Accessor Methods
We have decided not to test the getter and setter methods. However some of the
attributes can only be accessed via their setters and getters, so it is important to identify
these for each attribute. Table 9.2 lists the getter and setter methods.
The method acceptOrder() reads and writes to attributes (special and accept) but is
not regarded as a pure getter or a setter method, as it does other processing.
Other Methods
There are no methods with no class interaction, and there is a single method with class
interaction. Table 9.3 details this single method, listing the attributes read and written.
Equivalence Partitions
Using the technique shown in Chapter 2, equivalence partition tests are developed for
the inputs and outputs for each method. The full working of each step will not be shown,
just the results of each – refer to Chapter 2 for details.
Both the inputs to, and the outputs from, a method may be explicit or implicit:
• Explicit inputs are parameters passed to the method.
• Implicit inputs are attributes read by the method.
• An explicit output is the return value from a method, or an exception raised.
• Implicit outputs are attributes written by the method.
In order to achieve full equivalence partition coverage, all of these must be included.
The only parameter with interesting value lines (that require analysis) is the input
parameter space passed to acceptOrder(), as shown in Figure 9.3. All the other
parameters are boolean and do not require further analysis to identify their equivalence
partitions.
The equivalence partitions for the inputs and outputs of all the methods being tested
are identified in Table 9.4.
(In Table 9.4, each boolean input and output – special, accept, and the return value – has the two partitions true and false.)
As in previous chapters, the asterisk (*) indicates an input error partition. Where
there is only a single method with a particular name, then just the method name can be
used4 as shown here.
If an attribute is both an input and an output for a method, then the input and
output equivalence partitions must be separately listed – however, this is not needed
here.
4 If multiple methods have the same name, then the full method signatures, including the parameter types, must be used.
To Be Completed Later
EP1   SpaceOrder()    isSpecial      true
EP2                                  false
EP3                   special        true
EP4                                  false
EP5                   accept         false
EP6*  acceptOrder()   space          Integer.MIN_VALUE..0
EP7                                  1..15
EP8                                  16..1024
EP9                                  1025..Integer.MAX_VALUE
EP10                  special        true
EP11                                 false
EP12                  accept         true
EP13                                 false
EP14                  return value   true
EP15                                 false
The values selected for each equivalence partition are shown in Table 9.6.
Table 9.6: Selected OO EP Data Values
The test cases can now be developed. The significant difference in developing OO test
cases is that, unlike in Chapters 2-7 where a single method is called, multiple methods
must be called for OO testing – and they must be called in the correct order for the test
to work5 .
In the test cases table (Table 9.7), the required sequence of method calls is shown for
test case T1. Each call includes the test data consisting of the input parameters and the
expected return values. It is normal practice to develop test cases for the constructor(s)
first, then for any accessor methods being tested, and finally for the other methods.
Showing the test coverage items opposite the relevant method call makes reviewing the
test cases easier.
5 Not all books show this level of detail in designing OO tests, but we recommend this as good
practice. Otherwise, when implementing the tests, the tester has to redo the analysis to work out the
methods and the ordering required.
The other test cases are now developed in a similar manner. The full set of test cases is
shown in Table 9.8. As in Chapters 2-7, it is best to approach this systematically, trying
to cover the test coverage items in order, while trying to avoid unnecessary duplication,
and covering the error test coverage items last.
Counting the maximum number of test coverage items for each method being tested
gives an indication of the minimum number of test cases expected: the constructor has a
maximum of two (for isSpecial), and acceptOrder() has a maximum of four (for space),
so a minimum of six test cases is expected.
Table 9.9: Completed Test Coverage Items Table for SpaceOrder
TCI   Method          Name           Equivalence Partition        Test Case
EP1   SpaceOrder()    isSpecial      true                         T1
EP2                                  false                        T2
EP3                   special        true                         T1
EP4                                  false                        T2
EP5                   accept         false                        T1
EP6*  acceptOrder()   space          Integer.MIN_VALUE..0         T6
EP7                                  1..15                        T3
EP8                                  16..1024                     T4
10 }
11 @Test(dataProvider="constructorData")
12 public void testConstructor(String tid, boolean isSpecial,
13         boolean expectedSpecial, boolean expectedAccept) {
14     SpaceOrder o = new SpaceOrder(isSpecial);
15     assertEquals( o.getSpecial(), expectedSpecial );
16     assertEquals( o.getAccept(), expectedAccept );
17 }
18
19 @DataProvider(name="acceptOrderData")
20 public Object[][] getAcceptOrderData() {
21     return new Object[][] {
• The method testConstructor(), on lines 11-17, uses parameterised data to implement
test cases T1 and T2. The data is provided by the data provider named
constructorData defined on lines 3-10.
• The method testAcceptOrder(), on lines 30-36, uses parameterised data to
implement test cases T3 to T6. The data is provided by the data provider named
acceptOrderData defined on lines 19-28. Making this test method depend on the
method testConstructor forces the constructor tests to run first6 .
• The test code only makes a single call to the method under test, matching the test
cases. This makes reviewing the test code easier, and also makes debugging easier
if a test fails.
PASSED: testConstructor("SpaceOrderTest T1", true, true, false)
PASSED: testConstructor("SpaceOrderTest T2", false, false, false)
PASSED: testAcceptOrder("SpaceOrderTest T3", true, 7, true, true)
PASSED: testAcceptOrder("SpaceOrderTest T4", false, 504, true, true)
PASSED: testAcceptOrder("SpaceOrderTest T5", false, 5000, true, false)
PASSED: testAcceptOrder("SpaceOrderTest T6", false, -5000, false, false)
===============================================
Command line suite
Total tests run: 6, Passes: 6, Failures: 0, Skips: 0
===============================================
Figure 9.5: Object-Oriented Execution Model
Figure 9.5 summarises the environment in which methods execute. The inputs
labelled are the explicit inputs or parameters passed in the method call, and the outputs
are the explicit outputs or values returned by the method call. Public methods are placed
on the class boundary: they are accessible from outside the class. Private methods and
private attributes are hidden inside the class8 – they are not accessible from outside
the class, and cannot be accessed or used by the tester. The important features of the
diagram are:
• Methods which have a single function, which is to set the value of an attribute, are
referred to as setters – marked with an S.
• Methods whose single function is to get an attribute value, are referred to as getters
– marked with a G.
• For test purposes, methods which do any processing are not regarded as getters or
setters, and are unlabelled in the diagram. These other methods may be passed
input parameters, may read attributes as inputs, do processing, call other methods
both public and private, write attributes as outputs, return a value as an output,
or raise an exception as an output. Public methods may be called by the tester.
Private methods may only be called from within the class.
• Classes may have relationships with other classes: these may be inheritance
relationships, where one class reuses all the code of another class (the is-a
relationship, also referred to as generalisation/specialisation); and aggregation
relationships, where one class contains a collection of other classes (the has-a
relationship).
8 It is unusual to use public attributes in Java.
States In many classes, the order in which methods are called is important for correct
operation. Generally, the code which ensures
this ordering is distributed over many methods. This provides many opportunities
for what are referred to as state control faults, where an object does not behave
correctly in a particular state (see Section 8.2.7).
Inheritance Inheritance leads to a binding between objects and different classes and
interfaces10. Dynamic binding occurs at runtime, where a decision is made by the
Java VM as to which overridden method to use. Complex inheritance structures, or
a poor understanding of inheritance in Java, may provide many opportunities for
faults due to unanticipated bindings or misinterpretation of correct usage.
Messages Programming mistakes when making calls, where the wrong parameters are
passed, or the correct parameters are passed but mixed up, are a leading cause
of faults in procedural languages. Object-oriented programs typically have many
short methods, and therefore provide an increased risk of these interface faults11 .
9 Testing Object Oriented Systems (Binder) is probably the most thorough reference book.
10 This refers to a Java Interface, or an abstract class in more general terms.
11 Interface here means an API, or the definition of the parameters to a method call, and not a Java
Interface.
To test software in an object-oriented environment, an object (an instance of the class
being tested) must be created first. When a method is called, some or all of the inputs
and outputs for the method may be attributes rather than explicitly passed parameters
and return values. In the example that follows, other methods need to be called beforehand to set
some of the input values (setters), and afterwards to fetch some of the output values
(getters).
The structure of a typical object-oriented unit test is shown in Snippet 9.3. This
test structure is referred to as testing in class context. Note how this contrasts with a
conventional test as shown in Snippet 9.1.
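Snippet 9.3 itself is not reproduced in this extract; a minimal fragment consistent with the description below (the class name TestItem and the setter name are assumptions) is:
1 TestItem obj = new TestItem(p1);                     // p1 stored in attribute paramP1
2 obj.set_param2(p2);                                  // p2 stored in attribute paramP2
3 boolean actual_status = obj.method_under_test(p3);
4 assertEquals(actual_status, expected_status);
5 int actual_results = obj.get_result();               // fetch attribute theResult
6 assertEquals(actual_results, expected_results);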
The variables p1, p2, p3 are the same as the parameters passed directly to the method
in the conventional test. And the actual results are the same as in the conventional test.
But the operation of this test is more complex, and has interesting subtleties:
• On line 1, an object is created by calling the constructor with the value p1. The
value is stored in a private attribute paramP1 (not shown in the snippet).
• On line 2, the value p2 is set. Again, this value is stored internally, in attribute
paramP2.
• On line 3, the method being tested is called with the value p3. The method
method_under_test() uses the attributes paramP1 and paramP2 as inputs, along
with the parameter p3. It returns a boolean value to indicate whether the method
succeeded. The result of the method is stored in a private attribute, theResult.
The returned status is stored in the variable actual_status.
• On line 4, the actual status is compared with the expected status. If they are the
same, then the test continues. Otherwise the test fails.
• On line 5, the attribute theResult is retrieved using the getter method get_result()
– these are the actual results.
• On line 6, the actual results are compared with the expected results. If they are
the same, then the test passes. Otherwise the test fails.
This demonstrates the various additional mechanisms that object-oriented test code
uses for inputs and outputs in class context. All the conventional black-box and
white-box techniques described in this book can be applied to testing methods in
class context: equivalence partitions, boundary values, decision tables, random tests,
statement coverage, branch coverage, and all paths coverage. The test conditions will be
essentially the same; the key differences being in the analysis, the test cases, and the test
implementations which all need to take the class context issues into account.
The methods to be tested are identified by category, as in the example:
1. Static methods – these are not dependent on the constructor. Usually they are
independent, but if they interact through static attributes, then they should be
considered in class context along with the other methods.
2. The constructor(s) – including the default constructor. Note that the constructor
is called after the other object initialisation has taken place.
It is recommended to limit each test case to one primary method call being tested:
though this may require a number of supporting method calls as we have seen. This
makes reviewing the test cases for correct ordering and completeness much easier12 .
It is not always necessary to use separate tests for each input error case (unlike in
equivalence partition testing), as long as the input error values are inputs to different
methods and do not cause error hiding.
in Chapter 11. State-based testing verifies the behaviour of sequences of inputs, usually
specified using the UML state diagram.
See Chapter 14 for further reading on testing OO software.
12 When the test code is being implemented, multiple test cases can be condensed into a single test method.
• If a subclass is not fully substitutable, then only a subset (and maybe none) of
the superclass tests can be reused in the subclass inheritance test. Analysis must
be performed to select the applicable tests. This analysis is made more difficult
as the standard UML Class Diagram does not specify whether a subclass is fully
substitutable or not.
Once the tests to be run have been selected, inheritance testing is primarily a test
automation question – see Section 11.9 for an example.
This form of testing specifically addresses the class encapsulation/state fault model. The
purpose of state-based testing is to verify that a class behaves correctly with regards to
its state specification (e.g. UML State Machine Diagram). A state diagram contains
states and transitions between those states. State-based testing verifies that the software
transitions correctly between the states. There are three simple test strategies:
• All transitions – every transition is verified at least once.
• All end-to-end paths – every path from the start state to the end state is verified
at least once. If there is no end state, which is common in software state diagrams,
then every path from the start state to every terminal state14 can be used.
• All circuits – every path that starts and ends in the same state is verified.
Testing each transition involves:
• Checking that the software is in the correct start state (this may have already been
done by the verification of the previous transition).
• Raising the event (i.e. call the method specified with the correct parameter values).
• Checking that the specified activity has taken place correctly.
• Checking that the software is in the correct end state.
It is not always possible to fully check that an object is in the correct state, or that
the correct activity has taken place15. In this case a partial check can be done.
A state diagram for SpaceOrder is shown in Figure 9.6.
There are two states and three transitions explicitly specified in the diagram. There
are also four transitions which are implied by the diagram, but not explicitly stated.
These implicitly specified transitions are method calls not shown on the state diagram –
they should cause a transition back to the same state, and have no associated action. In
this example, the effect of calling getSpecial() and getAccept() in each of the two states
is not shown, and so the implicit requirement is that they cause no effect. The full set of
transitions to be tested, with added numbering for later reference, is shown in Table 9.10.
The importance of the state diagram is that calling getAccept() in the UNREADY
state does not produce a valid result, as acceptOrder() has not been called.
Using the all-transitions strategy results in seven test coverage items: one for each
transition.
When verifying that the state transitions have taken place correctly:
• You cannot fully verify that the software is in the Unready state: the best partial
check is that getAccept() returns false, and getSpecial() returns the provided value
of isSpecial.
• You can verify that the software is in the Ready state if acceptOrder() has set
accept to true by calling getAccept(), which should return true, and getSpecial()
which should return the provided value of isSpecial.
• You cannot verify that the software is in the Ready state if acceptOrder() has set
accept to false.
Therefore, the transition from Unready to Ready should be checked by setting values
for special and space that should set accept to true. This allows both the Unready
and Ready states to be uniquely identified (if the software is working correctly). The
transition from Ready to Ready by calling acceptOrder() can only be (partially) verified
by using a value for space that causes accept to be set to false.
An example of the test code for state-based testing of SpaceOrder is shown in
Listing 9.3. Note that the checks to verify the software is in the correct state often
cause transitions themselves, which must also be checked for in turn. This makes writing
state-based tests quite challenging.
10     assertFalse(o.getAccept());
11     // check activity and state for t2 and t3
12     assertFalse(o.getSpecial());
13     // transition 4
14     o.acceptOrder(1000);
15     // check activity and state for t4
16     assertFalse(o.getSpecial());
17     assertTrue(o.getAccept());
18     // check activity and state for t5 and t6
19     assertFalse(o.getSpecial());
20     assertTrue(o.getAccept());
21     // transition 7
22     o.acceptOrder(2000);
23     // check activity and state for t7
24     assertFalse(o.getSpecial());
25     assertFalse(o.getAccept());
26 }
27
28 }
The order in which transitions are tested is important. It would be possible to put
each transition test into a separate test method and use dependencies to force them to
execute in the correct order (this is discussed in Section 11.9.2). The other approach,
shown in Listing 9.3, is to test all the transitions in a single method. This is simpler to
implement, but has the disadvantage that it is more difficult to debug a failed test, as
the test covers multiple transitions.
The results of running these tests against SpaceOrder are shown in Figure 9.7.
PASSED: allTransitionsTest
===============================================
Command line suite
Total tests run: 1, Passes: 1, Failures: 0, Skips: 0
===============================================
All the tests have passed, showing as far as is possible that each transition in
the state diagram works. The test effectiveness is limited by being unable to access the
object state. A tester can only test what is possible.
Each item on each diagram has a meaning, and therefore represents a testable
property of the software system. For example, the relationships between classes and
any associated multiplicities are useful sources of tests in the Class Diagram; and the
interaction between classes and methods are useful sources of tests in the Sequence
Diagram, Activity Diagram, and Interaction Overview Diagram.
Encapsulation can make testing difficult, especially when trying to access the class
attributes in order to verify the actual results match the expected results. There are
a number of solutions to this, such as using Java reflection to access private attributes
at run-time17 . Another solution, which is available in many languages, is the use of
assertions for built-in testing or BIT. The tests are referred to as built-in as the test
assertions are built into the code, rather than being located in an external test class.
This can be very effective for ensuring that assumptions that the programmer has
made are in fact true when required in the code. They can also be effective in verifying
that class invariants are maintained by every method, by asserting them at the end of
every method18 . In Java, assertions can be turned on at runtime, so that during testing
they are enabled, and in deployment they are disabled.
However, they are less effective in replacing the usual unit tests, as often they need
to refer to not only the current value of the method variables (attributes, parameters,
and local variables) but also the original values just when the method started execution.
Having the code keep copies of these values for testing can be very inefficient in terms
16 There are both official and unofficial diagrams – see https://www.uml-diagrams.org for details.
17 Java Reflection can be used to examine classes at runtime – see the The Reflection API tutorial at
https://docs.oracle.com/javase/tutorial/reflect/index.html for further information.
18 Class invariants are constraints on the class attributes that must always hold – they are often used
for safety-critical software as discussed in Section 14.3.9. For example, in a tunnel management system,
the number of trains in the tunnel might be constrained to always be less than 2.
of memory space and execution time. It also increases the effort required to code each
method, and increases the chances of making a mistake.
An example of built-in test is shown in Listing 9.4, where an extra attribute
acceptedSpace has been added to the class.
13     return special;
14 }
15
16 public boolean acceptOrder(int space) {
17     assert acceptedSpace >= 0 && acceptedSpace <= 1024;
18     boolean status=true;
19     acceptedSpace = 0;
20     accept = false;
21     if (space<=0) {
22         status=false;
23     }
It is straightforward to verify the safety conditions via the Java assert statement.
These are placed at the end of the constructor on line 9, and at the start and end of
acceptOrder() on lines 17 and 29. To avoid the code duplication, a method to check the
safety conditions could be implemented, and called on these lines instead.
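For example, a hypothetical helper might be:

// Centralises the safety conditions; called where the assert statements appear.
private void checkSafetyConditions() {
    assert acceptedSpace >= 0 && acceptedSpace <= 1024;
}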
However, an assertion to verify that the method acceptOrder() has worked correctly
cannot always be so easily implemented. Line 29 checks that acceptedSpace is valid, but
it does not check that either the attribute accept or the return value is correct. This
would require access to the values of the special attribute and the space parameter when
the method was entered – but these may have been changed during the method. In
this case they have not been modified, but a different algorithm or a fault in the code
could have modified either of these. As the values at the end of the method cannot be
relied on to represent the values at the start of the method, copies need to be made at
the start of the method. In this simple code copies could be easily kept, but this would
introduce the possibility of further faults. And also, in general, complete copies of any
referenced objects would be required: this is a non-trivial problem and often unrealistic
to implement19 . For example, if an array of Counters is passed as an input, then a copy
of the entire array would need to be made, including copies of every Counter in the array.
The results of running the EP tests against SpaceOrder with built-in tests are shown
in Figure 9.8.
All the tests pass – not only are the correct values returned and verified in the tests in
SpaceOrderTest, but also all the assertions checked in the built-in tests have succeeded.
9.5 Evaluation
The tests developed for conventional testing in class context are evaluated in this section
by introducing a number of different faults and examining the test results.
9.5.1 Limitations
Three types of fault are demonstrated:
• A simple typo fault which you would expect to find with equivalence partition
testing.
• A more complex, state-based fault.
• A more complex, inheritance-based fault (in a new subclass).
19 Java provides no support for taking such copies – just copying an object reference is not sufficient, as the referenced object itself must also be copied.
17     if (space<=0)
18         status=false;
19     else if (space<=10240 && (space>=16 || this.special)) // typo fault
20         this.accept = true;
21     return status;
22 }
23
24 public boolean getAccept() {
25     return this.accept;
26 }
27
28 }
The results of running the SpaceOrder equivalence partition tests are shown in
Figure 9.9.
The simple fault has been detected – as for any equivalence partition test, there is no
guarantee that any particular fault will be found.
15 public boolean acceptOrder(int space) {
16     if (locked)
17         return false;
18     boolean status=true;
19     this.accept = false;
20     if (space<=0)
21         status=false;
22     else if (space<=1024 && (space>=16 || this.special))
23         this.accept = true;
24     return status;
25 }
26
27 public boolean getAccept() {
28 locked = true;
29 return this.accept;
30 }
31
32 }
Support for a new attribute locked has been incorrectly added to the class. This
prevents acceptOrder() from working correctly if it is called more than once – the first
time it is called it corrupts the object. In essence, this introduces a new state Locked
which is not in the state diagram, and acceptOrder() does not work in this state. See
lines 5, 16, 17 and 28.
The results of running the equivalence partition tests in class context against this
fault are shown in Figure 9.10.
Note that the fault has not been detected. A more systematic exploration of state-based behaviour is required to reliably detect such faults, as discussed in Section 9.4.11.
The results of running the state-based tests (from Section 9.4.11) against the faulty
implementation of SpaceOrder are shown in Figure 9.11.
FAILED: allTransitionsTest
java.lang.AssertionError: expected [true] but found [false]
	at example.SpaceOrderStateTest.allTransitionsTest(SpaceOrderStateTest.java:22)
===============================================
Command line suite
Total tests run: 1, Passes: 0, Failures: 1, Skips: 0
===============================================
The fault has been found. This systematic testing of the transitions is more likely to
find state-based faults than conventional testing in class context.
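For illustration, a minimal sketch of such a transition test is shown below; the constructor signature and the exact call sequence are assumptions, but the key point is that the methods are called in a sequence that exercises every transition, including repeated calls:

    @Test
    public void allTransitionsTest() {
        SpaceOrder o = new SpaceOrder(false);  // assumed constructor
        assertTrue( o.acceptOrder(100) );      // first call: order accepted
        o.getAccept();                         // read back the result
        assertTrue( o.acceptOrder(100) );      // fails if the object is now corrupted
    }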
An extract from the new subclass, TrackableSpaceOrder, which contains the inheritance-based fault, is shown below:

11 }
12
13 public int getTrackCode() {
14 return (int)code;
15 }
16
17 @Override
18 public boolean acceptOrder(int space) {
19 return true;
20 }
21
22 }
The method acceptOrder() has been overridden on lines 17–20. The implementation on line 19 is incomplete. This is often seen where an empty or skeleton method is coded initially, and the developer has forgotten to complete it.
The results of running the equivalence partition tests for SpaceOrder against an instance of TrackableSpaceOrder show that three of the tests have failed.
Note that tests T1 and T2 cannot be used, as they call the constructor in SpaceOrder, which is not a normal method of the class and is not inherited.
The conventional tests may find state-based and inheritance faults, but they are more
likely to be found by systematic testing against those fault models.
9.7
The process of designing OO tests includes:
• Identifying the test conditions and test data.
• Identifying the test coverage items based on the test design technique.
• Designing the test cases.
• Writing the test code.
Especially in an agile development environment, an experienced tester may use the User Stories and acceptance criteria as the basis of OO tests. This is relatively straightforward when the model that the classes implement matches the user problem domain closely. In this case particular user actions can be easily matched to method calls, and the sequence of interactions listed in an acceptance criterion for a user story can be easily mapped to a sequence of method calls, as sketched below.
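For example, an acceptance criterion such as "the dispatcher enters a fuel load and is told whether it fits" might map onto a method-call sequence like the following sketch (the class and method are borrowed from the next chapter's example; the signature is an assumption):

    @Test
    public void dispatcherChecksLoad() throws Exception {
        FuelChecker fc = new FuelChecker();   // hypothetical class under test
        boolean fits = fc.check(1000, true);  // "enter 1000 litres, high safety required"
        assertFalse( fits );                  // "told that the load does not fit"
    }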
The experienced tester will also probably code tests directly from the state diagram
or inheritance tree, having made sure that the tests are designed to allow inheritance
testing.
Chapter 10
Application Testing
In this chapter, we present the essential elements of testing a user application based on the example of a web application. The testing of desktop and mobile applications is very similar – the key difference is the test automation tools, a topic we discuss in Chapter 11.
Application testing has additional complexities in comparison to unit testing. The
key differences are: how to locate the inputs on the screen, how to locate the outputs on
the screen, and how to automate the tests running over the user interface.
10.1 Testing Web Applications with User Stories
Many different elements of an application’s specification can be used as the test basis for
application testing. Modern, Agile development processes focus on the user requirements
expressed in the form of user stories. Larger systems may contain a number of stories
which can be grouped into collections called epics.
A stakeholder may be an end user, the sponsor of the project, a representative from
sales or marketing, etc.
The role refers to the part that the stakeholder is playing for that story.
Each user story is detailed with acceptance criteria (also called confirmations) which
the stakeholders will use to verify that the system has been designed and implemented
correctly. These acceptance criteria provide the basis for test cases for automated tests
for the stories.
As in previous chapters, we will first consider a small example, and then discuss some
of the underlying principles and issues in more detail.
10.2 Example
A fuel depot wants a web-based system to enable the dispatcher to determine whether a
load of fuel to be received at the depot will fit in a single tank. Low volatility fuels can
fill a tank completely. High-volatility fuels require expansion space to be left for safety
reasons. The tank capacity is 1200 litres without the expansion space, and 800 litres
with it. All loads are considered to the nearest litre (decimal points in the fuel loads are
not to be supported).
Following conversations with the user, the following story has been developed:
Story S1: As a fuel depot dispatcher, I want to check if a fuel load fits in a tank so I
can decide whether to accept it or not.
And the following detailed acceptance criteria have been agreed with the customer:
S1A7 Identify a user input data error – this criterion was suggested to the customer by
the development team, based on their experience on web application design.
This means we have one user story with seven acceptance criteria.
10.2.1 Analysis
To develop the tests, we need to identify:
1. the different screens the application presents.
2. the web elements on each screen required for testing.
3. the data representation used for the inputs and outputs.
Trial Runs
The user interface can be most easily investigated by using trial runs of the software to
determine how each story is achieved. An example of this follows for the Fuelchecker
application described above. Each screen in the trial is shown below along with a brief
explanation.
• Figure 10.2c – After clicking on Continue the Enter Data screen is displayed, and
the user enters 1000 for Litres and selects High Safety Required .
• Figure 10.2d – After clicking on Enter the Results screen is displayed, with the
message "Fuel does not fit in tank."
• Figure 10.3e – After clicking on Continue the Enter Data screen is displayed, and
the user re-enters 1000 for Litres.
• Figure 10.3f – After clicking on Enter the Results screen is displayed, with the
message "Fuel fits in tank."
• Figure 10.4g – After clicking on Continue to return to the Enter Data screen,
the user enters xxx for Litres, and clicks on Enter . The Results screen is then
displayed, with the error message "Invalid data values."
• Figure 10.4h – After clicking on Continue to return to the Enter Data screen, the
user clicks on the Exit link at the bottom of the screen. The Goodbye screen is
now displayed, with the message "Thank you for using FuelChecker."
From these trial runs we can now identify each page displayed by the application, the interface components required for testing, and the data representation being used.

HTML element ids provide a reliable way to locate the input and output elements on each web page. Without these, automated testing is much more difficult1.
Most web browsers include an inspector that allows the element id (and other information) to be examined in the web browser2. Three examples of this are shown below.

Example 1: Figure 10.5 shows the Element tab displayed by Chrome when the user right-clicks on the input textbox for litres and selects Inspect from the pulldown menu. The important details of this element for the tester are:
• the html element type (<input type=”text”>)
• the id (id=”litres”)

Example 2: Figure 10.6 shows the inspector information for the Enter button on the same web page.
1 In a Test Driven Design (TDD) environment, tests may be developed as soon as the screen layout
has been designed, using id’s selected by the Graphical User Interface (GUI) designer or the tester. The
code would then use these id’s in order to pass the tests.
2 Alternatively, the page source may be viewed in the browser.
In cases where displayed text is not within a container with an id, then a higher-level container may be used3. For example, the lines of displayed text for the information screen are contained within <p> elements, which are in turn contained within a <span> element with no id. The <span> element is in turn contained within a <div> element which has the id "body". The important details of this element for the tester are:
• the html element type (<div>)
• the id (id=”body”)
Using the inspector, the HTML element type and id of all the necessary web elements
are determined, as listed in Table 10.1.
Table 10.1: HTML Element Information
The input elements for litres and highsafety are disabled initially. They are then
dynamically enabled as required by the application, producing the screens shown in the
trial run.
3 HTML elements can be nested, so if the element or container you want to refer to has no id, you can use the id of an enclosing container instead.
Data Representation
The data representation used for the inputs and outputs is determined by examining the
HTML elements and their appearance on the screen.
• The input highSafety is a checkbox element. This represents a boolean value.
• The input litres is a text input element (a string). This represents an integer value.
• The output result is a non-editable text element (a string). The text can take one
of three possible outputs:
– "Fuel fits in tank."
– "Fuel does not fit in tank."
– "Invalid data entered."
• The output body of the information screen is also a non-editable text element (a string). The content of body is a complex HTML expression (as shown in Figure 10.7). We are not testing that the text formatting is correct; we just need to check that this element contains the correct text. The other HTML elements can be ignored. The correct text contains the two important phrases:
– "Standard tank capacity: 1200 litres"
– "High safety tank capacity: 800 litres"
A simple user story test case will use a single data value for each acceptance criterion.
In more advanced testing, the analysis can be extended to identify equivalence partitions,
boundary values, and combinations for testing.
The analysis is now complete – we have identified each screen displayed by the application, each web element on each screen required for testing, and the data representation being used.

Each acceptance criterion is used as a test coverage item4:

TCI    Acceptance Criterion
US1    S1A1
US2    S1A2
US3    S1A3
US4    S1A4
US5    S1A5
US6    S1A6
US7    S1A7

4 We have one user story, with seven acceptance criteria. The TCI identifiers US1–7 are arbitrary unique identifiers, with the prefix US selected to indicate that these are User Story tests.
Even though US7 reports an error back to the user, it is not an error case in the same sense as in EP testing. Each user story/acceptance criterion is tested separately, so the issue of error hiding does not apply, and we do not need to identify error cases with an asterisk.
TCI    Input     Value
US1    litres    "1000"
US2    litres    "400"
US3    litres    "2000"
US4    litres    "1000"
US7    litres    "xxx"
There are many possible invalid strings that may be entered for an integer value in US7
– see the discussion in Section 10.4.5.
The expected results (the correct outputs, and their data representation) have already
been identified during analysis of the application, as shown in Table 10.4.
Test case T1 is shown in Table 10.5. The inputs consist of actions: in this example, there are data values to be typed in, checkboxes to be selected/deselected, and buttons to be clicked. The expected results consist of responses by the web application: in this example, there are window titles and displayed text to be verified.

ID    TCI Covered    Inputs                      Expected Results
T1    US1            Enter "1000" into litres
                     Deselect highsafety
                     Click on Enter              Moved to Results screen
                                                 Result is "Fuel fits in tank."
The data for the other tests is developed the same way, specifying the inputs to the
application, and the expected results. See Table 10.6 for the full set of test cases.
T7    US7            Enter "xxx" into litres
                     Select highsafety
                     Click on Enter              Moved to Results screen
                                                 Result is "Invalid data values."
10.3.1 Implementation

To enter the inputs into a web application, and to collect the output for verification, a web automation test library must be used. A good and widely used example is Selenium5. This is used in the test implementation as a representative example to demonstrate the principles of test automation.

Only the code for the test methods themselves is shown here. The complete test program requires additional methods to configure Selenium and open the web browser, which are shown in Section 10.4.
Test 1
The implementation of the first test case (T1) is shown in Listing 10.1.
71     wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("Enter")));
72     driver.findElement(By.id("Enter")).click();
73     wait.until(ExpectedConditions.titleIs("Results"));
74     wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("result")));
75     assertEquals( driver.findElement(By.id("result")).getAttribute("value"), result );
76     wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("Continue")));
77     driver.findElement(By.id("Continue")).click();
78     wait.until(ExpectedConditions.titleIs("Fuel Checker"));
79 }
• Web-based tests require a timeout, in case the browser does not respond or the test hangs indefinitely waiting for a specific response. In this test, a timeout of 60 seconds is selected – the value depends on connectivity and is contextual, and may require a few test runs – line 60.
• First the test makes sure the browser is on the correct screen. Where web page titles are used, this can be best achieved by checking the title – line 65.
• Next the value for litres must be entered. The browser may not have finished rendering the window, so the test must wait for this element to appear, and then it can simulate user entry using sendKeys() – lines 66 and 67.
• Each HTML element used in the test is found by calling the method By.id() – see for example line 67.
• The highsafety checkbox must be deselected. To do this, the current value of the checkbox is checked (line 69), and if it is already selected, then it is clicked to deselect it.
The time the test was started at, and the URL, are printed by the test code (see Listing 10.6 later in the chapter for details). Their values will therefore change depending on your configuration and when the test is run. The WebDriver startup and connection information confirms that the web browser has started properly, and a session to the browser has started. These details are not important to the test result.
The test has passed.
Adding Tests 2-4 and 7
Test 1 can be run on its own, but adding further tests requires decisions about how the sequence of tests is to run, and how code duplication can be avoided. Restarting the web application each time is very slow, so it is usual to run the tests in sequence. This requires that each test leaves the application on a selected screen. For this application, it is easiest to always return the application to the Fuel Checker screen at the end of each test. Code duplication can be avoided, as in unit testing, by using parameterised tests. Test cases T1, T2, T3, T4 and T7 have exactly the same structure, but use different data. This is an opportunity to use a DataProvider, as shown in Listing 10.2.
82     };
83 }
84
85 @Test(timeOut=60000, dataProvider="testset1")
86 public void testEnterCheckView(String tid, String litres, boolean highsafety, String result) {
87     wait.until(ExpectedConditions.titleIs("Fuel Checker"));
88     wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("litres")));
89     driver.findElement(By.id("litres")).sendKeys(litres);
90     wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("highsafety")));
91     if (driver.findElement(By.id("highsafety")).isSelected()!=highsafety)
92         driver.findElement(By.id("highsafety")).click();
93     wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("Enter")));
94     driver.findElement(By.id("Enter")).click();
95     wait.until(ExpectedConditions.titleIs("Results"));
96     wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("result")));
97     assertEquals( driver.findElement(By.id("result")).getAttribute("value"), result );
98     wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("Continue")));
99     driver.findElement(By.id("Continue")).click();
100    wait.until(ExpectedConditions.titleIs("Fuel Checker"));
101 }
Test cases T5 and T6 have different structures, so require individual tests, as shown in the following extracts.

115     );
116     wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("goback")));
117     driver.findElement(By.id("goback")).click();
118     wait.until(ExpectedConditions.titleIs("Fuel Checker"));
119 }
        wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("body")));
        assertTrue( driver.findElement(By.id("body")).getAttribute("innerHTML").contains("Thank you for using FuelChecker.") );
129
130 }
Making sure that a test leaves the application at the main screen, even if the test fails, requires a method to be run after each test. This is shown in Listing 10.5, which uses the TestNG @AfterMethod annotation to require that returnToMain() is run immediately after each @Test method6.
58 @AfterMethod
59 public void returnToMain() {
60     // If test has not left app at the main window, try to return there for the next test
61     if ("Results".equals(driver.getTitle()))
62         driver.findElement(By.id("Continue")).click();
63     else if ("Fuel Checker Information".equals(driver.getTitle()))
64         driver.findElement(By.id("goback")).click();
65     else if ("Thank you".equals(driver.getTitle()))
66         driver.get( url ); // only way to return to main screen from here
67     wait.until(ExpectedConditions.titleIs("Fuel Checker"));
68 }
• If the application is left at the Thank you screen after a failure, there is no link or
button for the user to click to return to the main screen. The @AfterMethod code
reloads the main application url to handle this – see lines 65 and 66.
Sep 24, 2020 7:15:18 P.M. org.openqa.selenium.remote.ProtocolHandshake createSession
INFO: Detected dialect: W3C
[1600971316.989][WARNING]: This version of ChromeDriver has not been tested with Chrome version 85.
===============================================
Command line test
Tests run: 7, Failures: 0, Skips: 0
===============================================

Figure 10.9: Fuelchecker User Story Test Results
A generic test model for system testing is shown in Figure 10.10. The Test Tool provides inputs to the Test Item, and receives outputs from the Test Item, over the system interface. In some cases the interface will be synchronous, where every input generates an output. In other cases the interface will be asynchronous, where an input does not necessarily generate an immediate output.

A test model for desktop applications is shown in Figure 10.11. The Test Tool interacts with the Desktop Application through a Windowing Interface (e.g. AWT, Swing, or JavaFX in the case of Java) by emulating a user. When a user interacts with the screen, the Windowing Interface software generates what are referred to as GUI Events, and the Test Tool generates the same GUI Events. The responses are in turn delivered back to the Test Tool via the Windowing Interface, and these are referred to as GUI Responses.
(a) Browser Web Test Model (b) Direct Web Test Model

Models for web application testing are shown in Figures 10.12a and 10.12b. Web applications use a Web Browser to provide the user interface, and the communication between the browser and the application running on the Web Server is via the Hypertext Transfer Protocol (HTTP) over the network. In Figure 10.12a the system is tested via a Web Browser, and the Test Tool emulates the actions of a user. In Figure 10.12b the system is tested using HTTP directly over the network interface. The Test Tool in this case must generate the HTTP messages itself. There are utilities and libraries that can be used to generate HTTP requests and parse HTTP responses for this purpose (e.g. curl, Beautiful Soup, etc.). This generally executes much faster, and allows testing to be independent of the web browser; however, it is more complex to implement the tests, requiring a deep understanding of HTML and HTTP, and does not guarantee correct operation when running against a web browser. Also, if the application is partially implemented on the browser using JavaScript, then the test tool must support this (which is a substantial task).
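As a sketch of the direct approach in Figure 10.12b, the following uses Java 11's built-in HttpClient; the URL and the expected page content are assumptions:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class DirectHttpTest {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8080/Fuelchecker.html")) // assumed URL
                    .GET()
                    .build();
            // Send the request and read the raw HTML response - no browser is involved
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            // The test itself must interpret the HTTP status and parse the HTML
            if (response.statusCode() != 200 || !response.body().contains("Fuel Checker"))
                throw new AssertionError("unexpected response from server");
            System.out.println("Main page retrieved and verified");
        }
    }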
System testing and integration testing often both take place over the same interface (the
system interface). But it is important to note that the purpose of each is different. System
tests verify that the system as a whole is working correctly. Integration tests verify that
some components (or sub-systems) of a system are working correctly together.
Integration Testing
Integration Testing takes a number of forms depending on the software process. In the
traditional software process, based on producing layers of software, integration takes
places between these layers. Depending on the process, this may take the form of Top-
Down Integration Testing, Bottom-Up Integration Testing, or Feature Testing, which are
explained below. In a modern Agile process, adding new user features typically involves
changes to all the layers. Integration testing in this type of process often involves making
sure that the old and new user features integrate correctly together – this is sometimes
referred to as Testing End-to-End Functionality.
Integration testing may also take place in a way similar to unit testing, where the
purpose of the testing is to verify that two software components (usually represented by
classes) work correctly together.
Figure 10.13: Test Drivers and Stubs
When the software being tested requires another software component that has not
yet been implemented, temporary software (referred to as stubs) is written. This stub
software has limited functionality – often it will provide just enough support for the tests
to execute. Stubs may also be used to speed up tests by avoiding slow networking calls,
or to prevent actual actions taking place (such as sending emails or modifying an active
database).
Stubs may include instrumentation (e.g. counters added to the stub code) to measure how much of the temporary code has actually been called, or assertions7 to verify correct operation. These instrumented stubs are often referred to as mocks. Testing with mocks is supported by a number of libraries8.
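A minimal sketch of such a hand-written instrumented stub is shown below, assuming a hypothetical MailSender interface; in practice a mocking library8 would usually generate this code:

    interface MailSender {
        void send(String to, String message);
    }

    // Instrumented stub ("mock"): records calls and checks its inputs,
    // instead of sending real emails
    class MailSenderStub implements MailSender {
        int sendCount = 0;  // instrumentation: how often was the stub called?

        @Override
        public void send(String to, String message) {
            assert to != null && !to.isEmpty() : "recipient must be set";
            sendCount++;    // record the call; no email is actually sent
        }
    }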
(a) Top-down (b) Bottom-Up
In top-down testing, the top layer of software (Level 1 in the diagram) is tested first
with the underlying layer (Level 2) replaced by stubs. At the next stage of testing, the
Level 2 Stubs are replaced by the next layer of software (Level 2), and the new underlying
layer (Level 3) is replaced by stubs. This continues until all the stubs have been replaced
7 See Built-in Testing in Section 9.4.13.
8 For example: EasyMock, JMock, JMockit, Mockito and PowerMock.
by the operational software. All the tests use the top level interface (which may be
an API or a user interface) as presented by Level 1. Only one Test Driver is required,
though it may require additions to make sure that the progressive integration with the
underlying software is thoroughly tested.
Bottom-up testing is the opposite: it begins with the Level 3 Test Driver testing the
lowest layer of software (Level 3). Then the next layer is introduced (Level 2), and a
new Test Driver written to test this (L2 Test Driver). This continues until all the levels
of the software have been integrated and tested. In contrast to top-down testing, the L3
Test Driver will use the Level 3 API – but the final L1 Test Driver will use the Level 1
API. No stubs are required, but multiple test drivers must be written.
A hybrid approach, sandwich testing, reduces the number of stubs required, by testing
just three layers at a time: the focus of testing in this case is the middle layer.
As an alternative to these layered approaches, most modern development processes
focus on developing increments of end-to-end user functionality – for example, each
deliverable feature in the product backlog (See Chapter 13). The order of testing as
these features are added incrementally is shown in Figure 10.15.
In practice, system tests (or application tests) are often used to act as integration tests, with mocks used to verify the integration. Developing tests that thoroughly exercise each interface is time consuming, and techniques for developing test coverage items for integration testing are still being researched – see Chapter 14.
• User Requirements Faults This occurs where the actual responses from the
application do not match the expected responses as specified in the associated user
story.
However, it is unusual for the user stories to cover the full functionality of the interface.
There is a second level of the design hierarchy to be considered in testing an application.
When using an application, a user will navigate between different screens, each screen
contains multiple interface elements, and these elements interact with underlying software
features. In an MVC (Model-View-Controller) design, navigation may be associated with
the Controller, screen contents with the View, and the underlying software features with
the Model. This leads to three functional fault models for an application9 :
• Navigation Faults This occurs where the navigation between the different
interface components (in the form of windows, screens, pages, forms, etc.) does
not work correctly. The application may display the wrong screen, or may ignore
the user action and do nothing.
• Screen Element Faults Each interactive interface component of the system, with
its corresponding inputs and outputs, has expected behaviour. For example: a
button should perform some action when clicked, a text input box should accept
typing, and a hyperlink should navigate to another page. It is an element behaviour
fault when an element does not exhibit the expected behaviour. In the case of a web
application, much of the expected behaviour may be implicit and not documented.
For example, the designer might assume that a user will type their name into a
textbox with the prompt "Username" and not specify exactly how the textbox is
expected to behave in detail.
• Software Feature Faults Where the software features of the system are
documented, these can be tested independently of the user stories. Typically a
feature is a set of interactions with the system that result in a particular output.
A user story typically involves several features. A feature can be regarded as a method or class, and the black-box techniques of equivalence partition, boundary value analysis, and decision tables can be applied to testing each.
Some of these fault models may overlap: for example, if an HTML anchor link is
not implemented correctly, it will not behave correctly when clicked (Element Behaviour
Fault), leading to the next screen not being displayed (Navigation Fault), which may
cause the feature being used to not display its outputs (Feature Fault), and thus the
application will not satisfy the acceptance criteria for the associated user story (User
Requirements Fault). User stories attempt to catch most of these faults by testing that
the user can complete the required tasks.
10.4.5 Analysis
The key analysis task in application testing is the analysis of the user interface. In
unit testing, a method call requires no analysis: the parameters, their types, and order
are all well defined. However, for a user interface, the inputs and outputs are located
on a screen, and are represented using text (or other user interface metaphors for the
underlying data types). The location of these interface elements may change based on
the screen size or orientation, and the user must interact with the application using other
interface elements (such as keyboard shortcuts, buttons, links, etc).
Designing automated tests for an application involves finding a way to (a) locate the
required interface elements, and (b) enter/extract data in a way that is compatible with
the data representation used. For example, numeric data will often be represented as
text on the screen. This involves several transformations:
9 There is also a wide range of appearance- and performance-related fault models, which are not included here.
• When the user enters a number, each numeric key pressed is interpreted as a digit,
and these digits are added to the end of a string displayed as feedback to the user
on the screen. This string must then be converted to an integer value for use in the
program. The test tool needs to convert integer inputs to strings (or key presses)
in order to provide the input.
• When an integer value is to be shown to the user by a program, then it will first
be converted to a text string for display. The test tool needs to retrieve this string,
and convert it to an integer, before checking its value.
• Numeric inputs and outputs may also use different screen elements as metaphors
for the value. These may include dials, sliders, pull-down menus, etc. In each
case, the test tool must manipulate these screen elements to provide inputs to the
program, and must convert from the displayed representation to check the outputs
from the program.
The difference between interacting with a software application via its user interface, and calling a method via its programming interface, is discussed below, taking the fuel checking method as an example:
• The input for the volume of fuel is passed in the first parameter.
• The input for whether the fuel is highly volatile and requires extra space for safety is passed in the parameter highSafety.
• The output is returned in the method return value.
• If an error occurs, then the method will raise a FuelException.
• The method is identified by its name – in Java the full name includes the package name and the class name: example.FuelChecker.check()
Writing a program to interact automatically with this method is straightforward for
an experienced programmer, and unless the parameter types are very complex, requires
no further analysis.
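Based on these points, the method presumably looks something like the following sketch – the parameter names, the error condition, and the body are assumptions; the capacities are taken from the specification:

    package example;

    public class FuelChecker {
        public boolean check(int litres, boolean highSafety) throws FuelException {
            if (litres < 0)
                throw new FuelException("invalid fuel load: " + litres);
            int capacity = highSafety ? 800 : 1200; // litres, from the specification
            return litres <= capacity;
        }
    }

    class FuelException extends Exception {
        public FuelException(String message) { super(message); }
    }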
The user interface for an application that uses this method to check whether a fuel
load will fit in a tank or not is shown in Figure 10.16.
• The inputs for the volume of fuel, and whether extra safety space is required, are
located on the screen.
• The result is shown in a separate field, in this case it is below the entry fields. It
may have a prompt, or a title, or may just be identified by context.
The screens may be identified by a title at the top of the page (here we see Fuel
Checker and Info Window), by a name external to the screen (for example, web browser
tab names), or by content.
The input and output parameters are represented by text on the screen and
checkboxes in this example. This raises the questions of what the valid representations of numbers as text are, and how the checkbox is to be interpreted. These issues are discussed under data representation below.
Writing a test program to interact automatically with this application is complex:
• Interacting based purely on the absolute (x,y) location of each element on the
screen can prove problematic. This fails if the layout changes, which is a particular
problem for responsive interface design10 .
• Using the prompt text can also prove problematic. The prompt is usually located
by various conventions – for example, the prompt for a dial may be within the
element, the prompt for a text box may be to the left of or above the textbox, or
even in the textbox in a grey font. It may also prove difficult to locate non-text
elements by prompt – for example, a warning message may popup an icon on the
screen such as a warning flag.
These user interface elements may be easy for a user to locate and interpret, based
on prompts, convention, or just experience. But locating them, and interpreting the
10 Responsive interfaces will dynamically alter the screen layout to suit the size and orientation of a
screen. For example, as a mobile phone is rotated, the position of the screen elements may be modified
to suit the new orientation. Sophisticated versions may also alter the contents of the screen, for example
removing large logos, or summarising text and prompts when displayed on a small screen.
data representation, provide significant challenges for test automation. Unless there is
detailed documentation available, specifying the interface in detail, this requires further
investigation before black-box techniques can be used to select test data.
An application may use a wide range of on-screen interface elements (such as pulldown
menus, popup menus, sliders, drag-and-drop, keypad gestures etc.), and even off-screen
interfaces such as audio, visual, and haptic interfaces; these are significantly harder to
test!
The Selenium library is representative of other web automation tools in that it provides a number of methods to find HTML elements and interact with them11.

Consider the example shown in Figure 10.17, where a text string is used to represent an integer. There are a surprising number of ways in which a string can be formed that can or cannot be converted to an integer in a program – some of the following analysis is Java-specific, and some is common to many languages. The term valid strings is used to indicate strings which can be converted to an integer value by using a single standard Java library call12.
11 By.ByXPath() is very useful where HTML id’s are not defined, as elements can be found
by their content. However, its use is far from straightforward. Refer to the Selenium
documentation at https://www.javadoc.io/doc/org.seleniumhq.selenium/selenium-api/2.
50.1/org/openqa/selenium/By.html for details.
12 Integer.parseInt() or Integer.decode()
Strings
• Valid
  – Decimal string. Examples: "44", "-33"
  – Non-decimal string (hex, octal). Examples: "0xFF", "033", "#bad"
• Invalid
  – Invalid number. Examples: "seven", "10+33", "xxx", "45 //comment"
  – Invalid integer: too large (the maximum is "2147483647"). Example: "300000000000"
  – Floating point. Examples: "44.0", "44.5", "7.", "7E+2"
Strings with whitespace (e.g. ” 44 ”) cannot be converted by either of these integer
conversion methods without pre-processing – typically String.strip() might be called first
to do this. If an application calls for non-standard integers to be allowed (e.g. in a format
not supported by the standard Java String conversion methods), then the programmer
must implement custom conversion code, and the tester should test that these integers
are handled correctly.
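The following sketch demonstrates these conversions; the values printed are shown in the comments:

    public class ConversionDemo {
        public static void main(String[] args) {
            System.out.println( Integer.parseInt("44") );            // 44
            System.out.println( Integer.decode("0xFF") );            // 255 (hex)
            System.out.println( Integer.decode("033") );             // 27 (octal)
            System.out.println( Integer.parseInt(" 44 ".strip()) );  // 44, whitespace stripped first
            try {
                Integer.parseInt("xxx");                             // invalid number
            } catch (NumberFormatException e) {
                System.out.println("invalid: xxx");
            }
            try {
                Integer.parseInt("300000000000");                    // too large for an int
            } catch (NumberFormatException e) {
                System.out.println("too large: 300000000000");
            }
        }
    }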
What is important is not what method has been used by the programmer to do the
conversion, but what the application should support. Ideally, this will be specified as
part of the application. Otherwise, the customer may need to be asked, or an interface
decision made.
If unit testing has not been performed beforehand, application testing using equiv-
alence partition/boundary value analysis/decision tables may be used as a substitute.
This is not as rigorous as testing individual methods or classes via their programming
interface. If this form of testing is used, then the equivalence partition/boundary value
analysis/decision tables technique will identify the test coverage items (rather than the
user stories/acceptance criteria).
Important note: in addition to the user stories agreed with the customer, the
tester will probably also identify a large number of extra test coverage items using the
equivalence partition, boundary value analysis, decision tables, and OO test techniques,
and experience-based testing, as discussed previously.
Suggested further readings are presented in Section 14.3.
especially for errors (as seen in the discussion on data representation).

10.4.8 Implementation

The basic structure of a TestNG/Selenium test is outlined in Snippet 10.2.
@BeforeClass
public void setupDriver() throws Exception {
    // This runs before any other method in the class:
    //   open web browser
}

@AfterClass
public void shutdown() {
    // This runs after all other methods:
    //   close the web browser
}

// Tests go here
@Test(timeOut=20000)
...

@AfterMethod
public void postMethodProcessing() {
    // Runs after each test method:
    //   return to a common start screen
}
}
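As a sketch of how this outline might be completed for Chrome with Selenium 3, where the driver path and application URL are assumptions:

    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;
    import org.openqa.selenium.support.ui.WebDriverWait;
    import org.testng.annotations.AfterClass;
    import org.testng.annotations.BeforeClass;

    public class FuelCheckerWebTestSkeleton {
        private WebDriver driver;
        private WebDriverWait wait;
        private final String url = "http://localhost:8080/Fuelchecker.html"; // assumed

        @BeforeClass
        public void setupDriver() {
            // Location of the ChromeDriver executable (assumed path)
            System.setProperty("webdriver.chrome.driver", "/path/to/chromedriver");
            driver = new ChromeDriver();          // open web browser
            wait = new WebDriverWait(driver, 60); // 60 second timeout (Selenium 3 API)
            driver.get(url);                      // load the application under test
        }

        @AfterClass
        public void shutdown() {
            driver.quit();                        // close the web browser
        }
    }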
In practice, frequent reference to the HTML DOM and HTML specifications17 and
the Selenium API documentation18 is needed to find the necessary API calls, and to
ensure that HTML elements are being accessed correctly.
Sometimes a tester may need to perform trials with different Selenium calls or different
attributes to make sure the test is working correctly. When developing tests, it can be
useful to print different attributes of an element, or the results of different Selenium
methods, to the console for inspection.
10.4.10 Test Output Messages
There are two elements of the test output that we have not considered so far:

16 Defined by the W3C – see https://dev.w3.org for details.
17 See https://www.w3.org for details.
18 https://www.selenium.dev/documentation/en/
It is particularly valuable for application testing (and other system tests) to record
the date and time of the test, and also the details of the application being tested (the test
item). In this example the test item is a URL. These are printed by the @BeforeClass
method of the test program – see lines 35 and 38 in Listing 10.6.
The ChromeDriver messages show the version running, a warning that only local
connections are allowed by Chrome when running under the control of ChromeDriver,
and confirmation that ChromeDriver has successfully opened a connection to the Chrome
browser.
These tools generally record the user input automatically, but require the user to
identify the important output fields manually (the tool has no way to tell which parts of
the screen are important for a correct response, and which are not). The data in these
important fields can then be recorded automatically. When the test is rerun (playback),
the same user input is provided, and the tool checks that the important output fields
contain exactly the same values as previously.
Most web-based test tools, such as Selenium19 , provide such a facility. Simple editors
may allow some customisation (for example, date fields may change every time the test
is run). But testing stories usually requires a more programmatic approach, as shown in
the example in this chapter.
10.5 Evaluation
In a well-established development process, the classes that implement the fuel checker
functionality should have been unit tested in advance. So it is likely that only user-
interface related faults will cause failures of the system test. The limitations of user
story testing are explored in the following sections.
10.5.1 Limitations
Some example faults are explored in this section, based on the following fault types:
• Navigation fault: a fault is inserted that prevents the application moving from one
screen to another.
• Screen Element fault: a fault is inserted into a screen element.
• Software Feature fault: a fault is inserted into the software feature to check the
fuel load.
The application is implemented in HTML and JavaScript, and can be viewed in the file Fuelchecker.html.
Navigation Fault
A fault is inserted into the navigation from the Results screen to the Exit screen in the
fuelchecker application.
Two extracts from the correct code are shown here. Listing 10.8 shows where the
exitlink href is set to call the Javascript Exit() function when the hyperlink Exit is clicked
on the screen.
When the application switches to the results page, the correct code to implement this is shown in Listing 10.9. Listing 10.10 shows the code with the fault inserted: on line 140, the exitlink href is incorrectly removed during the transition to the results page. As a consequence, nothing happens when the hyperlink Exit is subsequently clicked on the screen.
The results of running the user story tests against the application with this fault
inserted are shown in Figure 10.19.
Total tests run: 7, Passes: 7, Failures: 0, Skips: 0
===============================================

All the tests have passed – this particular link is never used after the results page is displayed in the user stories. Navigation testing can be used to ensure that every link, or other action that should cause a page transition, works on every screen.
Screen Element Fault

A fault is inserted into the handling of the screen element highsafety in the fuelchecker application – it is disabled when it should be enabled when returning back to the main screen.
Listing 10.11 shows a snippet of the correct code.
When the Continue button is clicked, then the application returns to the main screen
with the highSafety checkbox re-enabled – line 155. The user can now select/deselect
the checkbox as required for the next data entry.
Listing 10.12 shows the code with a fault inserted.
151 document.getElementById("Enter").style.display=’block’;
152 document.getElementById("Continue").style.display=’none’;
153 document.getElementById("result").style.display=’none’;
154 document.getElementById("litres").disabled = false;
155 document.getElementById("highsafety").disabled = true; // Screen element fault
156 document.title = "Fuel Checker";
157 document.getElementById("subhead").innerHTML = "Enter Data";
When the Continue button is clicked, the application returns to the main screen with the highSafety checkbox disabled – line 155. The user is now unable to select/deselect the checkbox as required for the next data entry.
The results of running the user story tests against the application with this fault
inserted are shown in Figure 10.20.
FAILED: testEnterCheckView("T2", "400", true, "Fuel fits in tank.")
java.lang.AssertionError: expected [Fuel fits in tank.] but found [Invalid data values.]
	at example.FuelCheckerWebStoryTest.testEnterCheckView(FuelCheckerWebStoryTest.java:97)
Four of the tests have failed. The tests T1, T2, T3 and T4 use the highsafety checkbox after returning to the main screen, and as the faulty code does not re-enable it, these tests fail. The tests T5 and T6 do not use the highsafety checkbox. Test T7 uses the highsafety checkbox, but does not rely on it working properly. High-impact faults like this are likely to cause multiple user story tests to fail, but more subtle faults may not be found by simple user story tests.
Software Feature Fault

Listing 10.14 shows the faulty code. After the data conversion to an integer has correctly taken place, a multiplication factor of 10 is incorrectly introduced, causing the wrong data values to be used in the Check a fuel load feature.
The results of running the user story tests against the application with this fault inserted are shown in Figure 10.21.
	at example.FuelCheckerWebStoryTest.testEnterCheckView(FuelCheckerWebStoryTest.java:97)
===============================================
Command line test
Tests run: 7, Failures: 2, Skips: 0
===============================================
Two of the tests fail. The software feature still works for some input values, but not
for others. In many cases, this type of fault can create very subtle changes in behaviour
that would not be caught by a simple user story test.
but otherwise the absolute or relative position on the screen may be the only
way to locate them. As for unit testing, the input may also come from external
sources.
for input data errors. Additional tests will also be developed based on the experience of the testers (see Section 1.7.3) in order to try to overcome some of the weaknesses in basic user story testing.
The time spent on developing and executing tests is usually closely related to the value
of the application. This is based on the risk of failure as discussed in Chapter 1. On
an e-commerce website for a small company selling low value items to a few customers,
the code probably will not be unit tested. The basic user stories of listing the items,
adding them to a shopping cart, checking out, and viewing the order status will probably
be tested manually with no formal documentation. At the other extreme, a website
for a bank will probably have all the code unit tested. And there will probably be
extensive automated application testing (including user story testing, and some of the
other approaches discussed) to make sure that the bank’s customers are unlikely to
experience problems.
Chapter 11
Test Automation

Manual testing is slow, error-prone, and hard to repeat. Software testing needs to be fast, accurate, and repeatable. The solution is test automation. This chapter provides insight into the process of automated testing and relates to the test design techniques that have been discussed in previous chapters.

11.1 Introduction

Software testing needs to be fast, so that it can be performed frequently without a time penalty. It needs to be accurate so that the test results can be relied on as a quality indicator. And it needs to be repeatable to allow for regression testing, where the same tests may be run many times for different software versions. This is particularly true for modern agile development approaches, where small increments of functionality are added and tested in a rapid cycle.
Some of the testing tasks that can be automated relatively easily are:
• Execution of tests.
• Collection of test results.
• Evaluation of test results.
• Generation of test reports.
• Measurement of simple white-box test coverage.
Some of the testing tasks that are more difficult to automate are:
• Generation of test conditions, test coverage items, and the data for test cases.
• Measurement of black-box test coverage.
• Measurement of complex white-box test coverage.
Unit test execution is invariably automated. This is implemented by writing code
that calls the required methods with the specified test input data, and compares the
actual results with the expected results.
Application tests are more difficult to automate. In manual testing, the correctness
of the output on the screen can be left to the tester’s judgement. Automated testing
requires that the details of how the expected results are displayed are known.
Application tests are also more complex to automate, as they depend on the details of
the system interface: inputs are provided and results collected via the system interface. It
generally takes extra time to develop automated application tests. Not only do the tests
themselves have to be implemented, but there is an additional and complex program
interface library to be used. In general there is a shift from manual to automated
application testing. A rule-of-thumb is that if a test is to be executed more than twice,
it is worthwhile using test automation.
Automated tests can be grouped into collections (which are also referred to as test
sets or test suites), and the results automatically collated into a report. The report
includes an overall summary of the test result (pass or fail). It may also include statistics
on the tests that have been run. It will also include a test incident report on each failure,
providing exact details on why the test failed in order to assist in locating the fault.
In this chapter, automated unit testing is examined in more detail, using TestNG1 as
an example test framework (as used in Chapters 2 to 6). Automated application testing
is also examined in more detail, based on TestNG to manage the tests, and Selenium2 to
interface the tests with web-based applications. It must be emphasised that these are just
example test tools, representative of the typical features to be found in test automation
tools.
Tests are usually skipped because the test setup failed: perhaps the class file for the code
being tested was not in the expected location, or the web driver was the wrong version
for the web browser in use. Skipped tests generally require a response by the tester to
rerun them.
1 See https://testng.org
2 See https://www.selenium.dev
It is useful to maintain an edit history of the test file (this could be automated under a
version control system, such as Git). There is no need to limit a file to one particular type
of test: a test file might include tests derived using multiple techniques (e.g. equivalence
partition, boundary value analysis, decision tables, etc.). Or, alternatively, each might
be put into its own file. Often black-box and white-box tests are separated into different
files, as the black-box tests remain valid, even if the implementation changes, but the
white-box tests do not. Test runners, such as TestNG, generally provide a number of
ways to select different test methods for execution from different test classes.
For a class, the identifier is generally the version number from the version control system. For complete systems, this is generally the build number from the build procedure (or it may be the release tag, or the date for revision control systems similar to Git).

11.2 Test Frameworks: TestNG
In this book, TestNG3 is used as a representative unit test automation framework. Other frameworks have similar features – this book does not provide a full description of TestNG, but rather uses TestNG to introduce and explain typical features required for automated testing. Only a limited subset of the TestNG features is described in this book.
3 https://testng.org
1 package example;
2
3 import static org.testng.Assert.*;
4 import org.testng.annotations.*;
5
6 public class DemoTest {
7
8 @Test
9 public void test1() {
10 Demo d = new Demo();
11 d.setValue(56);
12 d.add(44);
13 assertEquals( d.getValue(), 100 );
14 }
15
16 }
The key features of a TestNG test method, as shown in test1() in the example, are as follows:
• The object to be tested is created – line 10.
• The object is initialised, or put into the right state for testing. In our example, by initialising the object with the value 56 – line 11.
• The method under test is now called. In our example, we call the method add() with the input test data value 44 – line 12.
• The output data is now collected. In our example, by calling getValue() – line 13.
• Finally, the actual results (output value) are compared with the expected results by using the TestNG assertEquals() method. In our example, the expected result is the value 100 – line 13.
The output from running this test is shown in Figure 11.1.
PASSED: test1
===============================================
Command line test
Tests run: 1, Failures: 0, Skips: 0
===============================================
===============================================
Command line suite
Total tests run: 1, Passes: 1, Failures: 0, Skips: 0
===============================================
If an assertion fails, an exception is raised, causing the test to fail. This is why it is not a good idea to have multiple tests in a single test method: if one fails, then the subsequent tests are not run. This does not apply to parameterised tests – see Section 11.5.
The test methods are identified by a TestNG Test Runner which uses Java Reflection
to find all the methods in the test class with the @Test annotation, and then calls them in
turn, trapping exceptions, and keeping counters for the numbers of tests run, tests passed,
and tests failed. TestNG comes with a default command-line test runner, but TestNG
A
tests can also be run within an IDE (such as Eclipse). TestNG test execution can also
be managed using an XML file, which is passed to the test runner – see Section 11.3.1.
The best way to organize your test class files, and to ensure that all the code is tested, is to
have a test class for every program class. For example, class Demo in file Demo.java would
have a test class DemoTest in file DemoTest.java. You can use any naming convention
you like; however, in this book we use the convention of adding the word Test after the
class name.
In the test classes, individual tests are implemented as methods. In TestNG, each
test method must be public, and identified as a test method by using the Java annotation
(@Test) (see the example in Section 11.2.1). It is recommended that each test method
implement an individual test for two reasons:
1. The method terminates as soon as a test fails. This means that any subsequent
tests, in the same method, will not be executed. So, if you put multiple tests into
a single method, the test results cannot indicate whether just one test failed, or a
number of tests failed.
2. It is easier to find a fault if you know exactly which test failed.
The way to group multiple tests together is to put one test per method, and group the
test methods into suites of tests, sometimes referred to as test sets. To increase flexibility,
test suites may themselves be grouped into larger test suites. Note that parameterised
tests contain more than one test case per test method (see Section 11.5).
24 </suite>
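Only the end of the XML file is shown above; a minimal file of this shape might look like the following sketch, where the suite and test names are taken from the output in Figures 11.2 and 11.3, and the class names are assumptions:

    <!DOCTYPE suite SYSTEM "https://testng.org/testng-1.0.dtd">
    <suite name="Suite1">
       <test name="standardTest">
          <classes>
             <class name="example.DemoTest"/>
          </classes>
       </test>
       <test name="extraTest">
          <classes>
             <class name="example.DemoTestExtra"/>
          </classes>
       </test>
    </suite>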
Figure 11.2 shows the complete output of running this. The TestNG log level parameter allows the tester to specify the amount of detail to include (1 is least detail, and 5 is most). Here the value 3 has been used, which allows us to see exactly which methods were called.
5 The TestNG terminology is slightly different from the standard IEEE terminology that is used in this book.
===============================================
standardTest
Tests run: 1, Failures: 0, Skips: 0
===============================================
PASSED: test1
===============================================
extraTest
Tests run: 1, Failures: 0, Skips: 0
===============================================
=====
1332668132
DemoTestExtra.test3()[pri:0, instance:example.DemoTestExtra@4f6ee6e4]
1332668132
The output shows the test methods called, and the results at each level of the hierarchy
defined in the XML file, but it is not straightforward to read. It closely follows the
definition in the XML file.
The instance addresses as shown in this high level of detail may be different every
time the test is run.
The log level can be set to 1 to restrict the output to show the pass/fail test result
for the suite, as shown in Figure 11.3.
PASSED: test1
PASSED: test2
PASSED: test3
===============================================
extraTest
Tests run: 3, Failures: 0, Skips: 0
===============================================
===============================================
Suite1
Total tests run: 3, Passes: 3, Failures: 0, Skips: 0
===============================================
Figure 11.3: Test Output using Sample XML File – Log Level 1
Using the XML file as a basis, the tester can then select just particular tests, test
classes, or test suites to execute, based on the names in the XML file.
11.4 Setup and Cleanup Methods
Test automation tools typically support ways to run particular additional methods before
and after every test class, or before and after every test method (or collections of test
methods) in a test class. This allows objects (or connections to external software, such
as servers or databases) to be setup before a test, and cleaned up afterwards.
For example, if an object needs to be shared between all the tests in a class, it can
be created before all the tests are run, and re-initialised before every individual test.
TestNG provides annotations to support this. The use of @BeforeMethod is shown in
Listing 11.3. The test methods test1 and test2 each execute against individual instances
of Demo, as setup() is executed before every test method.
19 }
20
21 @Test
22 public void test2() {
23 d.setValue(200);
24 assertEquals( d.getValue(), 200 );
25 }
26
27 }
Executing this test results in the output shown in Figure 11.4 – with only the important lines of the output shown.

===============================================
Total tests run: 2, Passes: 2, Failures: 0, Skips: 0

A new Demo object is created for each test method. The test results produced by TestNG are all reported at the end of the complete test, not after each test method is run, but the println output is shown immediately – which is why the output lines appear to be out of order.

If the same object is used in every test in a test class, then the single object can be created before any of the tests are run using the @BeforeClass6 annotation (see Snippet 11.1).
Objects can be cleaned up after all the tests have run using the @AfterClass
annotation. By annotating setup() and cleanup() as shown in the example, test1 and
test2 would each execute against the same instance of Demo.
@Test
public void test2() {
Demo d = new Demo();
A FT
16 d.setValue(0);
17 assertEquals( d.getValue(), 0 );
18 }
19 @Test
20 public void test3() {
DR
The output from running the inline tests is shown in Figure 11.5
PASSED: test1
PASSED: test2
PASSED: test3
===============================================
Command line suite
Total tests run: 3, Passes: 3, Failures: 0, Skips: 0
===============================================
Where different test methods require different code, inline test data may be used
without incurring the disadvantages of code duplication. This is shown in Listing 11.4
For CS265/CS608 Students Personal Use Only
where the test method test2() is slightly different from test3(). This is often the case for
system tests, where different tests require unique sequences of actions.
Most test frameworks include facilities for parameterised tests. This is invariably used
for unit tests, where the same method needs to be repeatedly called with different test
data. In TestNG the dataProvider parameter is used for this as shown in Listing 11.5.
Listing 11.5: Parameterised Tests
6 public class DemoTestParam {
7 private static Object[][] testData = new Object[][] {
8 { "test1", 56, 44, 100 },
9 { "test2", 0, 0, 0 },
10 { "test3", -1000, -1234, -2234 },
11 };
12 @DataProvider(name="testset1")
13 public Object[][] getTestData() {
14 return testData;
15 }
16
17
18
19
20
21
22
23 }
@Test(dataProvider="testset1")
}
d.add(y);
assertEquals( d.getValue(), er );
A FT
public void test(String id, int x, int y, int er) {
Demo d = new Demo();
d.setValue(x);
With a data provider, the method test() is called sequentially with each row of test
data in order as follows:
test( "test1", 56, 44, 100 );
DR
test( "test2", 0, 0, 0 );
test( "test3", -1000, -1234, -2234 );
The output from running the parameterised tests is shown in Figure 11.6
Note the slight difference in output – the name of the parameterised test method and
the values of the parameters are shown for each test executed.
Using Iterators
TestNG supports both static and dynamic data providers. Static data providers return a
fixed array (or an Iterator over a standard collection) and dynamic data providers return
a customised Iterator , which can generate data on-the-fly.
For CS265/CS608 Students Personal Use Only
Dynamic data providers can be written to provide test data which changes based on
the test progress, or to support very large data sets. An example is shown in Listing 11.6.
Instead of returning an array, the @DataProvider method returns an Iterator.
Each time the iterator next() method is called, it returns the next row of data. In
A
the example, a pre-intialised array is used, but the data could be dynamically generated
within the next method. This is particularly useful for random testing where data may
be generated on demand. The supporting DataGenerator class is shown in Listing 11.7.
DR
FT
untested components in the code, and achieving higher levels of coverage requires the use
of white-box testing techniques. Code coverage can be measured for any type of testing.
For Java there are a number of options. In this book JaCoCo is used as it is a good
example of what can be achieved.
JaCoCo Example
An example of statement and branch coverage results for giveDiscount() are shown
A
in Figure 11.8. The code includes Fault4 (see Chapter 4). The tests used were the
equivalence partition tests developed in Chapter 2 – selected to show the different
coverage features. The coverage report shows that full statement and branch coverage
have not been achieved.
DR
For CS265/CS608 Students Personal Use Only
A FT
DR
While the tool highlights the lines of source code in color on the screen, we present
these using different grey levels in the book. The details are as follows:
• Some lines of Java source code do not generate any executable code7 These lines
are not highlighted: lines 1–6, 8, 10, 12, 15, 18, 25, 28, and 30–32.
• Most lines have been fully executed8 . These are highlighted in light grey (or
green on the screen): lines 9, 13, 14, 16, 17, 20–23, and 29. This means that
the equivalence partition tests executed these lines.
• Line 26 has been partially executed, and is shown in a medium grey (yellow on the
screen). This means that there are branches on this line, and at least one of them
has not been taken. In this case, the branch from line 26 to line 27 has clearly not
been taken.
7 Java statements are converted into executable instructions before a program can be run. Some
statements produce no instructions: for example, comments, import statements, and closing braces.
Other Java statements may generate multiple instructions.
8 All of the binary instructions on that line of source code have been executed.
For CS265/CS608 Students Personal Use Only
• Lines 7 and 27 are in red: this indicates that they have not been executed. Java
has created a default constructor which has not been called (line 7). The tests have
not executed line 27.
• The diamonds in the left margin indicate lines that contain branches. In this
example, this indicates lines 16, 20, 22, and 26. Hovering the cursor on the diamond
gives additional information:
– Line 16, 20, and 22 (green) show popups with the text:
All 2 branches covered.
– Line 26 (yellow) shows a popup with the text:
1 of 2 branches missed.
The coverage results are also summarised in the coverage sumary report, as shown in
Figure 11.9.
A FT
Figure 11.9: Example JaCoCo Summary Coverage Results
The key figures here are the missed instructions and missed branches for the method
being tested: giveDiscount(). This shows that 5 out of 32 executable instructions have
DR
not been executed, and 1 of the 8 branches have not been taken.
Note that there are two completely different issues here: one is whether the tests have
passed or not, and the second is whether full coverage has been achieved.
Lazy Evaluation
Interpreting the statement coverage results when a line of source code is only partially
executed can require some thought. An optimisation feature called lazy evaluation may
cause only some of the boolean conditions in a complex decision to be executed.
In Java, the Conditional-Or (||) and Conditional-And (&&) operators are guaranteed
to be evaluated left-to-right, and the right-hand operand is only evaluated if necessary9 .
11.7 Timeouts
In general it is good practice to ensure tests do not continue to run for too long – for
example, due to an infinite loop or a deadlock in the code. This may be less relevant for
unit testing, but is critical for application testing.
In TestNG, the timeOut parameter is used. The test method test1() in Listing 11.8
fails if the test takes more than 1000 milliseconds to execute.
9 See The Java® Language Specification.
For CS265/CS608 Students Personal Use Only
11.8 Exceptions
Most Java test frameworks use exceptions to report a test failure. Therefore, if a method
being tested is expected to raise an exception, this must be handled differently (both
to prevent the test failing incorrectly, and also to verify that the exception is correctly
raised).
A simple example is shown in Listing 11.10 – the method DemoWithE.add(x) throws
an exception if x is not greater than 0.
A test for this method must verify that (a) an exception is not raised when x is valid,
and (b) that an exception is raised when x is invalid. When an unexpected exception
is raised, such as when an assertion fails, then the test fails. To notify TestNG that
an exception is expected, parameters to the @Test annotation are used, as shown in
Listing 11.11.
}
d.setValue(0);
d.add(44);
assertEquals( d.getValue(), 44 );
FT
17
A
18 @Test(expectedExceptions=IllegalArgumentException.class)
19 public void test2() {
20 d.setValue(0);
21 d.add(-44);
DR
22 assertEquals( d.getValue(), 44 );
23 }
24
25 }
On line 18, the test parameter expectedExceptions is used to notify TestNG that an
exception should be raised, and to fail the test if it is not raised.
The result of running this test is shown in Figure 11.11.
PASSED: test1
PASSED: test2
===============================================
Command line suite
Total tests run: 2, Passes: 2, Failures: 0, Skips: 0
===============================================
The tests both pass. If test1() had raised an exception from assertEquals() failing,
then the test would have failed. If test2() had not raised an exception, then that test
would have failed.
For CS265/CS608 Students Personal Use Only
For the equivalence partition, boundary value analysis, and decision tables test
techniques, exceptions should be treated as another output (expected results) from the
code: a method can not both return a value and raise an exception.
1
2
3
4
// Test for class XXX
@Test
public void test() {
A
XXX x = new XXX();
FT
class to be tested is hard-coded into the test class.
On line 4 the test item is created inside the test class: the test can only be run against
an object of Class XXX.
If we now have a new class to test, that inherits from this class, we need to run these
existing tests against the new class (assuming that it is a true subclass that fully supports
all the superclass behaviour10 ). A cut-and-paste approach is shown in Snippet 11.3.
• The obvious problem, as with all cut-and-paste approaches, is that any changes
to the XXX test code do not automatically get propagated to the YYY test code.
And it is probably in a different Java file. This is a particular problem in modern
development methods, where classes are refactored and added to on a regular basis.
• There is a second, less obvious problem. If a third class ZZZ inherits from YYY,
then you need to copy the XXX and YYY tests into the ZZZ test class. This results
in an explosion of copied code, and is very likely to lead to mistakes and untested
code.
However, there are some strengths in this approach, which we need to consider when
designing alternate approaches:
1. It is very clear what tests are being run against which class in the class hierarchy
2. It is possible to select which tests are to be run for inheritance testing. The tester
may wish to not run some test cases either for performance reasons, or because the
subclass is not fully Liskov substitutable and the tests are not applicable.
FT
3. If the YYY constructor takes different parameters from the XXX constructor it is
easy to handle – the correct parameters can be just passed to the YYY constructor.
We will use two classes, Shape (Listing 11.12) and Circle (Listing 11.13), to
demonstrate two possible approaches to inheritance testing: passing the class name as a
parameter to the test, and inheriting the tests.
A
Listing 11.12: Source Code for Class Shape
3 class Shape {
4 String name="unknown";
5 String getName() { return name; }
6 void setName(String name) { this.name = name; }
DR
7 }
Shape
name: String
+setName(String)
+getName(): String
4
|
Circle
radius: int=0
+setRadius(int)
+getRadius(): int
FT
A test for class Shape is shown in Listing 11.14 – instead of hard-coding the class to be
tested, the class name is passed as a parameter, and a factory method11 is used to create
the object to test.
14 Class<?> c = Class.forName(cn);
15 Shape o = (Shape)(c.getDeclaredConstructor().newInstance());
16 System.out.println("Running Shape test against instance of
"+o.getClass());
17 return o;
18 }
19
20 @Test
21 public void test_demo() throws Exception {
22 Shape o = createShape();
23 o.setName("Test name 1");
24 assertEquals( o.getName(), "Test name 1" );
25 }
26
27 }
This test uses a factory method to create the test item, rather than calling the
constructor directly. This allows objects of different classes, as required, to be returned
by the factory method.
Important lines in the test are:
11 A factory method can be used instead of a constructor to create an object. The factory method
• Line 13 gets the class name which is passed as a Java property, rather than as a
command line parameter, as with TestNG the test class does not have a main()
method to pass parameters to
• Line 14 finds the class associated with the (full) classname
• Line 15 instantiates an object of that class, by calling the constructor indirectly
via the newInstance() method
• Line 16 prints out the test name and the classname – providing a record in the test
log of what class was tested (as this test can be run against different classes).
This mechanism allows the Shape tests to be run against an instance of class
example.Shape as in Figure 11.13.
It also allows the Shape tests to be run against an instance of class example.Circle as
The main disadvantage of this approach for inheritance testing is that it requires the
tester to explicitly run the tests for a class against every superclass in the class hierarchy.
The advantage is that the tester can easily select the test classes to execute12 .
FT
• On line 18, the factory method is called to create a Shape.
The results of running ShapeTest (on a Shape) are shown in Figure 11.15.
A CircleTest class can now be developed to test the Circle class as shown in
Listing 11.16. By extending ShapeTest, the class CircleTest will inherit the test shape()
method (and, critically, its annotation).
24 }
25
26 }
Class CircleTest has two test methods. It inherits the method test shape(), and it
defines the method test circle().
As a result of this, when CircleTest is run as a TestNG test, the following occurs:
• The test method testShape() shown in Figure 11.15, lines 17-22, is inherited from
the ShapeTest class. TestNG finds this inherited method, and runs it as a test.
• The method testShape() calls createInstance() – Figure 11.15, line 18 – but as
the test is running in CircleTest context, the method CircleTest.createInstance() is
called 13 . This returns an object instance of class Circle – lines 12-14.
• This causes shapeTest() to be run against a Circle.
In contrast, when the test shapeTest() was run in ShapeTest context, the method
The result of this is that both the shape tests and the circle tests are run against
a Circle. This technique works automatically in a deep inheritance hierarchy, as each
subclass test inherits all the superclass tests in the hierarchy.
The results of running CircleTest are shown in Figure 11.16.
A
Running Circle test against instance of class example.Circle
Running Shape test against instance of class example.Circle
PASSED: test_circle
PASSED: test_shape
===============================================
DR
FT
The results of running this version of CircleTest are shown in Figure 11.17.
===============================================
The shape tests run before the circle tests. The test method dependency notifies
TestNG to call test shape() before it calls test circle(). This works well with further tests
in the test hierarchy. If a new subclass test depends on test circle(), then test circle()
inherits the dependency on test shape(). As a result, all the inherited methods will be
run from the top of the hierarchy downwards, in the required order.
10
11 // Factory method to create a Shape
12 Shape createInstance() {
13 return new Shape();
14 }
15
16 @Test(groups={"inherited","shape"})
17 public void test_shape() throws Exception {
18 Shape o = createInstance();
19 System.out.println("Running Shape test against instance of
"+o.getClass());
20 o.setName("Test name 1");
21 assertEquals( o.getName(), "Test name 1" );
22 }
23
24 }
9
10
11
12
13
14
Circle createInstance() {
}
return new Circle();
A FT
Listing 11.19: Circle Test with Groups
public class CircleTest extends ShapeTest {
15
16 // Shape tests are run automatically
17 // New circle tests go here
18 @Test(groups={"inherited","circle"})
19 public void test_circle() throws Exception {
DR
20 Circle o = createInstance();
21 System.out.println("Running Circle test against instance of
"+o.getClass());
22 o.setRadius(44);
23 assertEquals( o.getRadius(), 44 );
24 }
25
26 }
===============================================
Total tests run: 1, Passes: 1, Failures: 0, Skips: 0
===============================================
Running the tests in group inherited results in the output shown in Figure 11.19.
11.10
A FT
Total tests run: 2, Passes: 2, Failures: 0, Skips: 0
===============================================
WebDriver
This is a library which can be used with different programming languages such as Java,
PHP, Python, Rugby, Perl and C#. Test code (e.g. Selenium-based tests) can use the
WebDriver API to open a browser such as Chrome or Firefox, and then launch a web
application (i.e. open a URL). The test code simulates user interactions with the web
application. Suitable assertions are inserted into the program to verify that the actual
behaviour or output matches the expected behaviour/output. In this book, the Chrome
Web driver is used (as of writing, it was most up to date). For ease of use, it can be
imported into a Eclipse/Java project.
For CS265/CS608 Students Personal Use Only
A minimal set of the Selenium WebDriver API, which are required to perform basic
tests using the Chrome WebDriver14 , include:
• Loading a page whose URL is specified in the String variable url:
// webdriver.chrome.driver must be set to full path
// of the executable file
System.setProperty( "webdriver.chrome.driver",
"./selenium/chromedriver");
driver = new ChromeDriver();
wait = new WebDriverWait(driver, 30);
driver.get(url)
• Verify the page title:
assertEquals("<expected name>", driver.getTitle() );
• Finding an element on a page by id (finding by name is not recommended):
FT
driver.findElement(By.id("<elementid>"))
• Simulate data being typed into an input field:
driver.findElement(By.id("<elementid>")).
sendKeys("<input data>");
• Get a value from an input field:
A
driver.findElement(By.id("<elementid>")).
getAttribute("value")
• Simulate a user clicking on an element in a page (such as a button, link, checkbox
DR
or menu item):
driver.findElement(By.id("<elementid>")).click()
• Verify if a checkbox is selected:
assertTrue(driver.findElement(By.id("<elementid>")).
isSelected())
• Wait for an element to appear – this is particularly important for web applications,
as there can be a significant delay (in the order of several seconds) before the page
updates. Reference the Selenium documentation for all the ExpectedConditions
supported – but note the use of visibility rather than presence to make sure the
web element is actually visible on the screen, rather than merely present in the
DOM. For example, to wait for a page to be displayed:
wait.until(ExpectedConditions.titleIs("<expected title"));
wait.until(ExpectedConditions.visibilityOfElementLocated(
By.id("<expected element>")));
Note that the system property set is specific to chromedriver: other drivers may
support different properties to assist Selenium to locate the driver.
14 See https://sites.google.com/a/chromium.org/chromedriver/downloads
For CS265/CS608 Students Personal Use Only
id as in HTML). FT
• Call Container.getComponents() recursively to find a component (window, button,
textbox, etc) – with Component.getName() to find by name (instead of using the
The key difference when compared to web applications is that the application has
A
full control of the application screen, whereas for a web page the user can return to a
stale window by typing in the URL or using the browser back button. The tester should
reference the API for the GUI library in use for full details 15 .
There are a number of frameworks for Java GUI application testing: these are built on
DR
top of the underlying AWT/Swing/JavaFX libraries. There are also many OS-dependent
and language-independent frameworks for GUI application testing frameworks. Both
proprietary and open-source tools are available. A review of these tools is beyond the
scope of this book.
15 The AWT, Swing, and JavaFX APIs are published by Oracle on https://docs.oracle.com
16 See http://selendroid.io
17 See https://ios-driver.github.io/ios-driver
For CS265/CS608 Students Personal Use Only
Chapter 12
Random Testing
This is an active research area, and a discussion of all the up-to-date solutions to these
problems is beyond the scope of this book. We will, however, present an introduction to
DR
the topic: a simple form of automated random testing using random data in conjunction
with manually generated black-box or white-box test coverage items is presented in this
chapter.
The technique is demonstrated through two examples: (i) unit testing, and (ii)
application testing.
257
For CS265/CS608 Students Personal Use Only
Outputs
return value:
FULLPRICE if bonusPoints≤120 and not a goldCustomer
FT
FULLPRICE if bonusPoints≤80 and a goldCustomer
DISCOUNT if bonusPoints>120
DISCOUNT if bonusPoints>80 and a goldCustomer
ERROR if any inputs are invalid (bonusPoints<1)
goldCustomer
Return Value
A true
false
FULLPRICE
DISCOUNT
ERROR
FT (bonusPoints≤Long.MAX VALUE)
goldCustomer
!goldCustomer
Return Value==FULLPRICE
Return Value==DISCOUNT
Return Value==ERROR
Using these criteria, random test data can be specified for each test case, as shown
in Table 12.3. This is essentially the same as Table 2.9 in Chapter 2 as developed for
equivalence partition testing, but for each test case actual data values are replaced by
DR
4. T12.4 is an error test case, so it only covers a single input error test coverage item
(EP1).
@DataProvider(name="eprandom")
A FT
// Store the data values in case of test failure
56 long value;
57 if (min>0)
58 value = min + (long) (r.nextDouble() * (max-min));
59 else do
60 value = r.nextLong();
61 while ((value<min)||(value>max));
62 return value;
63 }
64
65 @AfterMethod
66 public void reportFailures() {
67 if (failed) {
68 System.out.println("Test failure data for test: "+testId);
69 System.out.println(" bonusPoints="+bonusPoints);
70 System.out.println(" goldCustomer="+goldCustomer);
71 System.out.println(" actual result="+actual);
72 System.out.println(" expected result="+expected);
73
74
75
76 }
}
}
FT
The assertion on line 50 is placed within a loop that generates random values for
bonusPoints on lines 45. The values for the other inputs are fixed by the equivalence
partition test criteria. The loop also limits the execution time on line 44, in order to
A
terminate the test.
If a test fails, the standard test report will only show the inputs to the test method,
which are the test data criteria rather than the actual data values. This can make
replicating and debugging problems difficult. To address this, and report the actual
DR
random data values that caused the failure, the test method saves the input parameter
values and the expected results and sets failed to true (lines 45-48). If the test succeeds,
then line 51 will be executed setting failed to false. If the test fails, then failed will remain
true and the @AfterMethod method will be called. This prints out the parameters values
for a failed test (lines 68-72).
All the tests have passed. With the selected value of RUNTIME=1000 milliseconds
in the code, the total test execution time is approximately 4 seconds for the four tests.
12.4.1 Analysis
S1A1 requires a value for litres that fits in a tank without extra safety. The range of
S1A2 requires a value for litres that fits in a tank with extra safety. The range of values
is from 1 to 800.
A
S1A3 requires a value for litres that does not fit in a tank without extra safety. The range
of values is from 1201 to some unspecified maximum value.
S1A4 requires a value for litres that does not fit in a tank with extra safety. The range
of values is from 801 to some unspecified maximum value.
DR
As for the random unit testing, instead of selecting individual values to represent
each equivalence partition (shown in Table 10.3), criteria are used for selecting random
values at runtime1 , as shown in Table 12.4.
the application.
For CS265/CS608 Students Personal Use Only
completed
To be
RUS2 S1A2
later
RUS3 S1A3
RUS4 S1A4
ID
T12.1
TCI
Covered
RUS1
Inputs
Deselect highsafety
Click on Enter
Click on Check
FT
Enter string 1 ≤ litres ≤ 1200 into litres
Expected
Results
25 // Selenium
26
27 WebDriver driver;
28 Wait<WebDriver> wait;
29
30 // URL for the application to test
31
32 String url=System.getProperty("url");
33
34 // Stored data for test failure reports
35
36 static final long RUNTIME=10000L; // run each random test for 10
seconds
37 Random r_litres=new Random(); // RNG for litres input
38 boolean failed;
39 String litres;
40
41
42
43
44
45
46
47
boolean highsafety;
int counter;
@BeforeClass FT
public void setupDriver() throws Exception {
System.out.println("Test started at: "+LocalDateTime.now());
if (url==null)
throw new Exception("Test URL not defined: use -Durl=<url>");
48 System.out.println("For URL: "+url);
A
49 System.out.println();
50 // Create web driver (this code uses chrome)
51 if (System.getProperty("webdriver")==null)
52 throw new Exception("Web driver not defined: use
DR
-Dwebdriver=<filename>");
53 if (!new File(System.getProperty("webdriver")).exists())
54 throw new Exception("Web driver missing: "+
System.getProperty("webdriver"));
55 System.setProperty("webdriver.chrome.driver",
System.getProperty("webdriver"));
56 driver = new ChromeDriver();
57 // Create wait
58 wait = new WebDriverWait( driver, 5 );
59 // Open web page
60 driver.get( url );
61 }
62
63 @AfterClass
64 public void shutdown() {
65 driver.quit();
66 }
Listing 12.3 shows the additional attributes to support random testing (this is an
extract from the previous listing, so the line numbers overlap). These allow the data
values to be reported for a failed test.
Listing 12.3: Stored Test Data Values
For CS265/CS608 Students Personal Use Only
Listing 12.4 shows an extended @AfterMethod code. As in Chapter 11, this ensures
the application is displaying the main window even after a failed test. In addition, to
support random testing, it writes the data values selected for any random test that fails,
allowing the failure to be subsequently reproduced for debugging.
counter=0;
// Process test failures
if (failed) {
FT
System.out.println("Random test loops executed: "+counter);
Listing 12.5 shows the data provider and the random test method.
117
118
119
120
By.id("litres")));
FT
wait.until(ExpectedConditions.visibilityOfElementLocated(
driver.findElement(By.id("litres")).sendKeys(litres);
wait.until(ExpectedConditions.visibilityOfElementLocated(
By.id("highsafety")));
if (driver.findElement(
By.id("highsafety")).isSelected()!=highsafety)
driver.findElement( By.id("highsafety")).click();
A
121 wait.until(ExpectedConditions.visibilityOfElementLocated(
By.id("Enter")));
122 driver.findElement( By.id("Enter")).click();
123 wait.until(ExpectedConditions.titleIs("Results"));
124 wait.until(ExpectedConditions.visibilityOfElementLocated(
DR
By.id("result")));
125 assertEquals( driver.findElement(
By.id("result")).getAttribute("value"),result );
126 wait.until(ExpectedConditions.visibilityOfElementLocated(
By.id("Continue")));
127 driver.findElement( By.id("Continue")).click();
128 wait.until(ExpectedConditions.titleIs("Fuel Checker"));
129 }
130 failed = false; // if reach here, no test has failed
131 }
• The parameterised test includes extra input parameters lmin and lmax (line 101)
to support the lower and upper bounds for litres for each test.
• Instead of running a single test, a number of loops with different random values
are executed, using a timer (line 105) to decide when to complete the random test.
• A random value is selected for litres on lines 108-113. For application testing,
values based on actual user inputs are often used, so that testing better matches
real-world use. When these statistics are not available2 , normally distributed values
are often used as an approximation. The method Random.nextGaussian() returns
a normally distributed value, centered on 0.0, with a standard deviation of 3.0. For
a normal distribution, 99% of the values lie within 3 standard deviations, so the
equation on line 109 produces a random number centered in the middle of the range
lmin..lmax, and normally distributed. Note that 1% of the values will lie outside
this range, and these are moved into the range on lines 111-112.
• The parameter values are recorded on lines 104 and 113 for reporting in case of a
test failure.
• The remainder of the test is similar to the application test developed in Chapter 10.
The results of running these tests against the application FuelChecker is shown in
Figure 12.2. All the tests have passed. With the selected value of RUNTIME=10000
A
milliseconds, the total test execution time is approximately 40 seconds for the four tests.
process would be random and automated, and provide full coverage of the software
specification and implementation. This would lead to very comprehensive and low cost
testing. However, there are a number of barriers to this.
FT
• Ignore the issue by only requiring that the test completes without crashing the
software being tested – this is referred to as stability testing. The test oracle then
becomes a generic software utility to verify that the software keeps running after
any test input. In principle, this can be applied to all types of testing, but it is
usually applied to system testing. For unit or object-oriented testing, this would
imply checking that no unexpected exceptions are raised. For a system, it would
A
imply testing that the system keeps running – which may not always be easy to
determine.
• Write a program in a higher level, more abstract language to specify the software
requirements, and use that to generate the expected results. This can also be
DR
3 See http://www.openjml.org
For CS265/CS608 Students Personal Use Only
and branch coverage techniques can be used4 . Random values can then be selected
to match the criteria at runtime.
For application testing, data generation is often based on statistics of typical customer
input data. This allows the tests to mimic actual software usage. As well as helping to
find faults, this type of data generation allows the mean time to failure (MTTF) to also
be estimated in advance of software release. Whereas in principle this can be used for
unit testing it seldom is: the focus on unit testing tends to be more towards complete
coverage, rather than only coverage of typical inputs.
A
The Test Completion Problem
Ideally, automated random tests would cover all of the specification and all of the
DR
4 Alternatively, a decision tree can be built, selecting the output first, and then working backwards
and Sapienz, which is described in Sapienz: Multi-objective Automated Testing for Android Applications
(Mao).
For CS265/CS608 Students Personal Use Only
12.6 Evaluation
The results of running automated random tests against Fault 6, and a new Fault 10, in
OnlineSales.giveDiscount() are used to demonstrate the limitations of the technique.
12.6.1 Limitations
Some of the limitations of random testing with be explored by look at its effectiveness
against Fault 6 and Fault 10.
FT
Random Equivalence Partition Testing with Fault 6
Fault 6 was introduced in Chapter 6, Listing 6.3 by a complete redesign of the method,
introducing faults that were unlikely to be found by equivalence partition, boundary
value analysis, decision tables, statement coverage, or branch coverage testing.
The results of running the random equivalence partition tests against the code with
Fault 6 are shown in Figure 12.3.
A
Test failure data for test: T12.1
bonusPoints=20
goldCustomer=true
DR
actual result=DISCOUNT
expected result=FULLPRICE
PASSED: randomTest("T12.2", 81, 120, false, FULLPRICE)
PASSED: randomTest("T12.3", 121, 9223372036854775807, false, DISCOUNT)
PASSED: randomTest("T12.4", -9223372036854775808, 0, false, ERROR)
FAILED: randomTest("T12.1", 1, 80, true, FULLPRICE)
java.lang.AssertionError: expected [FULLPRICE] but found [DISCOUNT]
===============================================
Command line suite
Total tests run: 4, Passes: 3, Failures: 1, Skips: 0
===============================================
The fault is detected, and the input data values that caused the test failure are
displayed.
return rv;
A FT
if (bonusPoints==965423829) // fault 10
The results of running the equivalence partition tests against the code with Fault 10
are shown in Figure 12.4.
The fault is not detected. There is in fact a very low probability that the fault will
in fact be detected: approximately 1 in 263 .
of being found with random testing. Using decision tables, statement coverage, or branch
coverage7 test data increases the chance of each of these find faults, by using a wider
selection of input values.
Strengths
Limited, or no, manual intervention is required to run the tests. This holds the
potential for a significant improvement in software quality.
A large number of tests can be run in a limited time.
Stability testing has proven very successful in finding situations where software
hangs or crashes, and has been an important element in improving this aspect
of software quality.
Weaknesses
Most existing tools only perform stability testing, and therefore do not verify the
correct operation of the software, which must be tested separately.
FT
The test oracle, data selection, and test completion problems are still largely
unsolved.
Effective techniques require full software specifications. These are time consuming
A
to develop, and many developers are not experienced in producing these, using
formal languages (e.g. OCL).
• Random data selection can be used with any of the black-box and white-box test
techniques, by selecting test data criteria rather than test data values, and then
selecting test data values at runtime that match the criteria. In this case the test
oracle is manual.
• Fully random data selection can be used with a simple test oracle (the software
does not crash) for stability testing.
• There are a number of approaches for determining test completion: simple and
pragmatic approaches consist of specifying a fixed number of loops, or a fixed
amount of time for test execution. More advanced techniques include measuring
code coverage at runtime.
7 Using randomisation with boundary value analysis is unlikely to be effective, as the boundary values
application testing can be used to expand the breadth of data values used. By using input
data which is statistically representative of real-world data, collected from execution of
the application by real customers, an estimate of the mean time between failures (MTBF)
can be derived. Safety tests can often be developed by adding invariants to represent the
safety criteria for an application or a class, writing code to check for these (periodically
or after every method call), and simulating random inputs. Instead of checking that
the output of each method is correct, this form of testing verifies that the effect of the
method is correct with respect to the safety criteria.
A FT
DR
For CS265/CS608 Students Personal Use Only
Chapter 13
The activity of testing can be approached in two ways. The first is to wait until all the
FT
code has been written and then to test the finished product all at once. This is referred
to as “Big Bang” development. On the surface, it is an attractive option to developers
because testing activities do not hold back the progress towards completing the product.
However, it is a risky strategy as the likelihood that the product will work, or even be
close to working, can be very low, and is particularly dependent on program complexity
and program size. Additionally, if tests do reveal faults in the program, it is much more
difficult to identify their source.
A more modern approach is to test the software while it is being developed. This
A
is referred to as “incremental” development. Individual modules, or software features,
are tested as they are written. This process continues as additional software increments
are produced until the product is completed. While this may delay the release of the
final product, it should produce one with higher quality, and allows interim versions to
DR
275
For CS265/CS608 Students Personal Use Only
This chapter describes the activities required to plan software testing, and examines
how software testing fits into different models of the software development process.
FT
prepared and executed. It defines three levels of test documentation:
1. Organisational Test Documentation:
These documents define the organisational test policy and test strategy.
2. Test Management Documentation:
These documents defines the pre-test and post-test management documents
consisting of Test Plans and Test Completion Reports.
A
3. Dynamic Test Documentation:
Prior to the test being run, this documentation defines the Test Environment
and the Test Data. After the test has run, this defines the Test Execution
DR
A FT
DR
This is categorised into three groups of activity: Pre-Test Activities, Testing, and
Post-Test Activities:
• Pre-Test Activities consist of Analysis of the user requirements, followed by the
Design of the system, and Coding.
• Testing consists of Unit Testing, Integration Testing, System Testing and Accep-
tance Testing of the system.
• Post-Test Activities consist of Release of the software/product, and the subsequent
Maintenance after the software has been deployed.
All the planning is done at the beginning, and once created it is not to be changed.
There is no overlap between any of the subsequent phases. Often anyone’s first chance
to “see” the program is at the very end once the testing is complete and the software is
released.
For example, the requirements may need to change during the project or a better
approach to the design may become obvious during the coding phase. Additionally, it
can be time-consuming to produce all the associated documentation. From a testing
3 This is a simplification of the standard waterfall diagram, as it focuses on the testing activity.
For CS265/CS608 Students Personal Use Only
viewpoint, all the tests are carried out once the software is completed. This can cause
a number of problems. Firstly, budgetary or time pressure on the project at this
stage could result in insufficient or incomplete testing being carried out. This could be
further exacerbated by much testing being done on the program as a whole rather than
systematically progressing from unit testing to application or system testing. Secondly,
if testing exposes design faults in the program, it is too late to do a redesign in this
process, and the only option is to try and fix the problems in the code, which is followed
by more testing. If some faults are difficult to trace, it could result in many iterations
back and forth between fixing and testing. Lastly, customers may request changes once
they have received the product, which may lead to a long maintenance phase.
FT
the V-Model is to increase the focus on testing. Each activity leads to two outputs:
a specification of the next activity, and the criteria for testing the activity has been
correctly executed later in the process. It is called the V-Model for two reasons: the
focus on the verification and validation of the software product, and the shape of the
process.
Figure 13.3 shows a viewpoint of the V-Model4 that groups the pre-coding and
post-coding activities, to emphasise these relationships.
A
DR
• Analysis produces the Software Requirements Specification, which is both the input
for the High-level Design of the software, and the basis for System Testing.
• High-level Design produces the Software Design Specification, which is both the
input for the Detailed Design of the software, and the basis for Integration Testing.
• The Detailed Design activity produces the Detailed Design Specification. This is
used to write the code, and also is the basis for Unit Testing.
The advantages of the V-model are that it is simple and easy to manage due to the
rigidity of the model, and that it encourages verification and validation at all phases:
each phase has specific deliverables and a review process. Unlike the waterfall model, it
gives equal weight to testing rather than treating it as an afterthought.
FT
requirements at the beginning of the project. The Waterfall Model and the V-Model of
development do not provide an adequate framework for this situation, and an approach
based on incremental or Agile models can provide much better results. Agile software
development is associated with the Agile Manifesto5 .
The Agile Manifesto outlines four key components:
1.
2.
Individuals and interactions over processes and tools.
Working software over comprehensive documentation.
A
3. Customer collaboration over contract negotiation.
4. Responding to change over following a plan.
Like other incremental development methods, Agile methods emphasise building
DR
releasable software quickly in short time frames. Unlike other incremental methods, Agile
development measures these times frame in days or weeks rather than months. Agile
development encourages a high degree of collaboration between the software engineers.
To provide control over an Agile project, there are a number of guidelines:
FT
In this diagram, each increment is considered from the viewpoint of three phases: Pre-
Testing, Testing, and Post-Testing. In each increment, the testing consists of regression
testing (to ensure the new increment has not broken any previously working software),
and the testing of the new features added. The progressive release of tested software
increments means that interim versions of the software become available much earlier
in the development process. The quality of the final product is expected to be higher,
due to increased testing and the opportunity for a customer to view early releases of the
A
product.
A major advantage of the incremental model is that the product is written and tested
in smaller pieces. This reduces risks associated with the process and also allows for
changes to be included easily along the way. Additionally, by adopting an incremental
DR
model the customer or the users of the product have to be involved in the development
from the beginning. This means the system is more likely to meet their needs and the
users are more committed to the system because they have watched it grow. Thus,
another advantage of the incremental approach is an accelerated delivery of customer
services: important new functionality has to be included with every iteration so that
the customer can monitor and evaluate the progress of the product. Two primary
disadvantages are that it can be difficult to manage because of the lack of documentation,
in comparison to other models, and continual changes to the software can make it difficult
to maintain as it grows in size.
the customer select and write functional tests, and on running these tests on a regular
basis.
The XP process values encourage developers to:
A FT
DR
User Stories are written by the customers, providing the user requirements specifica-
tions. They are in the format of about three sentences of text written by the customer
using non-technical language. The User Stories are the basis for software Requirements
which are used to plan releases of the software. They are also used to produce Test
Scenarios which drive the Acceptance Testing for each release. A release of the software
consists of multiple iterations, each of which lasts typically from 1-3 weeks. Each iteration
can be considered as containing Pre-Testing, Testing, and Post-Testing activities.
Testing is a key part of XP. Unit tests are implemented before the code is written.
Doing this helps the developer to think deeply about what they are doing. The
requirements are defined fully by the tests. Another benefit is that the code, influenced
by the existing unit tests, is expected to be clearer to understand and easier to test.
When a fault is fixed, new tests are created to ensure it has been fixed correctly.
see http://www.extremeprogramming.org/map/project.html
For CS265/CS608 Students Personal Use Only
13.5.3 Scrum
Scrum is a process for managing complex software projects and is a technique of Agile
software development. It is similar to XP but there are a number of differences:
• Scrum teams work in iterations that are called Sprints. These can last a little
longer than XP iterations.
• Scrum teams do not allow changes to be introduced during the Sprints. XP teams
are more flexible with changes within an iteration as long as work has not started
on that particular feature already.
• In Scrum there is more flexibility for additional stakeholders to influence the
ordering of implementing features. XP implements features in a priority order
decided essentially by the customer.
• In Scrum it is up to the team to organize themselves and adopt the practices they
feel work best for themselves. In XP, unit testing and simple design practices are
built in.
FT
Figure 13.6 shows the Scrum process with a focus on the testing activities7 .
A
Figure 13.6: Testing in the Scrum Process
DR
The diagram starts with the Product Backlog which is a prioritized list of all the
required product Features. Scrum teams take on as much of the Product Backlog as
they think they can turn into an increment of product functionality within a 30-day
iteration or Sprint, forming the Sprint Backlog. A Sprint consists of the Pre-Testing,
Testing, and Post-Testing activities for each Selected Feature in the Sprint Backlog. The
testing approach depends on the team, and is not strictly prescribed as in XP, but will
usually consist of unit testing, regression testing, etc. There are daily Scrum meetings as
part of each Sprint, not highlighted in this diagram as they are not strictly test-related.
The Selected Features are also used to create acceptance tests which are used to verify
the Tested Software Product Increment for each Sprint.
This approach has proven successful, as the testing team collaborates closely with the
developers from the start of the project.
7 The standard Scrum process diagram focuses on the Product Backlog and the Scrum Sprint cycles
– see https://www.scrum.org/resources/scrum-framework-poster
For CS265/CS608 Students Personal Use Only
(QA) process. A key model is the ISO 9000/25000 series of standards. The ISO
9000/25000 series of standards are a significant concern for companies developing software
and systems for public tender – they provide state bodies with an assurance that the
software is being developed in a professional manner.
A FT
DR
For CS265/CS608 Students Personal Use Only
Chapter 14
Wrapup
This chapter summarises the test techniques presented in the book, takes a reverse
14.1 Summary
FT
look at the testing process to explain the dependencies of the test activities, and finishes
with some recommended further reading and a look at research directions.
This book has introduced the reader to the following essential testing material:
• An introduction to testing (Chapter 1), covering:
A
– The importance of software quality.
– The theory of testing that implies exhaustive testing.
– Exhaustive testing, and why it is not feasible.
DR
284
For CS265/CS608 Students Personal Use Only
FT
and comparing the output (the actual results) to the expected results.
– Application testing involves emulating a user interacting with an application,
making sure that every user story meets its acceptance criteria.
• An introduction to techniques for testing object-oriented software (Chapter 9).
Three essential techniques are discussed:
– Testing in class context: how to test methods and attributes that interact
A
with each other in a class.
– Inheritance testing: testing that inherited behaviour functions correctly.
– State-Based testing: testing the state-based behaviour of a class.
• An introduction to random testing (Chapter 12). Random testing poses three key
DR
challenges.
– The test data problem: how to generate valuable random data.
– The test oracle problem: how to calculate the expected output for random
inputs.
– The test completion problem: how to know when to end random testing.
• An overview of key test automation topics, with representative examples (Chap-
ter 11).
– Unit testing, using TestNG.
– Test coverage measurement, using JaCoCo.
– Automated application testing, using Selenium.
• A look at testing in the software process (Chapter 13). This chapter clarifies the
positioning of testing in the larger context of software development.
– A look at where testing fits into the standard software development models.
– A look at some specific development processes and the role of testing in each.
by considering software testing from the opposite perspective: with a reverse look. By
considering what is required at each step, the reader can develop a deeper understanding
of the need for this ordering of the activities. This argument will also provide some
motivation for documenting the outputs of each step, as a component in the development
of high-quality software.
FT
There are three data values in the code: 100, 23, and true. And there is also the name
of the test method: test1(). Where do these come from? Let us examine the previous
activity.
The test case ID (test1) has been assigned by the tester, using an appropriate naming
scheme. But where do the Input values in the test case (23 and true) come from? Where
does the Expected Results value (100) come from? And where do the TCI (Test Coverage
Item) Covered identifiers (a, b, and c) come from? Again, we examine the previous
activity.
The test coverage item identifiers have been assigned by the tester, again using a
suitable naming scheme. The values in the test cases of 23, true, and 100 have been
selected by the tester from the equivalence partitions shown. But where do the three
equivalence partitions shown come from? Again we refer to the previous activity.
FT
shown in Table 14.3, and the partitions for the return value (or output) shown in
Table 14.4.
The test coverage items shown previously use the partitions [0..100] for p1, [true] for
p2, and [100] for the expected results (return value).
A
Table 14.3: Input Equivalence Partitions for TestItem.categorise()
p2 true
false
Note: Table 14.4 shows the return values (0, 100, 101). Each of these represent a
single-valued partition for the return value.
Where do these parameters, and their associated partitions come from? They come
from the software specification.
For CS265/CS608 Students Personal Use Only
Outputs
return value:
14.2.6
0 if p1 is negative
Discussion
FT
100 is p1 is in the range 0..100 and p2 is true
101 otherwise
If we examine the sequence of activities and output produced, we can see that, for black-
A
box testing, the tester requires, in this order:
1. The specification to analyse. In the case of white-box testing, the analysis would
require the source code as well as the specification.
2. The results of analysing the specification (identifying the equivalence partitions).
DR
Except for software with high quality requirements, it is unusual to document the
outputs of the test design process to the level of detail shown in this book. Normally,
much of the work is done in the testers mind. It is, however, almost impossible to review
a test to ensure it is correct without some documentation of the analysis, test coverage
items, test cases, and test data. Whether the outputs are written down or not, the tester
must perform the same work. While learning how to test, it is excellent practice to write
down the output of every activity in the process.
which cover software testing at a higher level, or which are focused on specific tools or
environments, are not included.
There are many books and papers on software testing. In this section we provide
an annotated list of recommended texts, ones which we have found useful in preparing
this book. The list starts with more general works that discuss the software process
and how testing fits into this, and ends with more specific works that address particular
techniques or aspects of testing. We only provide a small number of selected, key texts
in each area; this is not a comprehensive survey of software testing literature.
Full details to access these readings are provided in the Bibliography, Appendix 14.4.2.
Not every technique in the standard is covered in this book, and not every technique
in this book is included in the standard. Three particular topics not covered in the
current standard are: integration testing, the testing of object-oriented software, and all
paths testing. The standard defines a very detailed and rigorous approach to software
testing – it is likely that only large organisations with a requirement for high quality
software will use the full standard. There are a number of other related IEEE standards
on particular aspects of testing.
The other key standard is the ISO 25000 family of standards. These provide
guidelines for software quality requirements and evaluation, and are essentially process
related.
FT
correctly with the existing code (integration testing). Integration testing is currently
addressed as an ad-hoc approach, and is best learned through experience in large software
systems.
Two representative books we have selected that address the topic are Integration
Testing from the Tranches (Frankel) and Continuous Integration (Duvall). Frankel has a
number of examples, and shows how to use a number of tools, and contains an interesting
discussion on integration with external software systems. From a testing viewpoint,
Duvall is more focused on how to automate various tests (unit, component, system,
A
functional). Both books contain useful coverage of Stubs/Mocks and the tools used to
test with these.
We suggest this topic to the reader as an important and fruitful area for future
research.
DR
Much of the more recent work is more easily available via research results, and some
relevant websites are listed here, with a focus on more recent results and commonly used
languages:
• The Viper toolset, and associated research, supports languages such as: Python,
RUST, Java, and OpenCL – we would recommend this as a starting point for the
beginner, especially as there are online tutorials and proof tools available:
– https://www.pm.inf.ethz.ch/research/viper.html
– https://vercors.ewi.utwente.nl
• The JML language and toolset support the Java language, though not currently
the latest version of Java
– http://www.openjml.org
• Spec# was a very promising project by Microsoft to support the C# language, but
unfortunately this is discontinued
– https://research.microsoft.com/en-us/projects/specsharp
FT
• Black-Box Testing (Bezier) and The Complete Guide to Software Testing (Hetzel)
are other classic books on software testing. Their approach is somewhat less
structured than the other classic books, and we recommend them as useful for
considering software testing from a more holistic viewpoint.
• Testing Computer Software (Kaner, Falk, and Nguyen) addresses many issues which
are supplementary to those covered in this book. It is divided into three sections:
fundamentals, specific testing skills, and managing testing projects and groups. We
A
recommend it for its breadth of coverage.
• The Software Testing Engineer’s Handbook (Bath and McKay) is a study guide for
the ISTQB Test Analyst and Technical Test Analyst Advanced Level Certificates.
We recommend this both in preparation for this certification, and as for the previous
DR
14.4.1 Conferences
The following list of conferences, while not complete, gives a starting point to the reader
for exploring the topic further, and gaining a knowledge of the state-of-the-art:
FT
achieve white-box coverage criteria, such as code coverage. Research is directed at
search-based techniques to achieve particular coverage criteria.
• Mutation Analysis. The idea of Mutation Testing has mainly been applied at
the source code level. Recently, the idea has been applied to test different artifacts,
including research into the notation used.
• Regression Test Reduction. Running Regression Tests is often the most time
consuming testing activity. When software is changed, or extended, checking that
A
the existing functionality still works correctly is at least as important as checking
that the new functionality works correctly. Research is directed at finding the
optimal set of tests to re-run.
• Model-Based Testing. Recent years have seen increasing interest in the use of
DR
models for testing software, for example UML. Research includes formal verification
of model transformations into code, adding debug support for model-based testing,
automatic test data generation based on the models.
• Automated Software Verification. Formal specifications provide the basis for
automating testing, either at runtime or statically. These usually take the form
of assertions stating pre-conditions, post-conditions, and invariants that must hold
for the software to match its specification. Recent research has started to provide
working tools that do both static and dynamic evaluation of software against its
specification (a small sketch of this style follows the list).
• New Technologies. New software technologies and architectures require the
application of test principles in new ways. Examples include research into effective
testing for SOA (Service-Oriented Architecture), virtualisation, dynamic software
systems, GUIs (Graphical User Interfaces), cloud computing, Artificial Intelligence,
etc.
• Software Process and Tools. Testing is a key part of the software process, and
tools are critical to its support. Research is active into the relationship between
testing and the other process activities, the development of test-driven development
processes, and the effectiveness of testing as a quality measure. Good examples
of new directions include automated tools for testing Android applications and
autonomous vehicles.
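As a minimal sketch of the style of specification referred to above (ours, not the output
of any particular tool), the following Java method states its pre-condition and
post-condition as runtime assertions; tools such as OpenJML express similar conditions
as annotations and can also check them statically. The method intSqrt is invented for
illustration, and assertions must be enabled with the java -ea flag for the checks to run.

    public class IntMaths {
        // Computes the integer square root of x.
        // Pre-condition: x >= 0.
        // Post-condition: the result r satisfies r*r <= x < (r+1)*(r+1).
        public static int intSqrt(int x) {
            assert x >= 0 : "pre-condition violated: x must be non-negative";
            int r = 0;
            // Assumes x is small enough that (r+1)*(r+1) does not overflow.
            while ((r + 1) * (r + 1) <= x) {
                r++;
            }
            assert r * r <= x && x < (r + 1) * (r + 1) : "post-condition violated";
            return r;
        }
    }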
14.4.1 Conferences
A number of conferences address software testing, and their proceedings give the reader
a starting point for exploring the topic further and for gaining a knowledge of the
state of the art.

Running the Examples
All the examples shown in the book are available online at TBD. They can be run
on Windows or Linux, and have been verified on Windows 10 and Ubuntu 18.04. Some
small changes may be required to run the examples under MacOS.
Before running any of the examples, you must have the Java JDK installed. All the
examples have been verified using the latest LTS (Long Term Support) version of Java
at the time of publication: JDK 11. You must also have TestNG installed. Chapters 5
and 6 require the JaCoCo coverage libraries, and Chapters 10, 11, and 12 require Selenium.
Details of the versions required, and where they were downloaded from, are provided in
the file readme.txt in the top-level folder. All the dependencies should be placed in the
'libraries' folder before running the examples.
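As a quick check that the JDK and TestNG are set up correctly, a minimal test of the
following form should compile and run; the class and method names here are illustrative
only, and this is not one of the book's examples.

    import org.testng.annotations.Test;
    import static org.testng.Assert.assertEquals;

    public class SetupCheckTest {
        @Test
        public void additionWorks() {
            // Trivial assertion: if this test runs, TestNG is on the classpath.
            assertEquals(2 + 2, 4);
        }
    }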
For simplicity, the examples have been developed to run from the command line. For
this to work, you must have the JDK bin folder on your PATH (see the JDK installation
instructions for details). You can verify this by typing the commands "javac -version"
and "java -version". Both commands should complete successfully, and both should show
the same Java version. You can also run the examples from an IDE (such as Eclipse),
but no instructions for this are included.
The top-level folder contains utility scripts (Windows batch files) to build and run the
examples.
• readme.txt – provides a complete list explaining each file for each chapter.
• check-dependencies.bat – checks that the Java compiler is on the PATH, and
that all the required .jar files are in the libraries folder.
• compile-all.bat – compiles all the Java files required for the book.
• run-menu.bat – allows you to run all of the examples: select the chapter, and
then the figure number.
Each chapter has its own folder, and each folder contains scripts to compile and
run the associated examples. See how these are called from run-menu.bat.
Every Windows batch file (xxx.bat) has a corresponding bash script (xxx.sh) for use
on Linux. With slight modifications, the Linux scripts can also be used on MacOS.
Bibliography
• A Practical Tutorial on Modified Condition/Decision Coverage, Hayhurst, Veerhusen,
Chilenski, and Rierson, TM-2001-210876, NASA, 2001.
• A Survey on Adaptive Random Testing, Huang, Sun, Xu, Chen, Towey, and Xia,
IEEE Transactions on Software Engineering, IEEE, 2019.
• Black-Box Testing: Techniques for Functional Testing of Software and Systems,
Beizer, John Wiley & Sons, 1995.
• DART: Directed Automated Random Testing, Godefroid, Klarlund, and Sen, in
Proceedings of the ACM SIGPLAN Conference on Programming Language Design
and Implementation, ACM, 2005.
• Experience-driven Software Process Improvement, Vinter and Poulsen, in Proceedings
of the Conference on Software Process Improvement (SPI 96), 1996.
• The Art of Software Testing, Myers, John Wiley & Sons, 2004.
• The Complete Guide to Software Testing, Hetzel, QED Information Sciences, 1988.
• The Economic Impacts of Inadequate Infrastructure for Software Testing, Final
Report, NIST, May 2002.
• The Java® Language Specification, Java SE 11 Edition, Gosling, Joy, Steele,
Bracha, Buckley, and Smith, Oracle, 2018. [Available at: docs.oracle.com]
• The Specification of Complex Systems, Cohen, Harwood, and Jackson, Addison-Wesley,
1986.
• The Software Testing Engineer’s Handbook: A Study Guide for the ISTQB Test
Analyst and Technical Test Analyst Advanced Level Certificates 2012, Bath and
McKay.
Index

acceptance testing, 17
actual results, 20
agile development, 279
all paths coverage, 113
all paths testing, 122
analysis, 18
application testing, 207, 262
assertEquals, 36
end-to-end path, 113, 122
equivalence partition, 25
error guessing, 14
error hiding, 19
errors of commission, 144
errors of omission, 144
exhaustive testing, 11
expected results, 20
extreme programming, 280
fault insertion, 14
fault model, 22
scrum, 282
selendroid, 256
selenium, 201, 230, 254
short circuit evaluation, 108
software testing standards, 289
specification-based range, 28
stability testing, 270
test case, 17
test completion problem, 269
test coverage item, 17, 18
test data, 20
test groups, 252
test oracle, 11, 268
test runner, 35, 233
test timeouts, 243
testing exceptions, 244
testing in class context, 163
testng, 35, 230, 231