ST Material
Chapter 1: Introduction to Software Testing
1.2. Introduction
Software testing is a critical element of software quality assurance and represents the
ultimate process for ensuring the correctness of the product. A quality product enhances
customer confidence in using the product and thereby improves the business economics.
In other words, a good quality product means zero defects, which derives from a better
quality testing process.
The definition of testing is not well understood. People often use a totally incorrect
definition of the word testing, and this is a primary cause of poor program testing.
Examples of such definitions are statements like "Testing is the process of demonstrating
that errors are not present", "The purpose of testing is to show that a program performs
its intended functions correctly", and "Testing is the process of establishing confidence
that a program does what it is supposed to do".
Testing a product means adding value to it, that is, raising the quality or reliability
of the program. Raising the reliability of the product means finding and removing errors.
Hence one should not test a product to show that it works; rather, one should start with
the assumption that the program contains errors and then test the program to find as many
of the errors as possible. Thus a more appropriate definition is:
Testing is the process of executing a program with the intent of finding errors.
What is the purpose of Testing?
To show the software works: this is known as demonstration-oriented testing.
To show the software doesn't work: this is known as destruction-oriented testing.
Some of the major computer system failures listed below give ample evidence that
testing is an important activity of the software quality process.
1. In April of 1999 a software bug caused the failure of a $1.2 billion military
satellite launch, the costliest unmanned accident in the history of Cape
Canaveral launches. The failure was the latest in a string of launch failures,
triggering a complete military and industry review of U.S. space launch
programs, including software integration and testing processes. Congressional
oversight hearings were requested.
2. On June 4 1996 the first flight of the European Space Agency's new Ariane 5
rocket failed shortly after launching, resulting in an estimated uninsured loss
of a half billion dollars. It was reportedly due to the lack of exception
handling of a floating-point error in a conversion from a 64-bit integer to a 16-
bit signed integer.
3. The computer system of a major online U.S. stock trading service failed
during trading hours several times over a period of days in February of 1999
according to nationwide news reports. The problem was reportedly due to
bugs in a software upgrade intended to speed online trade confirmations.
5. Software bugs caused the bank accounts of 823 customers of a major U.S.
bank to be credited with $924,844,208.32 each in May of 1996, according to
newspaper reports. The American Bankers Association claimed it was the
largest such error in banking history. A bank spokesman said the programming
errors were corrected and all funds were recovered.
All the above incidents reiterate the significance of thoroughly testing software
applications and products before they are put into production. They clearly demonstrate
that the cost of rectifying a defect during development is much less than the cost of
rectifying a defect in production.
1.3. What is Testing?
Destruction-oriented:
The purpose of testing is to show that the software doesn't work.
Evaluation-oriented:
The purpose of testing is to reduce the perceived risk of the software not working to an
acceptable value.
Prevention-oriented:
Testing can be viewed as a mental discipline that results in low-risk software.
It is always better to forecast possible errors and rectify them early.
In general, program testing is more properly viewed as the destructive process of trying
to find the errors (whose presence is assumed) in a program. A successful test case is one
that furthers progress in this direction by causing the program to fail. However, one also
wants to use program testing to establish some degree of confidence that a program does
what it is supposed to do and does not do what it is not supposed to do, and this purpose
is best achieved by a diligent exploration for errors.
Post-release removal of defects is the most expensive.
A significant portion of life cycle effort is expended on testing.
In a typical service-oriented project, about 20-40% of project effort is spent on testing;
it is much more in the case of human-rated software.
For example, at Microsoft the tester-to-developer ratio is 1:1, whereas at NASA's shuttle
development center (SEI Level 5) the ratio is 7:1. This shows how integral testing is to
quality assurance.
Defects can be introduced at any stage in the life cycle: in requirements, in design, in
code, or elsewhere. The overall defect distribution is shown in Fig 1.1.
Fig 1.1: Defect distribution: requirements 56%, design 27%, code 7%, other 10%.
A good test is one that has a high probability of finding an as yet undiscovered
error.
The objective is to design tests that systematically uncover different classes of errors and
do so with a minimum amount of time and effort.
Testing can show that software functions appear to be working according to specification
and that performance requirements appear to have been met. However, testing cannot show
the absence of defects; it can only show that software defects are present.
Fig 1.2: Test information flow in a typical software test life cycle
Software Configuration includes a Software Requirements Specification, a Design
Specification, and source code.
A test configuration includes a Test Plan and Procedures, test cases, and testing
tools.
It is difficult to predict the time to debug the code, hence it is difficult to schedule.
Some of the points to be noted during the test case design are:
Testing cannot prove correctness as not all execution paths can be tested.
A program with a structure as illustrated above (with fewer than 100 lines of Pascal code)
has about 100,000,000,000,000 (10^14) possible paths. If one attempted to test these at a
rate of 1,000 tests per second, it would take about 3,170 years to test all paths. This
shows that exhaustive testing of software is not possible.
Questions:
2. Explain the origin of the defect distribution in a typical software development life
cycle?
Chapter 2: SOFTWARE QUALITY ASSURANCE
2.2. Introduction
Quality is defined as a characteristic or attribute of something. As an attribute of an
item, quality refers to measurable characteristics: things we are able to compare to known
standards such as length, color, electrical properties, malleability, and so on. However,
software, being largely an intellectual entity, is more challenging to characterize than
physical objects.
Quality of design refers to the characteristics that designers specify for an item. The
grade of materials, tolerances, and performance specifications all contribute to the
quality of design.
Quality of conformance is the degree to which the design specifications are followed
during manufacturing. The greater the degree of conformance, the higher the level of
quality of conformance.
A procedure to assure compliance with software development standards
Measurement and reporting mechanisms
Figure: Elements of software quality assurance: software engineering methods, formal
technical reviews, measurement, standards and procedures, testing, and software
configuration management (SCM), all resting on SQA.
Nevertheless, measures of a program's characteristics do exist. These properties include:
1. Cyclomatic complexity
2. Cohesion
3. Number of function points
4. Lines of code
When we examine an item based on its measurable characteristics, two kinds of quality
may be encountered:
Quality of design
Quality of conformance
2.6. Quality Control (QC)
QC is the series of inspections, reviews, and tests used throughout the development cycle
to ensure that each work product meets the requirements placed upon it. QC includes a
feedback loop to the process that created the work product. The combination of
measurement and feedback allows us to tune the process when the work products created
fail to meet their specification. This approach views QC as part of the manufacturing
process. QC activities may be fully automated, entirely manual, or a combination of
automated tools and human interaction. An essential concept of QC is that all work
products have defined and measurable specifications to which we may compare the outputs
of each process; the feedback loop is essential to minimize the defects produced.
Prevention costs include
Quality Planning
Formal Technical Review
Test Equipment
Training
Appraisal costs include activities to gain insight into product condition the "first time
through" each process.
Examples of appraisal costs include:
In process and inter process inspection
Equipment calibration and maintenance
Testing
Failure costs are costs that would disappear if no defects appeared before shipping a
product to customers. Failure costs may be subdivided into internal and external failure
costs.
Internal failure costs are costs incurred when we detect an error in our product prior to
shipment.
Internal failure costs include:
Rework
Repair
Failure Mode Analyses
External failure costs are the costs associated with defects found after the product has
been shipped to the customer.
Examples of external failure costs are:
1. Complaint Resolution
2. Product return and replacement
3. Helpline support
4. Warranty work
2.8. Software Quality Assurance (SQA)
QA is an essential activity for any business that produces products to be used by others.
The SQA group serves as the customer's in-house representative; that is, the people who
perform SQA must look at the software from the customer's point of view.
The SQA group attempts to answer the questions asked below and hence ensure the
quality of software. The questions are
1. Has software development been conducted according to pre-established standards?
2. Have technical disciplines properly performed their role as part of the SQA activity?
SQA Activities
SQA Plan is interpreted as shown in figure 2
SQA is comprised of a variety of tasks associated with two different constituencies
1. The software engineers who do technical work like
Performing Quality assurance by applying technical methods
Conduct Formal Technical Reviews
Perform well-planned software testing.
2. SQA group that has responsibility for
Quality assurance planning oversight
Record keeping
Analysis and reporting.
QA activities performed by SE team and SQA are governed by the following plan.
Evaluation to be performed.
Audits and reviews to be performed.
Standards that are applicable to the project.
Procedures for error reporting and tracking
Documents to be produced by the SQA group
Amount of feedback provided to software project team.
Figure 2: SQA planning. The activities of both the software engineers and the SQA team
are governed by the SQA plan.
Review software-engineering activities to verify compliance with defined software
process.
Audits designated software work products to verify compliance with those defined as
part of the software process.
Ensures that deviations in software work and work products are documented and
handled according to a documented procedure.
Records any noncompliance and reports to senior management.
Assume that an error uncovered during design will cost 1.0 monetary unit to correct.
Relative to this cost, the same error uncovered just before testing commences will cost
6.5 units; during testing 15 units; and after release, between 60 and 100 units.
Figure 2.3: Defect amplification model. Each development step receives errors from the
previous step; some pass through, some are amplified (1:x), and new errors are generated,
while defect detection (with some percent efficiency) removes errors before they are
passed to the next step.
Only three latent defects exist. By recalling the relative cost associated with the
discovery and correction of errors, overall costs (with and without review for our
hypothetical example) can be established.
To conduct reviews, a developer must expend time and effort, and the development
organization must spend money. However, the results of the preceding example leave little
doubt that we have encountered a "pay now or pay much more later" syndrome.
Formal technical reviews (for design and other technical activities) provide a
demonstrable cost benefit and they should be conducted.
Figures: Defect amplification for the hypothetical example, with and without design and
code reviews. With reviews in place, errors are caught at each step and only three latent
defects remain after system test; without reviews, errors pass through and are amplified
at each step, leaving many more latent defects.
To uncover errors in function, logic, or implementation for any representation of the
software.
To verify that the software under review meets its requirements.
To ensure that the software has been represented according to predefined standards.
To achieve software that is developed in a uniform manner.
To make projects more manageable
In addition, the FTR serves as a training ground, enabling junior engineers to observe
different approaches to software analysis, design, and implementation. The FTR also
serves to promote backup and continuity, because a number of people become familiar
with parts of the software that they might not have otherwise seen.
The FTR is actually a class of reviews that includes walkthroughs, inspections,
round-robin reviews, and other small-group technical assessments of software. Each FTR is
conducted as a meeting and will be successful only if it is properly planned, controlled,
and attended. In the paragraphs that follow, guidelines similar to those for a
walkthrough are presented as a representative formal technical review.
1. To identify problem areas within the product.
2. To serve as an action-item checklist that guides the producer as corrections are
made. An issues list is normally attached to the summary report.
It is important to establish a follow-up procedure to ensure that items on the issues
list have been properly corrected. Unless this is done, it is possible that issues raised
can "fall between the cracks". One approach is to assign responsibility for follow-up to
the review leader. A more formal approach assigns responsibility to an independent SQA
group.
In statistical quality assurance, information about software defects is collected and
categorized, and an attempt is made to trace each defect back to its underlying cause,
that is, to the elements of the process that introduce errors. To illustrate the process,
assume that a software development organization collects information on defects for a
period of one year. Some errors are uncovered as the software is being developed. Other
defects are encountered after the software has been released to its end users.
Although hundreds of errors are uncovered, all can be tracked back to one (or more) of
the following causes:
Incomplete or erroneous specification (IES)
Misinterpretation of customer communication (MCC)
Intentional deviation from specification (IDS)
Violation of programming standards ( VPS )
Error in data representation (EDR)
Inconsistent module interface (IMI)
Error in design logic (EDL)
Incomplete or erroneous testing (IET)
Inaccurate or incomplete documentation (IID)
Error in programming language translation of design (PLT)
Ambiguous or inconsistent human-computer interface (HCI)
Miscellaneous (MIS)
To apply statistical SQA, Table 1 is built. Once the vital few causes are determined, the
software development organization can begin corrective action.
After analysis, design, coding, testing, and release, the following data are gathered.
Ei = The total number of errors uncovered during the ith step in the software
Engineering process
Si = The number of serious errors
Mi = The number of moderate errors
Ti = The number of minor errors
PS = size of the product (LOC, design statements, pages of documentation) at the
ith step
Ws, Wm, Wt = weighting factors for serious, moderate, and trivial (minor) errors, where
the recommended values are Ws = 10, Wm = 3, Wt = 1.
The weighting factors for each phase should become larger as development progresses.
This rewards an organization that finds errors early.
At each step in the software engineering process, a phase index, PIi, is computed:
PIi = Ws(Si/Ei) + Wm(Mi/Ei) + Wt(Ti/Ei)
The error index, EI, is computed by calculating the cumulative effect of each PIi,
weighting errors encountered later in the software engineering process more heavily than
those encountered earlier:
EI = Sum(i x PIi)/PS = (PI1 + 2PI2 + 3PI3 + ... + i PIi)/PS
The error index can be used in conjunction with information collected in table to develop
an overall indication of improvement in software quality.
Table 1: DATA COLLECTION FOR STATISTICAL SQA

Error    Total No.  %     Serious No.  %     Moderate No.  %     Minor No.  %
IES      205        22    34           27    68            18    103        24
MCC      156        17    12           9     68            18    76         17
IDS      48         5     1            1     24            6     23         5
VPS      25         3     0            0     15            4     10         2
EDR      130        14    26           20    68            18    36         8
IMI      58         6     9            7     18            5     31         7
EDL      45         5     14           11    12            3     19         4
IET      95         10    12           9     35            9     48         11
IID      36         4     2            2     20            5     14         3
PLT      60         6     15           12    19            5     26         6
HCI      28         3     3            2     17            4     8          2
MIS      56         6     0            0     15            4     41         9
TOTALS   942        100   128          100   379           100   435        100
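To make the phase index and error index computations concrete, here is a small sketch in
Python. It is our own illustration: the function names and the sample counts are
hypothetical, not part of any standard.

    # Weighting factors recommended in the text for serious, moderate,
    # and trivial (minor) errors.
    WS, WM, WT = 10, 3, 1

    def phase_index(si, mi, ti):
        """PIi = Ws(Si/Ei) + Wm(Mi/Ei) + Wt(Ti/Ei), with Ei = Si + Mi + Ti."""
        ei = si + mi + ti
        return WS * si / ei + WM * mi / ei + WT * ti / ei

    def error_index(phase_counts, product_size):
        """EI = sum(i * PIi) / PS, weighting later phases more heavily."""
        total = 0.0
        for i, (si, mi, ti) in enumerate(phase_counts, start=1):
            total += i * phase_index(si, mi, ti)
        return total / product_size

    # Hypothetical (serious, moderate, minor) error counts for four steps:
    # analysis, design, coding, and testing.
    counts = [(10, 20, 30), (8, 15, 25), (5, 10, 12), (2, 4, 6)]
    print(error_index(counts, product_size=25.0))  # PS in KLOC, for example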
2.11.1. Measures of Reliability and Availability
In a computer-based system, a simple measure of reliability is mean time between failure
(MTBF), where
MTBF = MTTF+MTTR
The acronyms MTTF and MTTR stand for mean time to failure and mean time to repair,
respectively.
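For example, with hypothetical figures: if a system runs for an average of 500 hours
before failing (MTTF = 500 hours) and takes an average of 4 hours to repair
(MTTR = 4 hours), then MTBF = 500 + 4 = 504 hours.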
Once hazards are identified and analyzed, safety-related requirements can be specified
for the software; i.e., the specification can contain a list of undesirable events and
the desired system responses to these events. The role of software in managing
undesirable events is then indicated.
Although software reliability and software safety are closely related to one another, it
is important to understand the subtle difference between them. Software reliability uses
statistical analysis to determine the likelihood that a software failure will occur;
however, the occurrence of a failure does not necessarily result in a hazard or mishap.
Software safety examines the ways in which failures result in conditions that can lead to
a mishap. That is, failures are not considered in a vacuum, but are evaluated in the
context of an entire computer-based system.
The SQA plan provides a road map for instituting software quality assurance. Developed
by the SQA group and the project team, the plan serves as a template for the SQA
activities that are instituted for each software project.
ANSI/IEEE Standards 730-1984 and 983-1986 define an outline for SQA plans, as shown
below.
I. Purpose of Plan
II. References
III. Management
1. Organization
2. Tasks
3. Responsibilities
IV. Documentation
1. Purpose
2. Required software engineering documents
3. Other Documents
V. Standards, Practices and conventions
1. Purpose
2. Conventions
VI. Reviews and Audits
1. Purpose
2. Review requirements
a. Software requirements
b. Design reviews
c. Software V & V reviews
d. Functional Audits
e. Physical Audit
f. In-process Audits
g. Management reviews
VII. Test
VIII. Problem reporting and corrective action
IX. Tools, techniques and methodologies
X. Code Control
XI. Media Control
XII. Supplier Control
XIII. Record Collection, Maintenance, and retention
XIV. Training
XV. Risk Management.
Chapter 3: Program Inspections, Walkthroughs, and Reviews
3.1. Introduction
For many years, the majority of the programming community worked under the assumption
that programs are written solely for machine execution and are not intended to be read by
people; the only way to test a program was to execute it on a machine. Weinberg built a
convincing argument for why programs should be read by people, and indicated that this
could be an effective error-detection process.
Experience has shown that human testing techniques are quite effective in finding
errors, so much so that one or more of these should be employed in every programming
project. The methods discussed in this chapter are intended to be applied between the
time that the program is coded and the time that computer-based testing begins. Two
observations motivate this:
It is generally recognized that the earlier errors are found, the lower the cost of
correcting them and the higher the probability of correcting them correctly.
Programmers seem to experience a psychological change when computer-based
testing commences.
The process is performed by a group of people (three or four), only one of whom is the
author of the program. Hence the program is essentially being tested by people other than
the author, which is in consonance with the testing principle stating that an individual is
usually ineffective in testing his or her own program. Inspections and walkthroughs are
far more effective than desk checking (the process of a programmer reading his or her own
program before testing it) because people other than the program's author are involved in
the process. These processes also appear to result in lower debugging (error-correction)
costs, since, when they find an error, the precise nature of the error is usually
located. Also, they expose a batch of errors, allowing the errors to be corrected later
en masse. Computer-based testing, on the other hand, normally exposes only a symptom of
the error, and errors are usually detected and corrected one by one.
Some Observations:
Experience with these methods has found them to be effective in finding from
30% to 70% of the logic design and coding errors in typical programs. They are
not, however, effective in detecting high-level design errors, such as errors
made in the requirements analysis process.
Human processes find only the easy errors (those that would be trivial to find
with computer-based testing) and the difficult, obscure, or tricky errors can only
be found by computer-based testing.
Inspections/walkthroughs and computer-based testing are complementary; error-
detection efficiency will suffer if one or the other is not present.
The moderator's duties include distributing materials and scheduling the inspection
session, leading the session, recording all errors found, and ensuring that the errors
are subsequently corrected. Hence the moderator may be called a quality-control engineer.
The remaining members usually consist of the program's designer and a test specialist.
The general procedure is that the moderator distributes the program's listing and design
specification to the other participants well in advance of the inspection session. The
participants are expected to familiarize themselves with the material prior to the
session.
During the inspection session, two main activities occur:
1. The programmer is requested to narrate, statement by statement, the logic of the
program. During the discourse, questions are raised and pursued to determine if
errors exist. Experience has shown that many of the errors discovered are actually
found by the programmer, rather than the other team members, during the
narration. In other words, the simple act of reading aloud one's program to an
audience seems to be a remarkably effective error-detection technique.
2. The program is analyzed against a checklist of historically common programming
errors (such as the checklist in the following sections).
After the session, the programmer is given a list of the errors found. The list of errors
is also analyzed, categorized, and used to refine the error checklist to improve the
effectiveness of future inspections.
The inspection process is a way of identifying early the most error-prone sections
of the program, thus allowing one to focus more attention on these sections during
the computer based testing processes.
Data-Reference Errors
1. Is a variable referenced whose value is unset or uninitialized? This is probably the
most frequent programming error; it occurs in a wide variety of circumstances.
2. For all array references, is each subscript value within the defined bounds of the
corresponding dimension?
3. For all array references, does each subscript have an integer value? This is not
necessarily an error in all languages, but it is a dangerous practice.
4. For all references through pointer or reference variables, is the referenced storage
currently allocated? This is known as the dangling reference problem. It occurs
in situations where the lifetime of a pointer is greater than the lifetime of the
referenced storage.
5. Are there any explicit or implicit addressing problems if, on the machine being
used, the units of storage allocation are smaller than the units of storage
addressability?
6. If a data structure is referenced in multiple procedures or subroutines, is the
structure defined identically in each procedure?
7. When indexing into a string, are the limits of the string exceeded?
Data-Declaration Errors
3. Where a variable is initialized in a declarative statement, is it properly initialized?
4. Is each variable assigned the correct length, type, and storage class?
5. Is the initialization of a variable consistent with its storage type?
Computation Errors
Comparison Errors
1. Are there any comparisons between variables having inconsistent data types (e.g.
comparing a character string to an address)?
2. Are there any mixed-mode comparisons or comparisons between variables of
different lengths? If so, ensure that the conversion rules are well understood.
3. Does each Boolean expression state what it is supposed to state? Programmers
often make mistakes when writing logical expressions involving and, or, and
not.
4. Are the operands of a Boolean operator Boolean? Have comparison and Boolean
operators been erroneously mixed together?
Control-Flow Errors
1. If the program contains a multiway branch (e.g., a computed GO TO in Fortran),
can the index variable ever exceed the number of branch possibilities? For
example, in the Fortran statement,
GOTO(200,300,400), I
Will I always have the value 1,2, or 3?
2. Will every loop eventually terminate? Devise an informal proof or argument
showing that each loop will terminate.
3. Will the program, module, or subroutine eventually terminate?
4. Is it possible that, because of the conditions upon entry, a loop will never execute?
If so, does this represent an oversight? For instance, for loops headed by the
following statements:
DO WHILE(NOTFOUND)
DO I=X TO Z
What happens if NOTFOUND is initially false or if X is greater than Z?
Interface Errors
1. Does the number of parameters received by this module equal the number of
arguments sent by each of the calling modules? Also, is the order correct?
2. Do the attributes (e.g. type and size) of each parameter match the attributes of
each corresponding argument?
3. Does the number of arguments transmitted by this module to another module
equal the number of parameters expected by that module?
4. Do the attributes of each argument transmitted to another module match the
attributes of the corresponding parameter in that module?
5. If built-in functions are invoked, are the number, attributes, and order of the
arguments correct?
6. Does the subroutine alter a parameter that is intended to be only an input value?
Input/Output Errors
3.5. Walkthroughs
The code walkthrough, like the inspection, is a set of procedures and error-detection
techniques for group code reading. It shares much in common with the inspection
process, but the procedures are slightly different, and a different error-detection technique
is employed.
Suggested participants in a walkthrough, in addition to the programmer, include:
A highly experienced programmer,
A programming-language expert,
A new programmer (to give a fresh, unbiased outlook),
The person who will eventually maintain the program,
Someone from a different project, and
Someone from the same programming team as the programmer.
The initial procedure is identical to that of the inspection process: the participants are
given the materials several days in advance to allow them to study the program.
However, the procedure in the meeting is different. Rather than simply reading the
program or using error checklists, the participants "play computer". The person
designated as the tester comes to the meeting armed with a small set of paper test cases:
representative sets of inputs (and expected outputs) for the program or module. During
the meeting, each test case is mentally executed. That is, the test data are walked through
the logic of the program. The state of the program (i.e. the values of the variables) is
monitored on paper or a blackboard.
The test cases must be simple and few in number, because people execute programs at a
rate that is very slow compared to machines. In most walkthroughs, more errors are found
during the process of questioning the programmer than are found directly by the test
cases themselves.
Questions
1. Are code reviews relevant to software testing? Explain the process
involved in a typical code review.
2. Explain the need for inspection and list the different types of code reviews.
3. Consider a program, perform a detailed review, and list the review
findings in detail.
Chapter 4: Dynamic Testing
Dynamic testing of Software Applications
White box and black box testing
Various techniques used in White box testing
Various techniques used in black box testing
Static program analysis
Automation of testing process
4.2. Introduction
Software can be tested either by running the programs and verifying each step of its
execution against expected results or by statically examining the code or the document
against its stated requirement or objective. In general, software testing can be divided into
two categories, viz. static and dynamic testing. Static testing is non-execution-based
testing, carried out mostly through human effort. In static testing, we test the design,
the code, or any other document through inspections, walkthroughs, and reviews, as
discussed in Chapter 3. Many studies show that the single most cost-effective
defect-reduction process is the classic structural test: the code inspection or
walkthrough. Code inspection is like proofreading; it benefits developers by identifying
typographical errors, logic errors, and deviations from the styles and standards normally
followed.
This testing technique takes into account the internal structure of the system or
component. The entire source code of the system must be available. This technique is
known as white box testing because the complete internal structure and working of the
code is available.
White box testing helps to derive test cases to ensure that:
1. All independent paths within a module are exercised at least once.
2. All logical decisions are exercised for both true and false paths.
3. All loops are executed at their boundaries and within operational bounds.
4. All internal data structures are exercised to ensure their validity.
Logic errors and incorrect assumptions are most likely to be made when coding
"special cases", so we need to ensure these execution paths are tested.
Assumptions about execution paths may turn out to be incorrect, leading to design
errors; white box testing can find these errors.
The aim is to derive a logical-complexity measure of a procedural design and use this as
a guide for defining a basis set of execution paths.
Test cases that exercise the basis set will execute every statement at least once.
Flow graph notation helps to represent the various control structures (sequence,
selection, and repetition) of any programming language. On a flow graph:
Each circle (node) represents one or more procedural statements.
Arrows (edges) represent the flow of control.
A node that contains a condition is called a predicate node.
Any procedural design/program can be translated into a flow graph, and the flow graph can
then be analyzed for the various paths within it.
Note that a compound Boolean expression in a condition generates at least two predicate
nodes and additional edges.
Example:
Fig 3.2: Control flow of a program and the corresponding flow diagram
The cyclomatic complexity gives a quantitative measure of the logical complexity. This
value gives the number of independent paths in the basis set, and an upper bound for the
number of tests to ensure that each statement is executed at least once.
An independent path is any path through a program that introduces at least one new set of
processing statements or a new condition (i.e., a new edge)
Fig 3.3: Sample program and corresponding flow diagram
In Fig 3.3, the statements are numbered and the corresponding nodes also numbered with
the same number. The sample program contains one DO and three nested IF statements.
The cyclomatic complexity provides an upper bound on the number of test cases to be
generated, i.e., the number of independent execution paths in the program. The four
independent paths for the program shown in Fig 3.3 are given below:
Independent Paths:
1. 1, 8
2. 1, 2, 3, 7b, 1, 8
3. 1, 2, 4, 5, 7a, 7b, 1, 8
4. 1, 2, 4, 6, 7a, 7b, 1, 8
Cyclomatic complexity provides an upper bound on the number of tests required to
guarantee coverage of all program statements.
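To make the complexity computation concrete, here is a small sketch in Python, our own
illustration rather than part of the text, which computes V(G) = E - N + 2 for an assumed
edge list matching the flow graph of Fig 3.3:

    # Cyclomatic complexity V(G) = E - N + 2, where E is the number of
    # edges and N the number of nodes in the flow graph.
    def cyclomatic_complexity(edges):
        nodes = {n for edge in edges for n in edge}
        return len(edges) - len(nodes) + 2

    # Assumed edges for the flow graph of Fig 3.3 (node labels as above).
    edges = [
        (1, 2), (1, 8),
        (2, 3), (2, 4),
        (3, "7b"),
        (4, 5), (4, 6),
        (5, "7a"), (6, "7a"),
        ("7a", "7b"),
        ("7b", 1),  # loop back to node 1
    ]
    print(cyclomatic_complexity(edges))  # -> 4, one per independent path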
Test cases are designed in many ways. The steps involved in basis path test case design
are:
1. Using the design or code as a foundation, draw the corresponding flow graph.
2. Determine the cyclomatic complexity of the flow graph.
3. Determine a basis set of linearly independent paths.
4. Prepare test cases that will force execution of each path in the basis set.
Note: some paths may only be able to be executed as part of another test.
Graph matrices can automate the derivation of the flow graph and the determination of a
basis set of paths. Software tools that do this use a graph matrix. A sample graph matrix
is shown in Fig 3.4.
The graph matrix:
Is a square matrix with size equal to the number of nodes in the flow graph.
Has rows and columns that correspond to the nodes of the flow graph.
Has entries that correspond to the edges between nodes.
The matrix can associate a number (a link weight) with each edge entry. Using a link
weight of 1 for every edge, the cyclomatic complexity can be calculated as follows: for
each row with at least one entry, sum the entries and subtract 1; then add 1 to the total
of these row values. For the flow graph of Fig 3.3, this again gives 4.
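A sketch of the same calculation from a graph matrix (our own illustration, using the row
rule just described):

    # Connection matrix: entry [i][j] = 1 when there is an edge from node i
    # to node j. For each row with at least one entry, add (row sum - 1);
    # the cyclomatic complexity is that total plus 1.
    def complexity_from_matrix(matrix):
        total = 0
        for row in matrix:
            connections = sum(row)
            if connections > 0:
                total += connections - 1
        return total + 1

    # Matrix for the flow graph of Fig 3.3; nodes ordered 1,2,3,4,5,6,7a,7b,8.
    m = [
        [0, 1, 0, 0, 0, 0, 0, 0, 1],  # 1 -> 2, 8
        [0, 0, 1, 1, 0, 0, 0, 0, 0],  # 2 -> 3, 4
        [0, 0, 0, 0, 0, 0, 0, 1, 0],  # 3 -> 7b
        [0, 0, 0, 0, 1, 1, 0, 0, 0],  # 4 -> 5, 6
        [0, 0, 0, 0, 0, 0, 1, 0, 0],  # 5 -> 7a
        [0, 0, 0, 0, 0, 0, 1, 0, 0],  # 6 -> 7a
        [0, 0, 0, 0, 0, 0, 0, 1, 0],  # 7a -> 7b
        [1, 0, 0, 0, 0, 0, 0, 0, 0],  # 7b -> 1
        [0, 0, 0, 0, 0, 0, 0, 0, 0],  # 8 (exit node)
    ]
    print(complexity_from_matrix(m))  # -> 4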
Some other interesting link weight can be measured by the graph as:
Probability that a link (edge) will be executed
Processing time for traversal of a link
Fig 3.4: Example of a graph matrix
In programs, conditions are very important and testing such conditions is more complex
than other statements like assignment and declarative statements. Basic path testing is
one example of control structure testing. There are many ways in which control structure
can be tested.
Condition testing aims to exercise all logical conditions in a program module. Logical
conditions may be complex or simple. Logical conditions may be nested with many
relational operations.
Errors in conditional expressions are normally due to one or more of the following:
Boolean operator errors (incorrect, missing, or extra operators)
Boolean variable or parenthesis errors
Relational operator errors
Arithmetic expression errors and mismatches of types
Condition testing methods focus on testing each condition in the program, whatever its
form. There are many strategies to identify such errors.
Some of the strategies proposed include:
Domain Testing: Uses three or four tests for every relational operator depending
on the complexity of the statement.
Branch and relational operator testing: Uses condition constraints. Based on the
complexity of the relational operators, many branches will be executed.
Example 1: C1 = B1 & B2
Condition constraint of form (D1, D2) where D1 and D2 can be true (t) or false(f).
The branch and relational operator test requires the constraint set {(t,t),(f,t),(t,f)}
to be covered by the execution of C1.
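A minimal sketch of how that constraint set could be exercised in code (our own
illustration; C1, B1, and B2 are the names from the example above):

    # Branch and relational operator testing for C1 = B1 & B2: cover the
    # constraint set {(t,t), (f,t), (t,f)} so each operand is observed both
    # true and false while the other permits the outcome to change.
    def c1(b1: bool, b2: bool) -> bool:
        return b1 and b2

    constraint_set = [(True, True), (False, True), (True, False)]
    for b1, b2 in constraint_set:
        print(f"B1={b1}, B2={b2} -> C1={c1(b1, b2)}")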
First, a data flow graph, like the control flow graph (see basis path testing), is drawn.
Test paths are then selected according to the locations of definitions and uses of
variables. Any variable that has been defined in a program passes through the following
states:
D: define the variable,
U: use the variable,
K: kill the variable.
These are the possible states of a variable at any time during execution of the program.
Any variable that is part of the program will pass through these states. However, the
sequence of states is important. The following two-state sequences can be classified as
normal, suspicious, or a bug:
DU: Normal
DD: Suspicious
DK: Probable bug
KD: Normal
KU: Bug
UD: Normal
For example,
DU: Normal means a variable is defined first and then used in the program, which is the
normal behavior of data flow in a program.
DK: Probable bug means a variable is defined and then killed before being used in the
program. This may be a bug: why was the variable defined and then killed without being
used?
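For instance, a hypothetical fragment of our own showing a DD sequence, which a data-flow
analyzer would flag as suspicious:

    # DD anomaly: 'total' is defined and then immediately redefined with no
    # intervening use. Either the first definition is dead code or a use of
    # it is missing; a data-flow analyzer flags this as suspicious.
    def order_total(prices):
        total = 0                # D: define 'total'
        total = sum(prices)      # D again, no use in between: DD (suspicious)
        return total             # U: use

    print(order_total([9.99, 4.50]))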
Loops are fundamental to many algorithms. Loops can be categorized as simple,
concatenated, nested, and unstructured.
Examples:
Fig 3.5: Different types of Loops
Simple loops: where n is the maximum number of allowable passes through the loop, apply
the following tests:
o Skip the loop entirely.
o Make only one pass, then two passes, through the loop.
o Make m passes through the loop, where m < n.
o Make (n-1), n, and (n+1) passes through the loop. This helps in testing the
boundaries of the loop.
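As an illustration, a minimal sketch (our own) of these simple-loop boundary tests for a
function whose loop is bounded by n:

    # A loop that should process at most n items; simple-loop testing
    # exercises it with 0, 1, 2, m (m < n), n-1, n, and n+1 items.
    def sum_first_n(values, n):
        total = 0
        for i, v in enumerate(values):
            if i >= n:          # the loop's operational bound
                break
            total += v
        return total

    n = 5
    for passes in (0, 1, 2, 3, n - 1, n, n + 1):
        data = list(range(passes))
        # Even with n+1 items, the loop must stop after n iterations.
        assert sum_first_n(data, n) == sum(data[:n])
    print("simple-loop boundary tests passed")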
Nested Loops
o Start with the innermost loop. Set all other loops to their minimum values.
o Conduct simple-loop tests for the innermost loop while holding the outer loops at
their minimum values.
o Work outward, testing the next loop while keeping inner loops at typical values
and the remaining outer loops at minimums.
o Continue until all loops are tested.
Concatenated loops: if the loops are independent of one another, they can be tested as
simple loops; if they are dependent, treat them as nested loops.
Unstructured loops: whenever possible, redesign this class of loops to reflect the
structured programming constructs.
4.6. Black Box Testing
Functional tests examine the observable behavior of software as evidenced by its outputs,
without any reference to internal workings. These tests take the user's point of view: it
is as if the user were exercising the normal business functions.
Black box tests normally determine the quality of the software. It is an advantage
to create the quality criteria from this point of view from the beginning.
In black box testing, software is subjected to a full range of inputs and the outputs
are verified for its correctness. Here, the structure of the program is immaterial.
Black box testing techniques can be applied once unit and integration testing are
completed. They attempt to find errors in the following categories:
1. incorrect or missing functions
2. interface errors
3. errors in data structures or external database access
4. performance errors
5. initialization and termination errors
Some of the techniques used for black box testing are discussed below:
4.6.1. Equivalence Partitioning
The main objective of this method is to partition the input domain so that an optimal set
of input data is selected. The steps to be followed are:
1. Divide the input domain into classes of data for which test cases can be generated.
2. Identify both valid and invalid input data while partitioning the data.
An input condition is either a specific numeric value, range of values, a set of related
values, or a boolean condition.
If an input condition specifies a range or a specific value, one valid and two
invalid equivalence classes are defined.
If an input condition specifies a boolean or a member of a set, one valid and one
invalid equivalence class are defined.
Test cases for each input-domain data item are developed and executed.
This method uses far fewer input data than exhaustive testing. However, it does not
consider data at the boundary values, and although it significantly reduces the number of
inputs to be tested, it does not test combinations of the input data.
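A small sketch of equivalence partitioning in Python, under the assumption of an input
field that accepts integers from 1 to 100 (our own example):

    # Equivalence classes for an integer input defined on the range [1, 100]:
    # one valid class (1..100) and two invalid classes (< 1 and > 100).
    def accepts(value: int) -> bool:
        return 1 <= value <= 100

    # One representative test case per equivalence class.
    cases = {
        "valid (in range)":      (50, True),
        "invalid (below range)": (0, False),
        "invalid (above range)": (101, False),
    }
    for name, (value, expected) in cases.items():
        assert accepts(value) == expected, name
    print("equivalence class tests passed")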
4.6.2. Boundary Value Analysis
It is observed that the boundary points of input ranges are often not tested properly,
and a large number of errors tend to occur at the boundaries of the input domain.
Boundary Value Analysis (BVA) leads to a selection of test cases that exercise boundary
values.
BVA complements equivalence partitioning: rather than selecting any element of an
equivalence class, select elements at the 'edges' of the class.
Examples:
1. For a range of values bounded by a and b, test (a-1), a, (a+1), (b-1), b, (b+1).
2. If input conditions specify a number of values n, test with (n-1), n and (n+1) input
values.
3. Apply 1 and 2 to output conditions (e.g., generate table of minimum and
maximum size).
4. If internal program data structures have boundaries (e.g., buffer size, table limits),
use input data to exercise structures on boundaries.
BVA and equivalence partitioning both help in testing programs and cover most of the
conditions. Neither method tests combinations of input conditions.
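Continuing the same assumed range [1, 100], a sketch of the boundary value tests produced
by rule 1 above:

    # Boundary value analysis for a range bounded by a = 1 and b = 100:
    # test (a-1), a, (a+1), (b-1), b, and (b+1).
    def accepts(value: int) -> bool:
        return 1 <= value <= 100

    a, b = 1, 100
    for value in (a - 1, a, a + 1, b - 1, b, b + 1):
        expected = a <= value <= b
        assert accepts(value) == expected, value
    print("boundary value tests passed")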
Executive Order 10358 provides, in the case of an employee whose workweek varies from
the normal Monday through Friday workweek, that Labor Day and Thanksgiving Day
each were to be observed on the next succeeding workday when the holiday fell on a day
outside the employee's regular basic workweek. Now, when Labor Day, Thanksgiving
Day, or any of the new Monday holidays fall outside an employee's basic workweek, the
immediately preceding workday will be his holiday when the non-workday on which the
holiday falls is the second non-workday or the non-workday designated as the employee's
day off in lieu of Saturday. When the non-workday on which the holiday falls is the first
non-workday or the non-workday designated as the employee's day off in lieu of Sunday,
the holiday observance is moved to the next succeeding workday.
4.6.3. Cause-Effect Graphing
How do you test code that attempts to implement this?
1. Causes (input conditions) and effects (actions) are listed for a module and an
identifier is assigned to each.
Simplified symbology:
4.6.4. Comparison Testing
For redundant software, use separate teams to develop independent versions of the
software.
Test each version with the same test data to ensure that all versions provide identical
output.
Even if only one version will run in the final system, for some critical applications one
can develop independent versions and use comparison testing, or back-to-back testing.
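A minimal sketch (our own) of back-to-back testing with two hypothetical, independently
written versions of the same function:

    import random

    # Two independently developed versions of the same specification:
    # compute the arithmetic mean of a non-empty list.
    def average_v1(xs):
        return sum(xs) / len(xs)

    def average_v2(xs):
        total = 0.0
        for x in xs:
            total += x
        return total / len(xs)

    # Back-to-back testing: feed both versions the same test data and flag
    # any case where their outputs disagree (beyond rounding error).
    random.seed(42)
    for _ in range(1000):
        data = [random.uniform(-1e6, 1e6) for _ in range(random.randint(1, 50))]
        r1, r2 = average_v1(data), average_v2(data)
        assert abs(r1 - r2) <= 1e-6 * max(1.0, abs(r1)), (r1, r2)
    print("no discrepancies found")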
If the programming language semantics are formally defined, one can consider a program
to be a set of mathematical statements. We can attempt to develop a mathematical proof
that the program is correct with respect to its specification. If the proof can be
established, the program is verified and testing to check verification is not required.
There are a number of approaches to proving program correctness. We will only consider
the axiomatic approach.
Suppose that at points P(1), ..., P(n) in the program, assertions a(1), ..., a(n)
concerning the program variables and their relationships can be made.
The assertion a(1) is about the inputs to the program, and a(n) about the outputs.
We can now attempt, for each k between 1 and (n-1), to prove that the statements between
P(k) and P(k+1) transform the assertion a(k) into a(k+1).
Given that a(1) and a(n) are true, this sequence of proofs shows partial program
correctness. If it can also be shown that the program will terminate, the proof is
complete.
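The same idea can be approximated at run time with executable assertions; here is a tiny
sketch of our own for a function computing an absolute value:

    # a(1) is an assertion about the inputs; a(n) about the outputs. A proof
    # shows the statements in between transform a(1) into a(n); at run time,
    # assert statements merely check these assertions on each execution.
    def absolute(x: int) -> int:
        # a(1): x is any integer (no further precondition).
        y = -x if x < 0 else x
        # a(n): y is non-negative and equals x or -x.
        assert y >= 0 and (y == x or y == -x)
        return y

    for x in (-3, 0, 7):
        print(absolute(x))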
Static analysis tools scan the source code to try to detect errors.
It can check:
1. Syntax.
2. Unreachable code
3. Unconditional branches into loops
4. Undeclared variables
5. Uninitialised variables.
6. Parameter type mismatches
7. Uncalled functions and procedures.
8. Variables used before initialization.
9. Non-usage of function results.
10. Possible array bound errors.
11. Misuse of pointers.
4.8. Automated Testing Tools
Automation of testing is a state-of-the-art technique in which a number of tools help to
test programs automatically. Programmers can use these tools to test their programs and
ensure quality. A number of tools are available in the market. Some of the tools that
help the programmer are:
1. Static analysers
2. Code auditors
3. Assertion processors
4. Test file generators
5. Test data generators
6. Test verifiers
7. Output comparators
A programmer can select a tool depending on the complexity of the program.
Questions:
1. What is black box testing? Explain.
2. What different techniques are available to conduct black box testing?
3. Explain the different methods available in white box testing, with examples.
Chapter 5: Testing for Specialized Environments
5.1. Introduction
The need for specialized testing approaches is becoming mandatory as computer software
has become more complex. White-box and black-box testing methods are applicable
across all environments, architectures, and applications, but unique guidelines and
approaches to testing are sometimes important. Here we address testing guidelines for
specialized environments, architectures, and applications that are commonly encountered
by software engineers.
Because modern GUIs follow common standards (the same look and feel), a series of common
tests can be derived.
What guidelines help in creating a series of generic tests for GUIs?
The guidelines can be categorized by type of operation. Some of them are discussed below:
For windows:
Will the window open properly based on related typed or menu-based commands?
Can the window be resized, moved, scrolled?
Does the window properly regenerate when it is overwritten and then recalled?
Are all functions that relate to the window available when needed?
Are all functions that relate to the window operational?
Are all relevant pull-down menus, tool bars, scroll bars, dialog boxes, and
buttons, icons, and other controls available and properly represented?
Is the active window properly highlighted?
Do multiple or incorrect mouse picks within the window cause unexpected side
effects?
Are audio and/or color prompts within the window or as a consequence of
window operations presented according to specification?
Does the window properly close?
Data entry:
Is alphanumeric data entry properly echoed and input to the system?
Do graphical modes of data entry (e.g., a slide bar) work properly?
Is invalid data properly recognized?
Are data input messages intelligible?
Are basic standard validations of each data item performed during data entry
itself?
Once the data is entered completely, if a correction is needed for a specific item,
does the system require entering all of the data again?
Are mouse clicks properly used?
Are help facilities available during data entry?
In addition to the above guidelines, finite state modeling graphs may be used to derive a
series of tests that address specific data and program objects that are relevant to the GUI.
5.2. Testing of Client/Server Architectures
Client/server architectures represent a significant challenge for software testers. The
distributed nature of client/server environments, the performance issues associated with
transaction processing, the potential presence of a number of different hardware
platforms, the complexities of network communication, the need to service multiple
clients from a centralized (or in some cases, distributed) database, and the coordination
requirements imposed on the server all combine to make testing of C/S architectures and
the software that resides within them considerably more difficult than testing standalone
applications. In fact, recent industry studies indicate a significant increase in testing
time and cost when C/S environments are developed.
5.3. Testing Documentation and Help Facilities
Documentation testing can be approached in two phases. The first phase, formal technical
review, examines the document for editorial clarity. The second phase, live test, uses
the documentation in conjunction with the actual program.
The only viable way to answer these questions is to have an independent third party test
the documentation in the context of program usage. All discrepancies are noted, and
areas of document ambiguity or weakness are defined for potential rewrite.
Questions
1. Explain the need for GUI testing and its complexity?
2. List the guidelines required for a typical tester during GUI testing?
3. Select your own GUI-based software system and test the GUI-related functions using
the guidelines listed in this chapter.
Chapter 6: SOFTWARE TESTING STRATEGIES
6.1. Introduction
A strategy for software testing integrates software test case design methods into a well-
planned series of steps that result in the successful construction of software. As
important, a software testing strategy provides a road map for the software developer, the
quality assurance organization, and the customer- a road map that describes the steps to
be conducted as part of testing, when these steps are planned and then undertaken, and
how much effort, time, and resources will be required. Therefore any testing strategy
must incorporate test planning, test case design, test execution, and resultant data
collection and evaluation.
A software testing strategy should be flexible enough to promote the creativity and
customization that are necessary to adequately test all large software-based systems. At
the same time, the strategy must be rigid enough to promote reasonable planning and
management tracking as the project progresses. Shooman suggests these issues:
In many ways, testing is an individualistic process, and the number of different
types of tests varies as much as the different development approaches. For many
years, our only defense against programming errors was careful design and the
native intelligence of the programmer. We are now in an era in which modern
design techniques are helping us to reduce the number of initial errors that are
inherent in the code. Similarly, different test methods are beginning to cluster
themselves into several distinct approaches and philosophies.
Testing can be planned and conducted systematically; hence, templates for testing, that
is, well-defined sets of steps into which specific test case design methods can be
placed, should be defined.
A number of software testing strategies have been proposed in the literature. All provide
the software developer with a template for testing, and all have the following generic
characteristics:
Testing begins at the module level (or at the class or object level in object-oriented
systems) and works outward toward the integration of the entire computer-based system.
Different testing techniques are appropriate at different points in time.
Testing is conducted by the developer of the software and, for large projects, by an
independent test group.
Testing and debugging are different activities, but debugging must be accommodated
in any testing strategy
A strategy for software testing must accommodate low-level tests that are necessary
to verify that a small source code segment has been correctly implemented as well as
high-level tests that validate major customer requirements.
A strategy must provide guidance for the practitioner and a set of milestones for the
manager. Because the steps of the test strategy occur at a time when deadline pressure
begins to rise, progress must be measurable and problems must surface as early as
possible.
Software testing is one element of a broader topic that is often referred to as verification
and validation(V&V). Verification refers to the set of activities that ensure that software
correctly implements a specific function. Validation refers to a different set of activities
that ensure that the software that has been built is traceable to customer requirements.
Boehm states it this way:
Verification: "Are we building the product right?"
Validation: "Are we building the right product?"
Figure 5.1 shows that the application of methods and tools, effective formal technical
reviews, and solid management and measurement all lead to quality that is confirmed
during testing.
Testing provides the last bastion from which quality can be assessed and, more
pragmatically, errors can be uncovered.
However, testing should not be viewed as a safety net. Quality cannot be tested into
software: if it is not there before you begin testing, it will not be there when you
finish testing. Quality is incorporated throughout the software process.
Note:
It is important to note that V&V encompasses a wide array of SQA activities, including
formal technical reviews, quality and configuration audits, performance monitoring,
simulation, feasibility studies, documentation review, database review, algorithm
analysis, development testing, qualification testing, and installation testing.
Although testing plays an extremely important role in V&V, many other activities are
also necessary.
6.2. Organizing for Software Testing
The software developer is always responsible for testing the individual units (modules)
of the program, ensuring that each performs the function for which it was designed. In
many cases, the developer also conducts integration testing, a testing step that leads to
the construction of the complete program structure. Only after the software architecture
is complete does an independent test group (ITG) become involved.
The role of an ITG is to remove the inherent problems associated with letting the builder
test the thing that has been built. Independent testing removes the conflict of interest
that may otherwise be present. After all, personnel in the ITG team are paid to find
errors.
However, the software developer does not turn the program over to the ITG and walk away.
The developer and the ITG work closely throughout a software project to ensure that
thorough tests will be conducted. While testing is conducted, the developer must be
available to correct errors that are uncovered.
The ITG is part of the software development project team in the sense that it becomes
involved during the specification process and stays involved (planning and specifying test
procedures) throughout a large project.
In many cases the ITG reports to the SQA organization, thereby achieving a
degree of independence that might not be possible if it were a part of the software
development organization.
To develop computer software, we spiral in along streamlines that decrease the level of
abstraction on each turn.
Figure 5.2: Testing strategy in the context of the spiral. Development spirals inward
through system engineering (S), requirements (R), design (D), and code (C); testing then
spirals outward from unit test (U) through integration test (I), validation test (V),
and system test (ST).
The strategy for software testing may also be viewed in the context of the spiral.
Unit testing begins at the vortex of the spiral and concentrates on each unit of the
software as implemented in source code. Testing progresses by moving outward along the
spiral to integration testing, where the focus is on design and the construction of the
software architecture. Talking another turn outward on the spiral, we encounter
Validation testing where requirements established as part of software requirements
analysis are validated against the software that has been constructed. Finally, We arrive at
system testing where the software and other system elements are tested as a whole.
To test computer software, we spiral out along streamlines that broaden the scope of
testing with each turn.
Considering the process from a procedural point of view, testing within the context of
software engineering is a series of four steps that are implemented sequentially.
The steps are shown in Figure 5.3. Initially, tests focus on each module individually,
assuring that it functions as a unit, hence the name unit testing. Unit testing makes
heavy use of white-box testing techniques, exercising specific paths in a module's control
structure to ensure complete coverage and maximum error detection. Next, modules must
be assembled or integrated to form the complete software package. Integration testing
addresses the issues associated with the dual problems of verification and program
construction. Black-box test case design techniques are the most prevalent during
integration, although a limited amount of white-box testing may be used to ensure
coverage of major control paths. After the software has been integrated (constructed),
sets of high-order tests are conducted. Validation criteria (established during
requirements analysis) must be tested. Validation testing provides final assurance that
the software meets all functional, behavioral, and performance requirements. Black-box
testing techniques are used exclusively during validation.
The last high-order testing step falls outside the boundary of software engineering and
into the broader context of computer system engineering. Software, once validated, must
be combined with other system elements (e.g., hardware, people, and databases). System
testing verifies that all elements mesh properly and that overall system
function/performance is achieved.
Figure 5.3: Software testing steps: unit tests exercise the code, and integration tests
exercise the design.
6.4. Criteria for completion of testing
Using statistical modeling and software reliability theory, models of software failures
(uncovered during testing) as a function of execution time can be developed.
A version of the failure model, called a logarithmic Poisson execution-time model, takes
the form:
f(t) = (1/p) ln(l0 p t + 1)    (1)
where
f(t) = cumulative number of failures that are expected to occur once the software has
been tested for a certain amount of execution time t,
l0 = the initial software failure intensity (failures per unit time) at the beginning of
testing,
p = the exponential reduction in failure intensity as errors are uncovered and repairs
are made.
The instantaneous failure intensity, l(t), can be derived by taking the derivative of
f(t):
l(t) = l0 / (l0 p t + 1)    (2)
Using the relationship noted in equation (2), testers can predict the drop-off of errors
as testing progresses. The actual error intensity can be plotted against the predicted
curve (Figure 5.4). If the actual data gathered during testing and the logarithmic
Poisson execution-time model are reasonably close to one another over a number of data
points, the model can be used to predict the total testing time required to achieve an
acceptably low failure intensity.
Figure 5.4: Predicted and actual failure intensity (failures per test hour) as a function
of execution time, t.
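A numerical sketch of the model in Python (our own illustration; the parameter values l0
and p are hypothetical):

    import math

    l0 = 10.0   # initial failure intensity, failures per test hour (assumed)
    p = 0.05    # exponential reduction in failure intensity (assumed)

    def cumulative_failures(t):
        """f(t) = (1/p) ln(l0*p*t + 1), equation (1)."""
        return (1.0 / p) * math.log(l0 * p * t + 1.0)

    def failure_intensity(t):
        """l(t) = l0 / (l0*p*t + 1), the derivative of f(t), equation (2)."""
        return l0 / (l0 * p * t + 1.0)

    for t in (0, 10, 50, 100, 500):
        print(f"t={t:4d}h  f(t)={cumulative_failures(t):7.1f}  "
              f"l(t)={failure_intensity(t):6.3f}")

As t grows, l(t) falls toward zero; this predicted drop-off is what testers compare
against the actual failure data.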
6.5. Strategic Issues
The following issues must be addressed if a successful software testing strategy is to be
implemented:
Specify product requirements in a quantifiable manner long before testing
commences. Although the overriding objective of testing is to find errors, a good
testing strategy also assesses other quality characteristics such as portability,
maintainability, and usability. These should be specified in a way that is measurable,
so that testing results are unambiguous.
State testing objectives explicitly. The specific objectives of testing should be stated
in measurable terms for example, test effectiveness, test coverage, meantime to
failure, the cost to find and fix defects, remaining defect density or frequency of
occurrence, and test work - hours per regression test should all be stated within the
test plan.
Understand the users of the software and develop a profile for each user
category.use cases ,which describe interaction scenario for each class of user can
reduce overall testing effort by focussing testing on actual use of the product.
Develop a testing plan that emphasizes "rapid cycle testing".
The feedback generated from the rapid cycle tests can be used to control quality levels
and the corresponding test strategies.
Build "robust" software that is designed to test itself. Software should be designed
using anti-bugging techniques; that is, software should be capable of
diagnosing certain classes of errors itself (see the sketch following this list). In
addition, the design should accommodate automated testing and regression testing.
Use effective formal technical reviews as a filter prior to testing. Formal technical
reviews can be as effective as testing in uncovering errors. For this reason, reviews
can reduce the amount of testing effort that is required to produce high-quality
software.
Conduct formal technical reviews to assess the test strategy and the test cases
themselves. Formal technical reviews can uncover inconsistencies, omissions, and
outright errors in the testing approach. This saves time and improves product quality.
Develop a continuous improvement approach for the testing process. The test strategy
should be measured. The metrics collected during testing should be used as part of a
statistical process control approach for software testing.
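As a hedged illustration of the anti-bugging idea above, the following Python sketch (the function and its checks are hypothetical examples, not from the text) shows a routine that diagnoses certain classes of errors itself rather than letting bad data propagate:

def average(values):
    # Anti-bugging: the routine diagnoses bad input at the point of entry
    # instead of silently producing a wrong answer downstream.
    if not isinstance(values, (list, tuple)):
        raise TypeError("average() expects a list or tuple of numbers")
    if len(values) == 0:
        raise ValueError("average() called with an empty sequence")
    result = sum(values) / len(values)
    # Built-in sanity check: the mean must lie between the min and max.
    assert min(values) <= result <= max(values), "internal consistency check failed"
    return result

Because the routine reports its own failures with specific exceptions, automated regression tests can assert on those diagnoses directly.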
6.6. Unit Testing
Unit testing focuses verification efforts on the smallest unit of software design: the
module. Using the procedural design description as a guide, important control paths are
tested to uncover errors within the boundary of the module. The relative complexity of
tests and the errors they uncover are limited by the constrained scope established for unit
testing. The unit test is normally white-box oriented, and the step can be conducted in
parallel for multiple modules.
74
6.6.1. Unit test consideration
The tests that occur as part of unit testing are illustrated schematically in the figure below.
The module interface is tested to ensure that information properly flows into and out of
the program unit under test. The local data structure is examined to ensure the data stored
temporarily maintains its integrity during all steps in an algorithm's execution.
Boundary conditions are tested to ensure that the module operates properly at boundaries
established to limit or restrict processing. All independent paths through the control
structure are exercised to ensure that all statements in a module have been executed at
least once. And finally, all error-handling paths are tested.
Tests of data flow across a module interface are required before any other test is initiated.
If data do not enter and exit properly, all other tests are doubtful.
(Figure: Unit test considerations - test cases exercise the module interface, local data structures, boundary conditions, independent paths, and error-handling paths.)
6.6.2. Checklist for interface tests.
When a module performs external I/O, following additional interface test must be
conducted.
1. File attributes correct?
2. Open/Close statements correct?
3. Format specification matches I/O statements?
4. Buffer size matches record size?
5. Files opened before use?
6. End-of-File conditions handled?
7. I/O errors handled
8. Any textual errors in output information?
The local data structure for a module is a common source of errors .Test cases should be
designed to uncover errors in the following categories
1. improper or inconsistent typing
2. erroneous initialization or default values
3. incorrect (misspelled or truncated) variable names
4. inconsistent data types
5. underflow, overflow, and addressing exceptions
In addition to local data structures, the impact of global data on a module should be
ascertained during unit testing.
Selective testing of execution paths is an essential task during the unit test. Test cases
should be designed to uncover errors due to erroneous computations, incorrect
comparisons, and improper control flow. Basis path and loop testing are effective
techniques for uncovering a broad array of path errors.
Good design dictates that error conditions be anticipated and error handling paths set up
to reroute or cleanly terminate processing when an error does occur.
Among the potential errors that should be tested when error handling is evaluated
are:
1. Error description is unintelligible
2. Error noted does not correspond to error encountered
3. Error condition causes system intervention prior to error handling
4. Exception-condition processing is incorrect
5. Error description does not provide enough information to assist in the location of the
cause of the error.
Boundary testing is the last task of the unit test step. Software often fails at its
boundaries. That is, errors often occur when the nth element of an n-dimensional array is
processed; when the ith repetition of a loop with i passes is invoked; or when the maximum
or minimum allowable value is encountered. Test cases that exercise data structures,
control flow, and data values just below, at, and just above maxima and minima are very
likely to uncover errors.
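A minimal sketch of such boundary-value test cases, written here with Python's unittest (the buffer size limit and function are hypothetical examples):

import unittest

MAX_ITEMS = 100  # hypothetical processing limit

def load_buffer(items):
    # Accepts at most MAX_ITEMS items; rejects anything larger.
    if len(items) > MAX_ITEMS:
        raise OverflowError("too many items")
    return list(items)

class BoundaryTests(unittest.TestCase):
    def test_just_below_maximum(self):
        self.assertEqual(len(load_buffer(range(MAX_ITEMS - 1))), MAX_ITEMS - 1)

    def test_at_maximum(self):
        self.assertEqual(len(load_buffer(range(MAX_ITEMS))), MAX_ITEMS)

    def test_just_above_maximum(self):
        with self.assertRaises(OverflowError):
            load_buffer(range(MAX_ITEMS + 1))

if __name__ == "__main__":
    unittest.main()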
Unit testing is normally considered as an adjunct to the coding step. After source-level
code has been developed, reviewed, and verified for correct syntax, unit test case design
begins. A review of design information provides guidance for establishing test cases that
are likely to uncover errors in each of the categories discussed above. Each test case
should be coupled with a set of expected results.
Because a module is not a standalone program, driver and/or stub software must be
developed for each unit test. The unit test environment is illustrated in figure 5.6. In most
applications a driver is nothing more than a "main program" that accepts test case data,
passes such data to the module to be tested, and prints relevant results. Stubs serve to
replace modules that are subordinate to the module to be tested. A stub or "dummy
subprogram" uses the subordinate module's interface, may do minimal data manipulation,
prints verification of entry, and returns.
Drivers and stubs represent overhead. That is, both are software that must be developed
but that is not delivered with the final software product. If drivers and stubs are kept
simple, actual overhead is relatively low. Unfortunately, many modules cannot be
adequately unit tested with "simple" overhead software. In such cases, complete testing
can be postponed until the integration test step (where drivers or stubs are also used).
Unit test is simplified when a module with high cohesion is designed. When a module
addresses only one function, the number of test cases is reduced and errors can be more
easily predicted and uncovered.
(Figure 5.6: Unit test environment - a driver feeds test cases to the module to be tested; stubs stand in for subordinate modules; the interface, local data structures, boundary conditions, independent paths, and error-handling paths are exercised and results collected.)
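The sketch below suggests what a simple driver and stub might look like in Python for the environment of figure 5.6 (the module under test and its subordinate are hypothetical examples):

# Stub: replaces the subordinate module the unit under test would call.
def fetch_tax_rate_stub(region):
    print(f"stub entered: fetch_tax_rate({region!r})")  # verification of entry
    return 0.10  # canned return value, minimal data manipulation

# Module to be tested (would normally call the real fetch_tax_rate).
def compute_total(price, region, fetch_tax_rate=fetch_tax_rate_stub):
    return price * (1.0 + fetch_tax_rate(region))

# Driver: a "main program" that accepts test case data, invokes the
# module under test, and prints relevant results.
if __name__ == "__main__":
    test_cases = [(100.0, "north"), (0.0, "south")]
    for price, region in test_cases:
        print(f"compute_total({price}, {region!r}) = {compute_total(price, region)}")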
6.7. Integration Testing
Integration testing is a systematic technique for constructing the program structure while
conducting tests to uncover errors associated with interfacing. The objective is to take
unit-tested modules and build a program structure that has been dictated by design.
6.7.1. Different Integration Strategies
There is often a tendency to attempt non-incremental integration; that is, to construct
the program using a "big bang" approach. All modules are combined in advance and the
entire program is tested as a whole. Chaos usually results! A set of errors is
encountered. Correction is difficult because isolation of causes is complicated by the vast
expanse of the entire program. Once these errors are corrected, new ones appear and the
process continues in a seemingly endless loop.
Incremental integration is the antithesis of the big bang approach. The program is
constructed and tested in small segments, where errors are easier to isolate and correct;
interfaces are more likely to be tested completely; and a systematic test approach may be
applied. We discuss some of the incremental methods here:
Top-down integration is an incremental approach in which modules are integrated by
moving downward through the control hierarchy, beginning with the main control
module. It may be implemented with the following steps:
1. The main control module is used as a test driver, and stubs are substituted for all
modules directly subordinate to it
2. Subordinate stubs are replaced one at a time with actual modules (depth-first or
breadth-first)
3. Tests are conducted as each module is integrated
4. On completion of each set of tests, another stub is replaced with the real module
5. Regression testing may be conducted to ensure that new errors have not been
introduced
The process continues from step 2 until the entire program structure is built.
The top-down strategy sounds relatively uncomplicated, but in practice logistical
problems arise: stubs replace low-level modules at the beginning of top-down testing, so
no significant data can flow upward in the program structure. The tester is left with three
choices:
1. Delay many tests until stubs are replaced with actual modules
2. Develop stubs that perform limited functions that simulate the actual module
3. Integrate the software from the bottom of the hierarchy upward
The first approach causes us to lose some control over the correspondence between
specific tests and the incorporation of specific modules; this can lead to difficulty in
determining the cause of errors and tends to violate the highly constrained nature of the
top-down approach. The second approach is workable but can lead to significant
overhead, as stubs become increasingly complex. The third approach is discussed in the
next section.
A bottom-up integration strategy may be implemented with the following steps:
1. Low-level modules are combined into clusters that perform a specific software sub
function.
2. A driver is written to coordinate test case input and output.
3. The cluster is tested.
4. Drivers are removed and clusters are combined moving upward in the program
structure.
As integration moves upward, the need for separate test drivers lessens. In fact, if the top
two levels of program structure are integrated top-down, the number of drivers can be
reduced substantially and integration of clusters is greatly simplified.
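As a hedged sketch of steps 1-3 above, a driver coordinating test-case input and output for a small low-level cluster might look like this in Python (the cluster modules are hypothetical examples):

# Two low-level modules forming a cluster (hypothetical examples).
def parse_record(line):
    return [field.strip() for field in line.split(",")]

def validate_record(fields):
    return len(fields) == 3 and all(fields)

# Driver: coordinates test case input and output for the cluster.
def cluster_driver(test_lines):
    for line in test_lines:
        fields = parse_record(line)
        print(f"{line!r} -> fields={fields}, valid={validate_record(fields)}")

if __name__ == "__main__":
    cluster_driver(["a, b, c", "x,,z"])

Once the cluster behaves correctly, the driver is discarded and the cluster is combined with the modules above it.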
Each time a new module is added as part of integration testing, the software changes.
New data flow paths are established, new I/O may occur, and new control logic is
invoked. These changes may cause problems with functions that previously worked
flawlessly. In the context of an integration test strategy, regression testing is the re-
execution of a subset of tests that have already been conducted to ensure that changes
have not propagated unintended side effects.
Regression testing is the activity that helps to ensure that changes do not introduce
unintended behavior or additional errors.
Regression testing may be conducted manually, by re-executing a subset of all test cases,
or using automated capture-playback tools.
Capture-playback tools enable the software engineer to capture test cases and results for
subsequent playback and comparison.
The regression test suite contains three different classes of test cases.
1. A representative sample of tests that will exercise all software functions.
2. Additional tests that focus on software functions that are likely to be affected by the
change.
3. Tests that focus on software components that have been changed.
Note:
It is impractical and inefficient to re-execute every test for every program function once
a change has occurred.
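One illustrative way to organize such a suite is to tag each test with the software functions it exercises and re-execute only the affected subset after a change; the Python sketch below is a toy scheme, not a standard tool:

# Map each regression test to the software functions it exercises.
REGRESSION_SUITE = {
    "test_login":         {"authentication"},
    "test_checkout":      {"payment", "inventory"},
    "test_report_totals": {"reporting", "payment"},
}

def select_regression_tests(changed_functions):
    # Return the tests that touch any function affected by the change.
    return [name for name, funcs in REGRESSION_SUITE.items()
            if funcs & set(changed_functions)]

# After a change to the payment code, only the affected subset is re-run.
print(select_regression_tests({"payment"}))
# -> ['test_checkout', 'test_report_totals']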
Selection of an integration strategy depends upon software characteristics and,
sometimes, the project schedule. In general, a combined approach that uses a top-down
strategy for upper levels of the program structure, coupled with a bottom-up strategy for
subordinate levels, may be the best compromise.
An overall plan for integration of the software and a description of specific tests are
documented in a test specification. The specification is a deliverable in the software
engineering process and becomes part of the software configuration.
Test Specification Outline
I. Scope of testing
II. Test Plan
1. Test phases and builds
2. Schedule
3. Overhead software
4. Environment and resources
III. Test Procedures
1. Order of integration
Purpose
Modules to be tested
2. Unit test for modules in build
Description of test for module n
Overhead software description
Expected results
3. Test environment
Special tools or techniques
Overhead software description
4. Test case data
5. Expected results for build
IV. Actual Test Results
V. References
VI. Appendices
The following criteria and corresponding tests are applied for all test phases.
Interface integrity. Internal and external interfaces are tested as each module is
incorporated into the structure.
Functional validity. Tests designed to uncover functional errors are conducted.
Information content. Tests designed to uncover errors associated with local or global
data structures are conducted.
Performance. Tests designed to verify performance bounds established during software
design are conducted.
A schedule for integration, overhead software, and related topics are also discussed as
part of the Test Plan section. Start and end dates for each phase are established and
availability windows for unit-tested modules are defined. A brief description of overhead
software (stubs and drivers) concentrates on characteristics that might require special
effort. Finally, test environments and resources are described.
6.8. Validation Testing
Validation can be defined in many ways, but a simple definition is that validation
succeeds when the software functions in a manner that can be reasonably expected by the
customer.
A test plan outlines the classes of tests to be conducted, and a test procedure defines
specific test cases that will be used in an attempt to uncover errors in conformity with
requirements. Both the plan and procedure are designed to ensure that all functional
requirements are satisfied, all performance requirements are achieved, documentation is
correct, and other requirements like portability, error recovery, and maintainability are
met.
The beta test is conducted at one or more customer sites by the end user(s) of the
software. Unlike alpha testing, the developer is generally not present; therefore, the beta
test is a "live" application of the software in an environment that cannot be controlled by
the developer.
6.9. System Testing
System testing is actually a series of different tests whose primary purpose is to fully
exercise the computer-based system. Although each test has a different purpose, all work
to verify that all system elements have been properly integrated and perform allocated
functions.
A classic system testing problem is "finger pointing". This occurs when an error is
uncovered, and each system element developer blames the other for the problem. Rather
than indulging in such nonsense, the software engineer should anticipate potential
interfacing problems and 1) design error-handling paths that test all information coming
from other elements of the system; 2) conduct a series of tests that simulate bad data or
other potential errors at the software interface; 3) record the results of tests to use as
evidence if finger pointing does occur; and 4) participate in planning and design of
system tests to ensure that software is adequately tested.
In the sections that follow, we discuss the types of system tests that are worthwhile for
software-based systems.
6.9.1. Recovery Testing
Many computer-based systems must recover from faults and resume processing within a
pre-specified time. In some cases, a system must be fault tolerant; that is, processing
faults must not cause overall system function to cease. In other cases, a system failure
must be corrected within a specified period of time or severe economic damage will
occur.
Recovery testing is a system test that forces the software to fail in a variety of ways and
verifies that recovery is properly performed. If recovery is automatic (performed by the
system itself), re-initialization, checkpointing mechanisms, data recovery, and restart are
each evaluated for correctness. If recovery requires human intervention, the mean time to
repair is evaluated to determine whether it is within acceptable limits.
6.9.2. Security Testing
Security testing attempts to verify that protection mechanisms built into a system will in
fact protect it from improper penetration.
During security testing, the tester plays the role(s) of the individual who desires to
penetrate the system. Anything goes! The tester may attempt to acquire passwords
through external clerical means, may attack the system with custom software designed to
break down any defenses that have been constructed; may overwhelm the system, thereby
denying service to others; may purposely cause system errors, hoping to penetrate during
recovery; may browse through insecure data, hoping to find the key to system entry; and
so on.
Given enough time and resources, good security testing will ultimately penetrate a
system. The role of the system designer is to make penetration cost greater than the value
of the information that will be obtained.
6.9.3. Stress Testing
Stress testing is designed to confront programs with abnormal situations. In essence, the
tester who performs stress testing asks: "How high can we crank this up before it fails?"
Stress testing executes a system in a manner that demands resources in abnormal
quantity, frequency, or volume.
For example:
1. Special tests may be designed that generate 10 interrupts per second, when 1 or 2 is the
average rate.
2. Input data rates may be increased by an order of magnitude to determine how input
functions will respond.
3. Test cases that require maximum memory or other resources may be executed.
4. Test cases that may cause thrashing in a virtual operating system may be designed.
5. Test cases that may cause excessive hunting for disk-resident data may be created.
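As a small illustration of demanding resources at abnormal frequency and volume, the hedged Python sketch below uses a thread pool to submit far more concurrent requests than a hypothetical handler would normally see:

from concurrent.futures import ThreadPoolExecutor

def handle_request(n):
    # Hypothetical stand-in for the operation under stress.
    return sum(range(n))

def stress(requests=10_000, workers=50):
    # Normal load might be a handful of concurrent requests; here we
    # deliberately demand resources in abnormal quantity and frequency.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(handle_request, [1000] * requests))
    print(f"completed {len(results)} requests under stress")

if __name__ == "__main__":
    stress()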
A variation of stress testing is a technique called sensitivity testing. In some situations, a
very small range of data contained within the bounds of valid data for a program may
cause extreme and even erroneous processing or profound performance degradation. This
situation is analogous to a singularity in a mathematical function.
Sensitivity testing attempts to uncover data combinations within valid input classes that
may cause instability or improper processing.
6.9.5. Debugging
Software testing is a process that can be systematically planned and specified. Test case
design can be conducted, a strategy can be defined, and results can be evaluated against
prescribed expectations.
Debugging occurs as a consequence of successful testing. That is, when a test case
uncovers an error, debugging is the process that results in the removal of the error.
Debugging is not testing, but it always occurs as a consequence of testing, as shown in
figure 5.7.
6.9.6. The Debugging Process
As shown in figure 5.7, the debugging process begins with the execution of a test case.
Results are assessed, and a lack of correspondence between expected and actual
performance is encountered. In many cases, the non-corresponding data is a symptom of
an underlying cause as yet hidden. The debugging process attempts to match symptom
with cause, thereby leading to error correction.
The debugging process will always have two outcomes:
1. The cause will be found, corrected, and removed
2. The cause will not be found.
In the latter case, the person performing debugging may suspect a cause, design a test
case to help validate his/her suspicion, and work toward error correction in iterative
fashion.
(Figure 5.7: The debugging process - test cases are executed and results assessed; suspected causes are identified and additional tests designed until the cause is found and corrected.)
The brute force category of debugging is probably the most common, though least
efficient, method for isolating the cause of a software error. Brute force debugging
methods are applied when all other methods fail. Using a "let the computer find the
error" philosophy, memory dumps are taken, run-time traces are invoked, and the
program is loaded with WRITE statements, in the hope that somewhere in the
information produced a clue will emerge that leads to the cause of the error.
Backtracking is a common debugging approach that can be used successfully in small
programs. Beginning at the site where a symptom has been uncovered, the source code is
traced backward (manually) until the site of the cause is found. This process becomes
impractical as the number of source lines grows.
Cause Elimination is manifested by induction or deduction and introduces the concept of
binary partitioning. Data related to the error occurrence are organized to isolate potential
causes.
Alternatively, a list of all possible causes is developed and tests are conducted to
eliminate each.
If initial tests indicate that a particular cause hypothesis shows promise the data are
refined in an attempt to isolate the bug.
6.10. Summary
Software testing accounts for the largest percentage of technical effort in the
software process. Yet, we are only beginning to understand the subtleties of
systematic test planning, execution and control.
The objective of software testing is to uncover errors.
To fulfill this objective, a series of test steps (unit, integration, validation, and
system tests) are planned and executed.
Unit and integration tests concentrate on functional verification of a module and
incorporation of modules into a program structure.
Validation testing demonstrates traceability to software requirements, and
system testing validates software once it has been incorporated into a larger
system.
Each test step is accomplished through a series of systematic test techniques that
assist in the design of test cases. With each testing step, the level of abstraction
with which software is considered is broadened.
Unlike testing, debugging must be viewed as an art. Beginning with a
symptomatic indication of a problem, the debugging activity tracks down the
cause of an error. Of the many resources available during debugging, the most
valuable is the counsel of other software engineers.
The requirement for higher-quality software demands a more systematic approach
to testing.
Questions
1. What is the difference between Verification and Validation? Explain in your own
words.
2. Explain unit test method with the help of your own example.
3. Develop an integration testing strategy for any system that you have
implemented already. List the problems encountered during the process.
References
3. The Art of Software Testing, by Glenford J. Myers, John Wiley & Sons.
7. Chapter: SOFTWARE QUALITY STANDARDS
7.1. CMM
What is it?
It is a model that companies can use to measure the maturity of their software process.
CMM is used by companies to help define their goals of managing the software
process.
1.1. The 5 Levels of Software Process Maturity
Level 5 - Optimizing: focus on continuous process improvement
Level 4 - Managed: process measured and controlled
Level 3 - Defined: process characterized, fairly well understood
Level 2 - Repeatable: can repeat previously mastered tasks
Level 1 - Initial: unpredictable and poorly controlled
Level 1 - Initial
Characteristics
Ad hoc
Little formalization
Tools informally applied to the process
Success depends on individual efforts
If projects are successful, it is usually due to the efforts of a few individuals in the
organization.
The SEI attributes the problems of a Level 1 organization to poor management.
The software process is unpredictable because of constant change and modification as
work progresses.
Schedules, budgets, functionality, and product quality are unpredictable.
Level 2 - Repeatable
Characteristics
Basic project management established
Process discipline in place
Key process areas include:
Requirements Management - establish a common understanding of the customer's
requirements between the customer and the project.
Project Planning - establish reasonable plans for performing the engineering and
managing the project.
Project Tracking and Oversight - provide adequate visibility into actual progress
against the plan.
Subcontract Management - select qualified contractors and manage them
effectively.
Quality Assurance - provide management with appropriate visibility into the process
and products.
Configuration Management - establish and maintain the integrity of products
throughout the software life cycle.
Level 3 - Defined
Characteristics
Software processes are documented and standardized
All projects use a standard software process
Level 4 - Managed
Characteristics
Measurement of the software process
Measurement of product quality
Level 5 - Optimizing
Characteristics
Continuous process improvement
Piloting of innovative ideas and technologies
Externals
a) Identify qualified contractors
b) Monitor qualified contractors
SPA (Software Process Assessment)
a) Determines the state of an organization's current software process. Does the
company have a defined process? If so, what level is this process at?
b) Identifies improvement priorities. What are the organization's priorities in
refining their software process?
c) Does the organization have individuals devoted to software process
improvement? Does management back SPI?
SCE (Software Capability Evaluation)
First, an outside assessment team is selected. This team should be trained in the
fundamental concepts of the CMM as well as the specifics of the assessment or
evaluation method. The members of the team should be professionals knowledgeable
in software engineering and management.
The second step is to have representatives from the site to be assessed or evaluated
complete the maturity questionnaire and other diagnostic instruments. Once this
activity is completed, the assessment or evaluation team performs a response
analysis (step 3), which tallies the responses to the questions and identifies those
areas where further exploration is warranted. The areas to be investigated
correspond to the CMM key process areas.
The team is now ready to visit the site being assessed or evaluated (step 4). The
team conducts interviews and reviews documentation to gain an understanding of
the software process followed by the site.
At the end of the on-site period, the team produces a list of findings (step 5) that
identifies the strengths and weaknesses of the organization's software process.
Finally, the team prepares a key process area profile (step 6) that shows the areas
where the organization has, and has not, satisfied the goals of the key process
areas.
Management View of Visibility of Software Process
Level 1
a) Amorphous entity, b) activities poorly staged
Level 2
a) Process is viewed as a succession of black boxes
b) Customer requirements are controlled
c) Project management practices are established
Level 3
a) The internal structure of the boxes, i.e., the tasks in the project's defined
software process, is visible
b) The internal structure represents the way the organization's standard software
process has been applied to specific projects.
Level 4
a) Processes are instrumented and controlled quantitatively.
b) Ability to predict outcomes grows steadily more precise
Level 5
a) New and improved ways of building software are continually tried in a
controlled manner to improve productivity and quality.
Only large companies will be considered by the SEI for assessment.
One company's assessment cost $40,000.
Some of the statistics shown are fuzzy and cannot be readily proved.
An organization must build its foundation on one level before it can move to the
next. If you miss one question, you cannot be assessed at the level to which that
question pertains.
7.2. Six Sigma
In the late 1980s, as the popularity of the Malcolm Baldrige Award was peaking,
an engineer and statistician at Motorola, Dr. Mike Harry, began to study process
variation as a way to improve performance. Dr. Harry formalized his Six Sigma
philosophy into a system for measurably improving business quality.
Dr. Harry is commonly viewed as the father of Six Sigma. He is currently co-
founder and member of the Board of Directors of the Arizona-based Six Sigma
Academy, and claims ownership of Six Sigma terminology, although many firms
use the terms freely.
The Six Sigma approach became the focal point of Motorola's quality effort and a
way of doing business. Motorola's CEO began to tout the benefits of the
methodology and other executives began to listen. Soon companies like General
Electric, AlliedSignal, and Texas Instruments were on board.
The concept has since spread widely throughout the manufacturing sector, and
within the last two years it has been receiving attention and interest in the
financial services sector. Six Sigma methodologies are delivering positive results
in the service sector, and the popularity of the technique is expected to grow.
Overview.
Although definitions vary slightly by source, the most common might be: a
disciplined, data-driven approach and methodology for eliminating defects in any
process - from manufacturing to transactional, and from product to service. Key
themes include:
Process focus (at its core, Six Sigma is about measuring process
variation)
Meeting customer needs (process outputs must meet customer
requirements)
Data driven (rigorous analytical methods drive improvements that deliver
measurable differences felt by the customer)
Although these terms are quite common, they are not universal. They indicate peer
recognition, not registration or licensure.
Statistics.
A process performing at two sigma performs correctly (meets customer
requirements) 95% of the time.
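The two-sigma figure can be checked with a one-liner; assuming a centered normal distribution, Python's statistics.NormalDist (Python 3.8+) gives roughly 95%:

from statistics import NormalDist

# Probability that an observation falls within two standard deviations.
within_2_sigma = NormalDist().cdf(2) - NormalDist().cdf(-2)
print(f"{within_2_sigma:.2%}")  # about 95.45%, commonly rounded to 95%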
The concept of Six Sigma is based on the theory of variation, meaning that all
things that are measured finely enough will vary. Variation in a process is driven by
six factors: machines, materials, methods, measurement systems, environment, and
people. When there is no undue influence by any one of these six factors, the variation
produced is called "common cause" or normal variation. When one or more of
the components have an undue influence, "special cause" or abnormal
variation exists, which takes the form of multiple or binomial distributions in
statistics language. This distinction is critical in order to select the best course for
management intervention, because only abnormal variation can be corrected or
reduced.
There are two methods for calculating sigma, the Discrete method and the
Continuous method. The Discrete method assumes that the customer gives credit
for the service or product provided if only some of the customer requirements are met,
so it may be misleading. The Continuous method is more appropriate for more
demanding customers. It tends to be more accurate in that it provides a picture of
the magnitude of variation and the type of variation (common or special cause),
and it requires data collection.
Once the average and standard deviation (sigma) of a process become known,
more specific measures of process performance or capability are typically applied.
These include the capability ratio, the capability index, and the capability index
compared to some constant. The capability ratio compares process performance against
the customer specification. The capability index is the inverse of the capability ratio.
These calculations have a limitation in that they are based on the assumption that the
process is centered at the mean, when in reality processes drift from their intended
centers over time. A more precise measure is therefore the capability index compared
to a constant, k. There are two formulas that can be used: one is used when the center
of the distribution is closer to the upper customer specification; the other is used when
the center is closer to the lower specification. When applying these formulas,
consideration must be given to short-term vs. long-term process performance. In other
words, a given data sample should be considered short-term due to the variability of
performance over time. In general, the larger the sample size and/or number of
samples taken, the more accurate the result.
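As a hedged illustration of these measures, the Python sketch below computes the capability ratio, the capability index (its inverse), and the capability index compared to a constant k (commonly written Cpk) for a hypothetical data sample and specification limits; the formulas are the standard ones rather than anything given in this text:

import statistics

def process_capability(data, lsl, usl):
    mean = statistics.mean(data)
    sigma = statistics.stdev(data)           # short-term sample estimate
    ratio = (6 * sigma) / (usl - lsl)        # capability ratio: spread vs. spec width
    index = 1.0 / ratio                      # capability index: inverse of the ratio
    # Capability index compared to a constant k (Cpk): uses the nearer spec
    # limit, so a process that has drifted off center is penalized.
    cpk = min(usl - mean, mean - lsl) / (3 * sigma)
    return ratio, index, cpk

# Hypothetical cycle-time sample (hours) with customer limits 2.0..8.0.
sample = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0]
ratio, index, cpk = process_capability(sample, lsl=2.0, usl=8.0)
print(f"capability ratio = {ratio:.3f}, capability index = {index:.2f}, Cpk = {cpk:.2f}")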
Once an organization has decided to implement the Six Sigma methodology, there
are some initial steps that need to be completed:
Develop process maps for core processes, key sub-processes, and enabling
processes, and assign a process owner for each.
Develop a measurement dashboard or scorecard for each process (all
measures for a given process).
Develop a data collection plan (measure options, data sources, collection
forms, etc.) for each dashboard process and collect sufficient data.
Create project selection criteria and weight factors for choosing projects,
which should include impact on business objectives, current process
performance, current process cost or financial impact, feasibility (difficulty,
use of resources, time commitment), etc.
Rate processes and select potential Six Sigma project(s) based on overall
score.
It should be noted here that Six Sigma is not a business strategy. In fact, Six Sigma
would assume that strategic business objectives have already been developed. Processes
that are selected for Six Sigma projects are those that most closely relate to strategic
objectives. Once the initial program setup steps have been completed and an individual
project has been selected, the typical Six Sigma project would include the following
steps:
Establish baseline process performance and current sigma.
Determine process defects and conduct root cause analysis
Develop alternatives and select solution.
There are two Six Sigma methodologies that are alternately used depending upon
the type of project. For developing new processes at Six Sigma performance
levels, the methodology is DMADV (define, measure, analyze, design, verify). For
the far more common existing processes, the methodology is DMAIC (define,
measure, analyze, improve, control). An illustration of the DMAIC project process
(Copyright 2000 by Thomas Pyzdek) follows:
(Figure: The DMAIC cycle - Define, Measure, Analyze, Improve, Control.)
As you can see, the organizations listed above are manufacturing companies, but
more recently Six Sigma has spread into financial services. Good examples of this
include GE Capital Services, American Express, J.P. Morgan, Fannie Mae, Liberty
Insurance, Mount Carmel Health, and State Street Bank.
The concept of a separate manager and staff of specialists devoting full time to
quality control has a minority acceptance.
It appears that, in spite of these concerns, there is significant potential for Six
Sigma to continue to expand into the financial services industry as reengineering
did in the 90s. A methodology that has a demonstrated track record of delivering
process improvement, increasing customer satisfaction, and delivering bottom-line
results will be hard to resist.
#2 Identify the customer(s) for your product or service and determine what they
consider important.
In other words: WHO USES YOUR PRODUCT AND SERVICES?
#3 Identify your needs (to provide the product/service so that it satisfies the
customer).
In other words: WHAT DO YOU NEED TO DO YOUR WORK?
In other words: HOW PERFECTLY ARE YOU DOING YOUR CUSTOMER-
FOCUSED WORK?
While black-box and white-box are terms that are still in popular use, many people prefer the
terms "behavioral" and "structural". Behavioral test design is slightly different from black-box test
design because the use of internal knowledge isn't strictly forbidden, but it's still discouraged. In
practice, it hasn't proven useful to use a single test design method. One has to use a mixture of
different methods so that they aren't hindered by the limitations of a particular one. Some call this
"gray-box" or "translucent-box" test design, but others wish we'd stop talking about boxes
altogether.
It is important to understand that these methods are used during the test design phase, and their
influence is hard to see in the tests once they're implemented. Note that any level of testing (unit
testing, system testing, etc.) can use any test design methods. Unit testing is usually associated
with structural test design, but this is because testers usually don't have well-defined
requirements at the unit level to validate.
Note that the definitions of unit, component, integration, and integration testing are recursive:
Unit. The smallest compilable component. A unit typically is the work of one programmer (at
least in principle). As defined, it does not include any called sub-components (for procedural
languages) or communicating components in general.
Unit Testing: in unit testing, called components (or communicating components) are replaced
with stubs, simulators, or trusted components. Calling components are replaced with drivers or
trusted super-components. The unit is tested in isolation.
Note: The reason for "one or more" as contrasted to "two or more" is to allow for components
that call themselves recursively.
Component testing: same as unit testing except that all stubs and simulators are replaced
with the real thing.
Two components (units or larger components) are said to be integrated when:
a. They have been compiled, linked, and loaded together.
b. They have successfully passed the integration tests at the interface between them.
Thus, components A and B are integrated to create a new, larger component (A, B). Note that
this does not conflict with the idea of incremental integration - it just means that A is a big
component and B, the component added, is a small one.
Note: Sensitize is a technical term. It means inputs that will cause a routine to go down a
specified path. The inputs are to A. Not every input to A will cause A to traverse a path in which
B is called. Tbsa is the set of tests which do cause A to follow a path in which B is called. The
outcome of the test of B may or may not be affected.
There have been variations on these definitions, but the key point is that it is pretty darn formal
and there's a goodly hunk of testing theory, especially as concerns integration testing, OO testing,
and regression testing, based on them.
As to the difference between integration testing and system testing: system testing specifically
goes after behaviors and bugs that are properties of the entire system as distinct from properties
attributable to components (unless, of course, the component in question is the entire system).
Examples of system testing issues:
Resource loss bugs, throughput bugs, performance, security, recovery,
Transaction synchronization bugs (often misnamed "timing bugs").
Load testing is subjecting a system to a statistically representative (usually)
load. The two main reasons for using such loads are in support of software
reliability testing and in performance testing. The term "load testing" by itself is
too vague and imprecise to warrant use; for example, do you mean
"representative load," "overload," or "high load"? In performance testing, load is
varied from a minimum (zero) to the maximum level the system can sustain
without running out of resources or having transactions suffer (application-
specific) excessive delay.
A third use of the term is as a test whose objective is to determine the maximum
sustainable load the system can handle. In this usage, "load testing" is merely
testing at the highest transaction arrival rate in performance testing.
QA is more a preventive activity, ensuring quality in the company and therefore the
product, rather than just testing the product for software bugs.
It also can depend on the development model. The more specs, the less
testers. The roles can play a big part also. Does QA own beta? Do you include
process auditors or planning activities?
These figures can all vary very widely depending on how you define "tester" and
"developer". In some organizations, a "tester" is anyone who happens to be
testing software at the time -- such as their own. In other organizations, a
"tester" is only a member of an independent test group.
It is better to ask about the test labor content than it is to ask about the
tester/developer ratio. The test labor content, across most applications, is
generally accepted as 50%, when people do honest accounting. For life-critical
software, this can go up to 80%.
7. What is Software Testing?
Testing involves operation of a system or application under controlled conditions
and evaluating the results (e.g., 'if the user is in interface A of the application while
using hardware B, and does C, then D should happen'). The controlled conditions
should include both normal and abnormal conditions. Testing should intentionally
attempt to make things go wrong to determine if things happen when they
shouldn't or things don't happen when they should. It is oriented to 'detection'.
Organizations vary considerably in how they assign responsibility for QA and
testing. Sometimes they're the combined responsibility of one group or individual.
Also common are project teams that include a mix of testers and developers who
work closely together, with overall QA processes monitored by project managers.
It will depend on what best fits an organization's size and business structure.
8. What are some recent major computer system failures caused by
Software bugs?
In March of 2002 it was reported that software bugs in Britain's national
tax system resulted in more than 100,000 erroneous tax overcharges.
The problem was partly attributed to the difficulty of testing the
integration of multiple systems.
A newspaper columnist reported in July 2001 that a serious flaw was
found in off-the-shelf software that had long been used in systems for
tracking certain U.S. nuclear materials. The same software had been
recently donated to another country to be used in tracking their own
nuclear materials, and it was not until scientists in that country
discovered the problem, and shared the information, that U.S. officials
became aware of the problems.
According to newspaper stories in mid-2001, a major systems
development contractor was fired and sued over problems with a large
retirement plan management system. According to the reports, the
client claimed that system deliveries were late, the software had
excessive defects, and it caused other systems to crash.
In January of 2001 newspapers reported that a major European
railroad was hit by the aftereffects of the Y2K bug. The company found
that many of their newer trains would not run due to their inability to
recognize the date '31/12/2000'; the trains were started by altering the
control system's date settings.
News reports in September of 2000 told of a software vendor settling a
lawsuit with a large mortgage lender; the vendor had reportedly
delivered an online mortgage processing system that did not meet
specifications, was delivered late, and didn't work.
In early 2000, major problems were reported with a new computer
system in a large suburban U.S. public school district with 100,000+
students; problems included 10,000 erroneous report cards and
students left stranded by failed class registration systems; the district's
CIO was fired. The school district decided to reinstate its original 25-
year-old system for at least a year until the bugs were worked out of
the new system by the software vendors.
In October of 1999 the $125 million NASA Mars Climate Orbiter
spacecraft was believed to be lost in space due to a simple data
conversion error. It was determined that spacecraft software used
certain data in English units that should have been in metric units.
Among other tasks, the orbiter was to serve as a communications relay
for the Mars Polar Lander mission, which failed for unknown reasons in
December 1999. Several investigating panels were convened to
determine the process failures that allowed the error to go undetected.
Bugs in software supporting a large commercial high-speed data
network affected 70,000 business customers over a period of 8 days in
August of 1999. Among those affected was the electronic trading
system of the largest U.S. futures exchange, which was shut down for
most of a week as a result of the outages.
In April of 1999 a software bug caused the failure of a $1.2 billion
military satellite launch, the costliest unmanned accident in the history
of Cape Canaveral launches. The failure was the latest in a string of
launch failures, triggering a complete military and industry review of
U.S. space launch programs, including software integration and testing
processes. Congressional oversight hearings were requested.
A small town in Illinois received an unusually large monthly electric bill
of $7 million in March of 1999. This was about 700 times larger than its
normal bill. It turned out to be due to bugs in new software that had
been purchased by the local power company to deal with Y2K software
issues.
In early 1999 a major computer game company recalled all copies of a
popular new product due to software problems. The company made a
public apology for releasing a product before it was ready.
The computer system of a major online U.S. stock trading service
failed during trading hours several times over a period of days in
February of 1999 according to nationwide news reports. The problem
was reportedly due to bugs in a software upgrade intended to speed
online trade confirmations.
In April of 1998 a major U.S. data communications network failed for
24 hours, crippling a large part of some U.S. credit card transaction
authorization systems as well as other large U.S. bank, retail, and
government data systems. The cause was eventually traced to a
software bug.
January 1998 news reports told of software problems at a major U.S.
telecommunications company that resulted in no charges for long
distance calls for a month for 400,000 customers. The problem went
undetected until customers called up with questions about their bills.
In November of 1997 the stock of a major health industry company
dropped 60% due to reports of failures in computer billing systems,
problems with a large database conversion, and inadequate software
testing. It was reported that more than $100,000,000 in receivables
had to be written off and that multi-million dollar fines were levied on
the company by government agencies.
A retail store chain filed suit in August of 1997 against a transaction
processing system vendor (not a credit card company) due to the
software's inability to handle credit cards with year 2000 expiration
dates.
In August of 1997 one of the leading consumer credit reporting
companies reportedly shut down their new public web site after less
than two days of operation due to software problems. The new site
allowed web site visitors instant access, for a small fee, to their
personal credit reports. However, a number of initial users ended up
viewing each others' reports instead of their own, resulting in irate
customers and nationwide publicity. The problem was attributed to
"...unexpectedly high demand from consumers and faulty software that
routed the files to the wrong computers."
In November of 1996, newspapers reported that software bugs caused
the 411 telephone information system of one of the U.S. RBOC's to fail
for most of a day. Most of the 2000 operators had to search through
phone books instead of using their 13,000,000-listing database. The
bugs were introduced by new software modifications and the problem
software had been installed on both the production and backup
systems. A spokesman for the software vendor reportedly stated that 'It
had nothing to do with the integrity of the software. It was human error.'
On June 4 1996 the first flight of the European Space Agency's new
Ariane 5 rocket failed shortly after launching, resulting in an estimated
uninsured loss of a half billion dollars. It was reportedly due to the lack
of exception handling of a floating-point error in a conversion from a
64-bit integer to a 16-bit signed integer.
Software bugs caused the bank accounts of 823 customers of a major
U.S. bank to be credited with $924,844,208.32 each in May of 1996,
according to newspaper reports. The American Bankers Association
claimed it was the largest such error in banking history. A bank
spokesman said the programming errors were corrected and all funds
were recovered.
Software bugs in a Soviet early-warning monitoring system nearly
brought on nuclear war in 1983, according to news reports in early
1999. The software was supposed to filter out false missile detections
caused by Soviet satellites picking up sunlight reflections off cloud-
tops, but failed to do so. Disaster was averted when a Soviet
commander, based on what he said was a '...funny feeling in my gut',
decided the apparent missile attack was a false alarm. The filtering
software code was rewritten.
9. Why is it often hard for management to get serious about quality
assurance?
continuous extensive testing to keep the inevitable bugs from running
out of control.
time pressures - scheduling of software projects is difficult at best,
often requiring a lot of guesswork. When deadlines loom and the
crunch comes, mistakes will be made.
egos - people prefer to say things like:
'no problem'
'piece of cake'
'I can whip that out in a few hours'
'it should be easy to update that old code'
instead of:
'that adds a lot of complexity and we could end up
making a lot of mistakes'
'we have no idea if we can do that; we'll wing it'
'I can't estimate how long it will take, until I
take a close look at it'
'we can't figure out what that old spaghetti code
did in the first place'
If there are too many unrealistic 'no problem's', the result is bugs.
poorly documented code - it's tough to maintain and modify code that is
badly written or poorly documented; the result is bugs. In many
organizations management provides no incentive for programmers to
document their code or write clear, understandable code. In fact, it's
usually the opposite: they get points mostly for quickly turning out code,
and there's job security if nobody else can understand it ('if it was hard to
write, it should be hard to read').
software development tools - visual tools, class libraries, compilers,
scripting tools, etc. often introduce their own bugs or are poorly
documented, resulting in added bugs.
11. How can new Software QA processes be introduced in an existing
organization?
A lot depends on the size of the organization and the risks involved. For
large organizations with high-risk (in terms of lives or property) projects,
serious management buy-in is required and a formalized QA process is
necessary.
Where the risk is lower, management and organizational buy-in and QA
implementation may be a slower, step-at-a-time process. QA processes
should be balanced with productivity so as to keep bureaucracy from
getting out of hand.
For small groups or projects, a more ad-hoc process may be appropriate,
depending on the type of customers and projects. A lot will depend on
team leads or managers, feedback to developers, and ensuring adequate
communications among customers, managers, developers, and testers.
In all cases the most value for effort will be in requirements management
processes, with a goal of clear, complete, testable requirement
specifications or expectations.
12. What is verification? validation?
certain actions or inputs, input of large numerical values, large complex
queries to a database system, etc.
performance testing - term often used interchangeably with 'stress' and
'load' testing. Ideally 'performance' testing (and any other 'type' of testing)
is defined in requirements documentation or QA or Test Plans.
usability testing - testing for 'user-friendliness'. Clearly this is subjective,
and will depend on the targeted end-user or customer. User interviews,
surveys, video recording of user sessions, and other techniques can be
used. Programmers and testers are usually not appropriate as usability
testers.
install/uninstall testing - testing of full, partial, or upgrade install/uninstall
processes.
recovery testing - testing how well a system recovers from crashes,
hardware failures, or other catastrophic problems.
security testing - testing how well the system protects against
unauthorized internal or external access, willful damage, etc; may require
sophisticated testing techniques.
compatibility testing - testing how well software performs in a particular
hardware/software/operating system/network/etc. environment.
exploratory testing - often taken to mean a creative, informal software test
that is not based on formal test plans or test cases; testers may be
learning the software as they test it.
ad-hoc testing - similar to exploratory testing, but often taken to mean that
the testers have significant understanding of the software before testing it.
user acceptance testing - determining if software is satisfactory to an end-
user or customer.
comparison testing - comparing software weaknesses and strengths to
competing products.
alpha testing - testing of an application when development is nearing
completion; minor design changes may still be made as a result of such
testing. Typically done by end-users or others, not by programmers or
testers.
beta testing - testing when development and testing are essentially
completed and final bugs and problems need to be found before final
release. Typically done by end-users or others, not by programmers or
testers.
mutation testing - a method for determining if a set of test data or test
cases is useful, by deliberately introducing various code changes ('bugs')
and retesting with the original test data/cases to determine if the 'bugs' are
detected. Proper implementation requires large computational resources.
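A toy illustration of the idea in Python (the function and the mutation are hypothetical examples): a deliberate 'bug' is introduced and the existing test data is re-run to see whether any case detects it.

def add(a, b):
    return a + b

def add_mutant(a, b):
    return a - b  # deliberately introduced 'bug' (operator mutated)

# Existing test data: the mutant is 'killed' only if some case fails.
test_data = [(2, 3, 5), (0, 0, 0), (-1, 1, 0)]

for fn in (add, add_mutant):
    killed = any(fn(a, b) != expected for a, b, expected in test_data)
    print(f"{fn.__name__}: {'detected by tests' if killed else 'survives tests'}")

If the test data contained only the case (0, 0, 0), the mutant would survive, revealing that the test set is too weak.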
16. What are 5 common problems in the software development process?
poor requirements - if requirements are unclear, incomplete, too
general, or not testable, there will be problems.
unrealistic schedule - if too much work is crammed in too little time,
problems are inevitable.
inadequate testing - no one will know whether or not the program is
any good until the customer complains or systems crash.
featuritis - requests to pile on new features after development is
underway; extremely common.
miscommunication - if developers don't know what's needed or
customers have erroneous expectations, problems are guaranteed.
17. What are 5 common solutions to software development problems?
solid requirements - clear, complete, detailed, cohesive, attainable,
testable requirements that are agreed to by all players. Use prototypes
to help nail down requirements.
realistic schedules - allow adequate time for planning, design, testing,
bug fixing, re-testing, changes, and documentation; personnel should
be able to complete the project without burning out.
adequate testing - start testing early on, re-test after fixes or changes,
plan for adequate time for testing and bug-fixing.
stick to initial requirements as much as possible - be prepared to
defend against changes and additions once development has begun,
and be prepared to explain consequences. If changes are necessary,
they should be adequately reflected in related schedule changes. If
possible, use rapid prototyping during the design phase so that
customers can see what to expect. This will provide them a higher
comfort level with their requirements decisions and minimize changes
later on.
communication - require walkthroughs and inspections when
appropriate; make extensive use of group communication tools - e-
mail, groupware, networked bug-tracking tools and change
management tools, intranet capabilities, etc.; ensure that
documentation is available and up-to-date - preferably electronic, not
paper; promote teamwork and cooperation; use prototypes early on so
that customers' expectations are clarified.
18. What is software 'quality'?
19. What is 'good code'?
'Good code' is code that works, is bug free, and is readable and maintainable.
Some organizations have coding 'standards' that all developers are supposed to
adhere to, but everyone has different ideas about what's best, or what is too
many or too few rules. There are also various theories and metrics, such as
McCabe Complexity metrics. It should be kept in mind that excessive use of
standards and rules can stifle productivity and creativity. 'Peer reviews', 'buddy
checks' code analysis tools, etc. can be used to check for problems and enforce
standards.
For C and C++ coding, here are some typical ideas to consider in setting
rules/standards; these may or may not apply to a particular situation:
minimize or eliminate use of global variables.
use descriptive function and method names - use both upper and lower
case, avoid abbreviations, use as many characters as necessary to be
adequately descriptive (use of more than 20 characters is not out of line);
be consistent in naming conventions.
use descriptive variable names - use both upper and lower case, avoid
abbreviations, use as many characters as necessary to be adequately
descriptive (use of more than 20 characters is not out of line); be
consistent in naming conventions.
function and method sizes should be minimized; less than 100 lines of
code is good, less than 50 lines is preferable.
function descriptions should be clearly spelled out in comments preceding
a function's code.
organize code for readability.
in adding comments, err on the side of too many rather than too few
comments; a common rule of thumb is that there should be at least as
many lines of comments (including header blocks) as lines of code.
no matter how small, an application should include documentation of the
overall program function and flow (even a few paragraphs is better than
nothing); or if possible a separate flow chart and detailed program
documentation.
make extensive use of error handling procedures and status and error
logging.
for C++, to minimize complexity and increase maintainability, avoid too
many levels of inheritance in class hierarchies (relative to the size and
complexity of the application). Minimize use of multiple inheritance, and
minimize use of operator overloading (note that the Java programming
language eliminates multiple inheritance and operator overloading.)
for C++, keep class methods small, less than 50 lines of code per method
is preferable.
for C++, make liberal use of exception handlers.
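To make these rules concrete, the short sketch below shows a small C++ function
written to follow several of them (descriptive names, a header comment, input
validation, and exception handling). The function name and its behaviour are
illustrative inventions, not values mandated by any standard:

#include <stdexcept>

// ComputeAverageTransactionAmount:
//   Returns the average of 'transactionAmounts' over 'transactionCount' items.
//   Throws std::invalid_argument when no transactions are supplied, rather
//   than silently returning a misleading value.
double ComputeAverageTransactionAmount(const double transactionAmounts[],
                                       int transactionCount)
{
    if (transactionAmounts == nullptr || transactionCount <= 0)
    {
        throw std::invalid_argument(
            "ComputeAverageTransactionAmount: no transactions supplied");
    }

    double totalAmount = 0.0;
    for (int index = 0; index < transactionCount; ++index)
    {
        totalAmount += transactionAmounts[index];
    }
    return totalAmount / transactionCount;
}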
20. What is 'good design'?
'Design' could refer to many things, but often refers to 'functional design' or
'internal design'. Good internal design is indicated by software code whose
overall structure is clear, understandable, easily modifiable, and maintainable; is
robust with sufficient error-handling and status logging capability; and works
correctly when implemented. Good functional design is indicated by an
application whose functionality can be traced back to customer and end-user
requirements. For programs that have a user interface, it's often a good idea to
assume that the end user will have little computer knowledge and may not read a
user manual or even the on-line help; some common rules-of-thumb include:
the program should act in a way that least surprises the user
it should always be evident to the user what can be done next and how
to exit
the program shouldn't let the users do something stupid without
warning them.
21. What is SEI? CMM? ISO? IEEE? ANSI? Will it help?
SEI = 'Software Engineering Institute' at Carnegie-Mellon University;
initiated by the U.S. Defense Department to help improve software
development processes.
CMM = 'Capability Maturity Model', developed by the SEI. It's a model
of 5 levels of organizational 'maturity' that determine effectiveness in
delivering quality software. It is geared to large organizations such as
large U.S. Defense Department contractors. However, many of the QA
processes involved are appropriate to any organization, and if
reasonably applied can be helpful. Organizations can receive CMM
ratings by undergoing assessments by qualified auditors.
Level 1 - characterized by chaos, periodic panics, and heroic efforts
required by individuals to successfully complete projects. Few if
any processes are in place; successes may not be repeatable.
Level 2 - 'Repeatable': basic project management, tracking, and
configuration management processes are in place; earlier
successes on similar projects can be repeated.
Level 3 - 'Defined': standard software development and maintenance
processes are documented and integrated throughout the
organization.
Level 4 - 'Managed': processes and products are quantitatively
measured and controlled using detailed metrics.
Level 5 - 'Optimizing': the focus is on continuous process improvement
based on quantitative feedback from the process.
ISO = 'International Organization for Standardization'; its ISO 9001
standard concerns quality systems and applies, among other areas, to
software development. To be certified, an organization is assessed by an
accredited third-party auditor, and periodic reassessment is
required. Note that ISO certification does not necessarily indicate quality
products - it indicates only that documented processes are followed.
IEEE = 'Institute of Electrical and Electronics Engineers' - among other
things, creates standards such as 'IEEE Standard for Software Test
Documentation' (IEEE/ANSI Standard 829), 'IEEE Standard of Software
Unit Testing (IEEE/ANSI Standard 1008), 'IEEE Standard for Software
Quality Assurance Plans' (IEEE/ANSI Standard 730), and others.
ANSI = 'American National Standards Institute', the primary industrial
standards body in the U.S.; publishes some software-related standards in
conjunction with the IEEE and ASQ (American Society for Quality).
Other software development process assessment methods besides CMM
and ISO 9000 include SPICE, Trillium, TickIT, and Bootstrap.
22. What is the 'software life cycle'?
The life cycle begins when an application is first conceived and ends when it is
no longer in use. It includes aspects such as initial concept, requirements
analysis, functional design, internal design, documentation planning, test
planning, coding, document preparation, integration, testing, maintenance,
updates, retesting, phase-out, and other aspects.
web test tools - to check that links are valid, HTML code usage is
correct, client-side and server-side programs work, and a web site's
interactions are secure.
A good test engineer has a 'test to break' attitude, an ability to take the point of
view of the customer, a strong desire for quality, and an attention to detail. Tact
and diplomacy are useful in maintaining a cooperative relationship with
developers, and an ability to communicate with both technical (developers) and
non-technical (customers, management) people is useful. Previous software
development experience can be helpful as it provides a deeper understanding of
the software development process, gives the tester an appreciation for the
developers' point of view, and reduces the learning curve in automated test tool
programming. Judgment skills are needed to assess high-risk areas of an
application on which to focus testing efforts when time is limited.
The same qualities a good tester has are useful for a QA engineer. Additionally,
they must be able to understand the entire software development process and
how it can fit into the business approach and goals of the organization.
Communication skills and the ability to understand various sides of issues are
important. In organizations in the early stages of implementing QA processes,
patience and diplomacy are especially needed. An ability to find problems as well
as to see 'what's missing' is important for inspections and reviews.
A good test manager should also:
be able to maintain enthusiasm of their team and promote a positive
atmosphere, despite what is a somewhat 'negative' process (e.g.,
looking for or preventing problems)
be able to promote teamwork to increase productivity
Organizations vary considerably in their handling of requirements specifications.
Ideally, the requirements are spelled out in a document with statements such as
'The product shall.....'. 'Design' specifications should not be confused with
'requirements'; design specifications should be traceable back to the
requirements.
In some organizations requirements may end up in high level project plans,
functional specification documents, in design documents, or in other documents
at various levels of detail. No matter what they are called, some type of
documentation with detailed requirements will be needed by testers in order to
properly plan and execute tests. Without such documentation, there will be no
clear-cut way to determine if a software application is performing correctly.
29. What steps are needed to develop and run software tests?
The following are some of the steps to consider:
Obtain requirements, functional design, and internal design specifications
and other necessary documents
Obtain budget and schedule requirements
Prepare test environment and testware, obtain needed user
manuals/reference documents/configuration guides/installation guides, set
up test tracking processes, set up logging and archiving processes, set up
or obtain test input data
Obtain and install software releases
Perform tests
Retest as needed
Maintain and update test plans, test cases, test environment, and testware
through life cycle
30. What's a 'test plan'?
A software project test plan is a document that describes the objectives, scope,
approach, and focus of a software testing effort. The process of preparing a test
plan is a useful way to think through the efforts needed to validate the
acceptability of a software product. The completed document will help people
outside the test group understand the 'why' and 'how' of product validation. It
should be thorough enough to be useful but not so thorough that no one outside
the test group will read it. The following are some of the items that might be
included in a test plan, depending on the particular project:
Title
Table of Contents
Traceability requirements
Overall software project organization and personnel/contact-
info/responsibilities
Test organization and personnel/contact-info/responsibilities
Software CM processes
Software entrance and exit criteria
Personnel allocation
Test site/location
Open issues
The function, module, feature, object, screen, etc. where the bug
occurred
Environment specifics, system, platform, relevant hardware specifics
Tester name
Test date
Description of fix
Date of fix
Retest date
Retest results
The best bet in this situation is for the testers to go through the process of
reporting whatever bugs or blocking-type problems initially show up, with the
focus being on critical bugs. Since this type of problem can severely affect
schedules, and indicates deeper problems in the software development process
(such as insufficient unit testing or insufficient integration testing, poor design,
improper build or release procedures, etc.) managers should be notified, and
provided with some documentation as evidence of the problem.
The project's initial schedule should allow for some extra time
commensurate with the possibility of changes.
Try to move new requirements to a 'Phase 2' version of an application,
while using the original requirements for the 'Phase 1' version.
Negotiate to allow only easily-implemented new requirements into the
project, while moving more difficult new requirements into future versions
of the application.
Be sure that customers and management understand the scheduling
impacts, inherent risks, and costs of significant requirements changes.
Then let management or the customers (not the developers or testers)
decide if the changes are warranted - after all, that's their job.
Balance the effort put into setting up automated testing with the expected
effort required to re-do them to deal with changes.
Try to design some flexibility into automated test scripts (a short data-
driven sketch follows this list).
Focus initial automated testing on application aspects that are most likely
to remain unchanged.
Devote appropriate effort to risk analysis of changes to minimize
regression testing needs.
Design some flexibility into test cases (this is not easily done; the best bet
might be to minimize the detail in the test cases, or set up only higher-
level generic-type test plans)
Focus less on detailed test plans and test cases and more on ad hoc
testing (with an understanding of the added risk that this entails).
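One common way to build such flexibility into automated test scripts is to
separate the test data from the test logic, so a requirements change means
editing a table of cases rather than rewriting code. A minimal C++ sketch of
the idea; the function under test and its cases are hypothetical:

#include <cassert>
#include <cmath>

// Hypothetical function under test.
double ApplyDiscount(double price, double discountPercent)
{
    return price * (1.0 - discountPercent / 100.0);
}

// Data-driven test: when requirements change, only this table changes.
struct DiscountCase { double price; double percent; double expected; };

int main()
{
    const DiscountCase cases[] = {
        {100.0,  0.0, 100.0},
        {100.0, 10.0,  90.0},
        {  0.0, 50.0,   0.0},
    };
    for (const DiscountCase& c : cases)
    {
        assert(std::fabs(ApplyDiscount(c.price, c.percent) - c.expected) < 1e-9);
    }
    return 0;
}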
38. What if the project isn't big enough to justify extensive testing?
Consider the impact of project errors, not the size of the project. However, if
extensive testing is still not justified, risk analysis is again needed and the same
considerations as described previously in 'What if there isn't enough time for
thorough testing?' apply. The tester might then do ad hoc testing, or write up a
limited test plan based on the risk analysis.
There may be significant added risks as a result of the unexpected
functionality. If the functionality only affects areas such as minor
improvements in the user interface, for example, it may not be a significant
risk.
Web sites are essentially client/server applications - with web servers and
'browser' clients. Consideration should be given to the interactions between html
pages, TCP/IP communications, Internet connections, firewalls, applications that
run in web pages (such as applets, javascript, plug-in applications), and
applications that run on the server side (such as cgi scripts, database interfaces,
logging applications, dynamic page generators, asp, etc.). Additionally, there are
a wide variety of servers and browsers, various versions of each, small but
sometimes significant differences between them, variations in connection
speeds, rapidly changing technologies, and multiple standards and protocols.
The end result is that testing for web sites can become a major ongoing effort.
Other considerations might include:
What are the expected loads on the server (e.g., number of hits per unit
time?), and what kind of performance is required under such loads (such
as web server response time, database query response times). What
kinds of tools will be needed for performance testing (such as web load
testing tools, other tools already in house that can be adapted, web robot
downloading tools, etc.)?
Who is the target audience? What kind of browsers will they be using?
What kind of connection speeds will they be using? Are they intra-
organization (thus with likely high connection speeds and similar
browsers) or Internet-wide (thus with a wide variety of connection speeds
and browser types)?
What kind of performance is expected on the client side (e.g., how fast
should pages appear, how fast should animations, applets, etc. load and
run)?
Will down time for server and content maintenance/upgrades be allowed?
How much?
What kinds of security (firewalls, encryption, passwords, etc.) will be
required and what is it expected to do? How can it be tested?
How reliable are the site's Internet connections required to be? And how
does that affect backup system or redundant connection requirements and
testing?
What processes will be required to manage updates to the web site's
content, and what are the requirements for maintaining, tracking, and
controlling page content, graphics, links, etc.?
Which HTML specification will be adhered to? How strictly? What
variations will be allowed for targeted browsers?
Will there be any standards or requirements for page appearance and/or
graphics throughout a site or parts of a site?
How will internal and external links be validated and updated? How often?
(A sketch of automated link checking follows the page guidelines below.)
How extensive or customized are the server logging and reporting
requirements; are they considered an integral part of the system and do
they require testing?
How are cgi programs, applets, javascripts, ActiveX components, etc. to
be maintained, tracked, controlled, and tested?
Pages should be 3-5 screens max unless content is tightly focused on a
single topic. If larger, provide internal links within the page.
The page layouts and design elements should be consistent throughout a
site, so that it's clear to the user that they're still within a site.
Pages should be as browser-independent as possible, or pages should be
provided or generated based on the browser-type.
All pages should have links external to the page; there should be no dead-
end pages.
The page owner, revision date, and a link to a contact person or
organization should be included on each page.
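Link validation, raised above, is one of the easier web-testing tasks to
automate. The sketch below checks a list of URLs using libcurl (assumed to be
installed; compile with -lcurl); the URLs shown are placeholders, and a real
checker would also crawl pages and extract links on a schedule:

#include <curl/curl.h>
#include <cstdio>

// Returns true when the URL answers with an HTTP status below 400.
// Uses a HEAD request so page bodies are not downloaded.
bool LinkIsValid(const char* url)
{
    CURL* handle = curl_easy_init();
    if (handle == nullptr) return false;

    curl_easy_setopt(handle, CURLOPT_URL, url);
    curl_easy_setopt(handle, CURLOPT_NOBODY, 1L);          // HEAD request only
    curl_easy_setopt(handle, CURLOPT_FOLLOWLOCATION, 1L);  // follow redirects

    long status = 0;
    bool reachable = (curl_easy_perform(handle) == CURLE_OK);
    if (reachable)
        curl_easy_getinfo(handle, CURLINFO_RESPONSE_CODE, &status);
    curl_easy_cleanup(handle);
    return reachable && status < 400;
}

int main()
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    // Placeholder URLs; a real checker would extract these from the site's pages.
    const char* links[] = { "http://www.example.com/", "http://www.example.com/missing" };
    for (const char* link : links)
        std::printf("%-40s %s\n", link, LinkIsValid(link) ? "OK" : "BROKEN");
    curl_global_cleanup();
    return 0;
}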
44. How is testing affected by object-oriented designs?
Well-engineered object-oriented design can make it easier to trace from code to
internal design to functional design to requirements. While there will be little
effect on black box testing (where an understanding of the internal design of the
application is unnecessary), white-box testing can be oriented to the application's
objects. If the application was well-designed this can simplify test design.
45. What is Extreme Programming and what's it got to do with testing?
Extreme Programming (XP) is a software development approach for small teams
on risk-prone projects with unstable requirements. It was created by Kent Beck
who described the approach in his book 'Extreme Programming Explained'.
Testing ('extreme testing') is a core aspect of Extreme Programming.
Programmers are expected to write unit and functional test code first - before the
application is developed. Test code is under source control along with the rest of
the code. Customers are expected to be an integral part of the project team and
to help develop scenarios for acceptance/black box testing. Acceptance tests
are preferably automated, and are modified and rerun for each of the frequent
development iterations. QA and test personnel are also required to be an integral
part of the project team. Detailed requirements documentation is not used, and
frequent re-scheduling, re-estimating, and re-prioritizing is expected.
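The 'test first' discipline can be illustrated with a deliberately tiny C++
sketch: the unit test below is written before the function it tests, and the
simplest implementation that makes the test pass is written afterwards. The
function name and behaviour are hypothetical:

#include <cassert>

// Written first: the test. At this point AccountBalanceAfterDeposit does not
// exist yet, so the build fails until it is implemented.
double AccountBalanceAfterDeposit(double balance, double deposit);

void TestDepositIncreasesBalance()
{
    assert(AccountBalanceAfterDeposit(100.0, 25.0) == 125.0);
}

// Written second: the simplest implementation that makes the test pass.
double AccountBalanceAfterDeposit(double balance, double deposit)
{
    return balance + deposit;
}

int main()
{
    TestDepositIncreasesBalance();
    return 0;
}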
46. Common Software Errors
Introduction
This document takes you through a whirlwind tour of common software errors.
It is an excellent aid for software testing: it helps you identify errors
systematically, increases the efficiency of software testing, and improves
testing productivity. For more information, please refer to Testing Computer
Software (Wiley).
Type of Errors
Error Handling
Calculation errors
Race Conditions
Load Conditions
Hardware
Testing Errors
Functionality
Sl No Possible Error Conditions
1 Excessive Functionality
2 Inflated impression of functionality
3 Inadequacy for the task at hand
4 Missing function
5 Wrong function
6 Functionality must be created by user
7 Doesn't do what the user expects
Communication
Missing Information
Sl No Possible Error Conditions
1 No on-screen instructions
2 Assuming printed documentation is already available.
3 Undocumented features
4 States that appear impossible to exit
5 No cursor
6 Failure to acknowledge input
7 Failure to show activity during long delays
8 Failure to advise when a change will take effect
9 Failure to check for the same document being opened twice
Wrong, misleading, confusing information
10 Simple factual errors
11 Spelling errors
12 Inaccurate simplifications
13 Invalid metaphors
14 Confusing feature names
15 More than one name for the same feature
16 Information overload
17 When are data saved
18 Wrong function
19 Functionality must be created by user
20 Poor external modularity
Help text and error messages
21 Inappropriate reading levels
22 Verbosity
23 Inappropriate emotional tone
24 Factual errors
25 Context errors
26 Failure to identify the source of error
27 Forbidding a resource without saying why
28 Reporting non-errors
29 Failure to highlight the part of the screen
30 Failure to clear highlighting
31 Wrong/partial string displayed
32 Message displayed for too long or not long enough
Display Layout
33 Poor aesthetics in screen layout
34 Menu Layout errors
35 Dialog box layout errors
36 Obscured Instructions
37 Misuse of flash
38 Misuse of color
39 Heavy reliance on color
40 Inconsistent with the style of the environment
41 Cannot get rid of on screen information
Output
42 Can't output certain data
43 Can't redirect output
44 Format incompatible with a follow-up process
45 Must output too little or too much
46 Can't control output layout
47 Absurd printout level of precision
48 Can't control labeling of tables or figures
49 Can't control scaling of graphs
Performance
50 Program Speed
51 User Throughput
52 Can't redirect output
53 Perceived performance
54 Slow program
55 Slow echoing
56 How to reduce user throughput
57 Poor responsiveness
58 No type ahead
59 No warning that the operation takes long time
60 No progress reports
61 Problems with time-outs
62 Program pesters you
Program Rigidity
User tailorability
Sl No Possible Error Conditions
1 Can't turn off case sensitivity
2 Can't tailor to hardware at hand
3 Can't change device initialization
4 Can't turn off automatic changes
5 Can't slow down/speed up scrolling
6 Can't do what you did last time
7 Failure to execute customization commands
8 Failure to save customization commands
9 Side effects of feature changes
10 Can't turn off the noise
11 Infinite tailorability
Who is in control?
12 Unnecessary imposition of a conceptual style
13 Novice friendly, experienced hostile
14 Surplus or redundant information required
15 Unnecessary repetition of steps
16 Unnecessary limits
6 Inconsistent command options
7 Similarly named commands
8 Inconsistent Capitalization
9 Inconsistent menu position
10 Inconsistent function key usage
11 Inconsistent error handling rules
12 Inconsistent editing rules
13 Inconsistent data saving rules
Time Wasters
14 Garden paths
15 Choice can't be taken
16 Are you really, really sure
17 Obscurely or idiosyncratically named commands
Menus
18 Excessively complex menu hierarchy
19 Inadequate menu navigation options
20 Too many paths to the same place
21 You can't get there from here
22 Related commands relegated to unrelated menus
23 Unrelated commands tossed under the same menu
Command Lines
24 Forced distinction between uppercase and lowercase
25 Reversed parameters
26 Full command names are not allowed
27 Abbreviations are not allowed
28 Demands complex input on one line
29 No batch input
30 Can't edit commands
Inappropriate use of keyboard
31 Failure to use cursor, edit, or function keys
32 Non-standard use of cursor and edit keys
33 Non-standard use of function keys
34 Failure to filter invalid keys
35 Failure to indicate keyboard state changes
Missing Commands
State transitions
Sl No Possible Error Conditions
1 Can't do nothing and leave
2 Can't quit mid-program
3 Can't stop mid-command
4 Can't pause
Disaster prevention
5 No backup facility
6 No undo
7 No are you sure
8 No incremental saves
Error handling by the user
14 No user specifiable filters
15 Awkward error correction
16 Can't include comments
17 Can't display relationships between variables
Miscellaneous
18 Inadequate privacy or security
19 Obsession with security
20 Can't hide menus
21 Doesn't support standard OS features
22 Doesn't allow long names
Error Handling
Error prevention
Sl No Possible Error Conditions
1 Inadequate initial state validation
2 Inadequate tests of user input
3 Inadequate protection against corrupted data
4 Inadequate tests of passed parameters
5 Inadequate protection against operating system bugs
6 Inadequate protection against malicious use
7 Inadequate version control
Error Detection
Sl No Possible Error Conditions
1 Ignores overflow
2 Ignores impossible values
3 Ignores implausible values
4 Ignores error flag
5 Ignores hardware fault or error conditions
6 Data comparison
Error Recovery
Sl No Possible Error Conditions
1 Automatic error detection
2 Failure to report an error
3 Failure to set an error flag
4 Where does the program go back to?
5 Aborting errors
6 Recovery from hardware problems
7 No escape from missing disks
Calculation Errors
Race Conditions
Program Stops
Sl No Possible Error Conditions
1 Dead crash
2 Syntax error reported at run time
3 Waiting for impossible condition or combinations of conditions
4 Wrong user or process priority
Loops
Sl No Possible Error Conditions
1 Infinite loop
2 Wrong starting value for the loop control variables
3 Accidental change of loop control variables
4 Commands that do or don't belong inside the loop
5 Improper loop nesting
Multiple Cases
Sl No Possible Error Conditions
1 Missing default
2 Wrong default
3 Missing cases
4 Overlapping cases
5 Invalid or impossible cases
6 Commands that do or don't belong inside the THEN or ELSE clause
7 Case should be sub-divided
Data boundaries
Sl No Possible Error Conditions
1 Un-terminated null strings
2 Early end of string
3 Read/Write past end of data structure or an element in it
Messaging Problems
Sl No Possible Error Conditions
1 Messages sent to wrong process or port
2 Failure to validate an incoming message
3 Lost or out of synch messages
4 Message sent to only N of N+1 processes
Load Conditions
5 Lost Messages
6 Performance costs
7 Race condition windows expand
8 Doesn't abbreviate under load
9 Doesn't recognize that another process abbreviates output under load
10 Low priority tasks not put off
11 Low priority tasks never done
Executive Summary
Producing a test specification, including the design of test cases, is the level of
test design which has the highest degree of creative input. Furthermore, unit test
specifications will usually be produced by a large number of staff with a wide
range of experience, not just a few experts.
This paper provides a general process for developing unit test specifications and
then describes some specific design techniques for designing unit test cases. It
serves as a tutorial for developers who are new to formal testing of software, and
as a reminder of some finer points for experienced software testers.
A. Introduction
The design of tests is subject to the same basic engineering principles as the
design of software. Good design consists of a number of stages which
progressively elaborate the design. Good test design consists of a number of
stages which progressively elaborate the design of tests:
Test strategy;
Test planning;
Test specification;
Test procedure.
These four stages of test design apply to all levels of testing, from unit testing
through to system testing. This paper concentrates on the specification of unit
tests; i.e. the design of individual unit test cases within unit test specifications. A
more detailed description of the four stages of test design can be found in the IPL
paper "An Introduction to Software Testing".
The design of tests has to be driven by the specification of the software. For unit
testing, tests are designed to verify that an individual unit implements all design
decisions made in the unit's design specification. A thorough unit test
specification should include positive testing, that the unit does what it is
supposed to do, and also negative testing, that the unit does not do anything that
it is not supposed to do.
B. Developing Unit Test Specifications
Once a unit has been designed, the next development step is to design the unit
tests. An important point here is that it is more rigorous to design the tests before
the code is written. If the code was written first, it would be too tempting to test
the software against what it is observed to do (which is not really testing at all),
rather than against what it is specified to do.
A unit test specification comprises a sequence of unit test cases. Each unit test
case should include four essential elements:
A statement of the initial state of the unit, the starting point of the test case
(this is only applicable where a unit maintains state between calls);
The inputs to the unit, including the value of any external data read by the
unit;
What the test case actually tests, in terms of the functionality of the unit
and the analysis used in the design of the test case (for example, which
decisions within the unit are tested);
The expected outcome of the test case (the expected outcome of a test
case should always be defined in the test specification, prior to test
execution).
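These four elements can be recorded quite literally. The C++ sketch below
captures them in a simple record, filled in for the negative-input square root
test case used later in this paper; the field names are illustrative, not part
of any standard:

#include <string>

// Sketch: one way to record the four essential elements of a unit test case.
struct UnitTestCase
{
    std::string initialState;     // starting state, if the unit keeps state between calls
    std::string inputs;           // input values, including external data read by the unit
    std::string whatIsTested;     // functionality/decisions exercised and the analysis used
    std::string expectedOutcome;  // defined in the specification BEFORE the test is executed
};

const UnitTestCase squareRootNegativeInput = {
    "no stored state",
    "input = -10",
    "rejection of negative input (invalid partition)",
    "return 0 and output error message via Print_Line",
};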
The following subsections of this paper provide a six step general process for
developing a unit test specification as a set of individual unit test cases. For each
step of the process, suitable test case design techniques are suggested. (Note
that these are only suggestions. Individual circumstances may be better served
by other test case design techniques). Section 3 of this paper then describes in
detail a selection of techniques which can be used within this process to help
design test cases.
The purpose of the first test case in any unit test specification should be to
execute the unit under test in the simplest way possible. When the tests are
actually executed, knowing that at least the first unit test will execute is a good
confidence boost. If it will not execute, then it is preferable to have something as
simple as possible as a starting point for debugging.
Suitable techniques:
Test cases should be designed to show that the unit under test does what it is
supposed to do. The test designer should walk through the relevant
specifications; each test case should test one or more statements of
specification. Where more than one specification is involved, it is best to make
the sequence of test cases correspond to the sequence of statements in the
primary specification for the unit.
Suitable techniques:
Existing test cases should be enhanced and further test cases should be
designed to show that the software does not do anything that it is not specified to
do. This step depends primarily upon error guessing, relying upon the experience
of the test designer to anticipate problem areas.
Suitable techniques:
- Error guessing
- Boundary value analysis
- Internal boundary value testing
- State-transition testing
Suitable techniques:
The test coverage likely to be achieved by the designed test cases should be
visualised. Further test cases can then be added to the unit test specification to
achieve specific test coverage objectives. Once coverage tests have been
designed, the test procedure can be developed and the tests executed.
Suitable techniques:
- Branch testing
- Condition testing
- Data definition-use testing
- State-transition testing
A test specification designed using the above five steps should in most cases
provide a thorough test for a unit. At this point the test specification can be used
to develop an actual test procedure, and the test procedure used to execute the
tests. For users of AdaTEST or Cantata, the test procedure will be an AdaTEST
or Cantata test script.
Execution of the test procedure will identify errors in the unit which can be
corrected and the unit re-tested. Dynamic analysis during execution of the test
procedure will yield a measure of test coverage, indicating whether coverage
objectives have been achieved. There is therefore a further coverage completion
step in the process of designing test specifications.
Suitable techniques:
- Branch testing
- Condition testing
- Data definition-use testing
- State-transition testing
Note that the first five steps in producing a test specification can be achieved:
Solely from design documentation;
Without looking at the actual code;
Prior to developing the actual test procedure.
It is usually a good idea to avoid long sequences of test cases which depend
upon the outcome of preceding test cases. An error identified by a test case early
in the sequence could cause secondary errors and reduce the amount of real
testing achieved when the tests are executed.
Throughout unit test design, the primary input should be the specification
documents for the unit under test. While use of actual code as an input to the test
design process may be necessary in some circumstances, test designers must
take care that they are not testing the code against itself. A test specification
developed from the code will only prove that the code does what the code does,
not that it does what it is supposed to do.
C. Test Case Design Techniques
The preceding section of this paper has provided a "recipe" for developing a unit
test specification as a set of individual test cases. In this section a range of
techniques which can be used to help define test cases are described.
Test case design techniques can be broadly split into two main categories. Black
box techniques use the interface to a unit and a description of functionality, but
do not need to know how the inside of a unit is built. White box techniques make
use of information about how the inside of a unit works. There are also some
other techniques which do not fit into either of the above categories. Error
guessing falls into this category.
The most important ingredients of any test design are experience and common
sense. Test designers should not let any of the given techniques obstruct the
application of experience and common sense.
C.1. Specification Derived Tests
As the name suggests, test cases are designed by walking through the relevant
specifications. Each test case should test one or more statements of
specification. It is often practical to make the sequence of test cases correspond
to the sequence of statements in the specification for the unit under test. For
example, consider the specification for a function to calculate the square root of a
real number, shown in figure 3.1.
There are three statements in this specification, which can be addressed by two
test cases. Note that the use of Print_Line conveys structural information in the
specification.
Test Case 2: Input -10, Return 0, Output "Square root error - illegal negative
input" using Print_Line.
A variation of specification derived testing is to apply a similar technique to a
security analysis, safety analysis, software hazard analysis, or other document
which provides supplementary information to the unit's specification.
C.2. Equivalence Partitioning
Equivalence partitioning assumes that all values within any individual partition
are equivalent for test purposes. Test cases should therefore be designed to test
one value in each partition. Consider again the square root function used in the
previous example. The square root function has two input partitions and two
output partitions, as shown in table 3.2.
Test Case 2: Input -10, Return 0, Output "Square root error - illegal negative
input" using Print_Line.
For a function like square root, we can see that equivalence partitioning is quite
simple. One test case for a positive number and a real result; and a second test
case for a negative number and an error result. However, as software becomes
more complex, the identification of partitions and the inter-dependencies between
partitions becomes much more difficult, making it less convenient to use this
technique to design test cases. Equivalence partitioning is still basically a positive
test case design technique and needs to be supplemented by negative tests.
C.3. Boundary Value Analysis
The zero or greater partition has a boundary at 0 and a boundary at the most
positive real number. The less than zero partition shares the boundary at 0 and
has another boundary at the most negative real number. The output has a
boundary at 0, below which it cannot go.
Test Case 1: Input {the most negative real number}, Return 0, Output "Square
root error - illegal negative input" using Print_Line
Test Case 2: Input {just less than 0}, Return 0, Output "Square root error - illegal
negative input" using Print_Line
Test Case 4: Input {just greater than 0}, Return {the positive square root of the
input}
- Exercises just inside the lower boundary of partition (ii).
Test Case 5: Input {the most positive real number}, Return {the positive square
root of the input}
- Exercises the upper boundary of partition (ii) and the upper boundary of
partition (a).
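Expressed as executable checks, the boundary value test cases look as follows.
Test Case 3 does not appear in the text above; since the two input partitions
share the boundary at 0, it is inferred to exercise an input of exactly 0 and
is marked as such in the sketch:

#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstdio>

double SquareRoot(double input)  // same sketch as under specification derived tests
{
    if (input < 0.0) { std::puts("Square root error - illegal negative input"); return 0.0; }
    return std::sqrt(input);
}

int main()
{
    assert(SquareRoot(-DBL_MAX) == 0.0);                // Test Case 1: most negative real number
    assert(SquareRoot(-DBL_MIN) == 0.0);                // Test Case 2: just less than 0
    assert(SquareRoot(0.0) == 0.0);                     // Test Case 3 (inferred): the shared boundary at 0
    assert(SquareRoot(DBL_MIN) == std::sqrt(DBL_MIN));  // Test Case 4: just greater than 0
    assert(SquareRoot(DBL_MAX) == std::sqrt(DBL_MAX));  // Test Case 5: most positive real number
    return 0;
}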
C.4. State-Transition Testing
State transition testing is particularly useful where either the software has been
designed as a state machine or the software implements a requirement that has
been modelled as a state machine. Test cases are designed to test the
transitions between states by creating the events which lead to transitions.
When used with illegal combinations of states and events, test cases for negative
testing can be designed using this approach. Testing state machines is
addressed in detail by the IPL paper "Testing State Machines with AdaTEST and
Cantata".
C.5. Branch Testing
In branch testing, test cases are designed to exercise control flow branches or
decision points in a unit. This is usually aimed at achieving a target level of
Decision Coverage. Given a functional specification for a unit, a "black box" form
of branch testing is to "guess" where branches may be coded and to design test
cases to follow the branches. However, branch testing is really a "white box" or
structural test case design technique. Given a structural specification for a unit,
specifying the control flow within the unit, test cases can be designed to exercise
branches. Such a structural unit specification will typically include a flowchart or
PDL.
Returning to the square root example, a test designer could assume that there
would be a branch between the processing of valid and invalid inputs, leading to
the following test cases:
Test Case 2: Input -10, Return 0, Output "Square root error - illegal negative
input" using Print_Line.
- Exercises the invalid input processing branch
It can be seen that branch testing works best with a structural specification for
the unit. A structural unit specification will enable branch test cases to be
designed to achieve decision coverage, but a purely functional unit specification
could lead to coverage gaps.
C.6. Condition Testing
There are a range of test case design techniques which fall under the general
title of condition testing, all of which try to mitigate the weaknesses of branch
testing when complex logical conditions are encountered. The object of condition
testing is to design test cases to show that the individual components of logical
conditions and combinations of the individual components are correct.
Test cases are designed to test the individual elements of logical expressions,
both within branch conditions and within other expressions in a unit. As for
branch testing, condition testing could be used as a "black box" technique, where
the test designer makes intelligent guesses about the implementation of a
functional specification for a unit. However, condition testing is more suited to
"white box" test design from a structural specification for a unit.
The test cases should be targeted at achieving a condition coverage metric, such
as Modified Condition Decision Coverage (available as Boolean Operand
Effectiveness in AdaTEST). The IPL paper entitled "Structural Coverage Metrics"
provides more detail of condition coverage metrics.
To illustrate condition testing, consider the example specification for the square
root function which uses successive approximation (figure 3.3(d) - Specification
4). Suppose that the designer for the unit made a decision to limit the algorithm
to a maximum of 10 iterations, on the grounds that after 10 iterations the answer
would be as close as it would ever get. The PDL specification for the unit could
specify an exit condition like that given in figure 3.4.
Test Case 2: 2 iterations, error>=desired accuracy for the first iteration, and
error<desired accuracy for the second iteration.
- Both parts of the condition are false for the first iteration.
On the second iteration, the first part of the condition
becomes true and the second part remains false, showing
that the error<desired accuracy part of the condition can
independently affect its outcome.
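Figure 3.4 is not reproduced in this text. From the description, the loop exit
condition combines two parts: (error < desired accuracy) OR (iterations >= 10).
The sketch below shows the shape of such a unit; the approximation method used
(Newton's method) is an assumption, since only the exit condition matters for
condition testing:

#include <cmath>

double SquareRootByApproximation(double input, double desiredAccuracy)
{
    // Sketch only: assumes input > 0 and desiredAccuracy > 0.
    double estimate = input / 2.0;
    int iterations = 0;
    double error;
    do
    {
        estimate = (estimate + input / estimate) / 2.0;  // one refinement step
        error = std::fabs(estimate * estimate - input);
        ++iterations;
    }
    while (!(error < desiredAccuracy || iterations >= 10));  // exit condition under test
    return estimate;
}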
Condition testing works best when a structural specification for the unit is
available. It provides a thorough test of complex conditions, an area of frequent
programming and design error and an area which is not addressed by branch
testing. As for branch testing, it is important for test designers to be aware
that concentrating on conditions could distract a test designer from the overall
functionality of a unit.
C.7. Data Definition-Use Testing
Data definition-use testing designs test cases to test pairs of data definitions and
uses. A data definition is anywhere that the value of a data item is set, and a data
use is anywhere that a data item is read or used. The objective is to create test
cases which will drive execution through paths between specific definitions and
uses.
Like decision testing and condition testing, data definition-use testing can be
used in combination with a functional specification for a unit, but is better suited
to use with a structural specification for a unit.
Consider one of the earlier PDL specifications for the square root function which
sent every input to the maths co-processor and used the co-processor status to
determine the validity of the result. (Figure 3.3(c) - Specification 3). The first step
is to list the pairs of definitions and uses. In this specification there are a number
of definition-use pairs, as shown in table 3.3.
These pairs of definitions and uses can then be used to design test cases. Two
test cases are required to test all six of these definition-use pairs:
Test Case 2: Input -10, Return 0, Output "Square root error - illegal negative
input" using Print_Line.
The analysis needed to develop test cases using this design technique can also
be useful for identifying problems before the tests are even executed; for
example, identification of situations where data is used without having been
defined. This is the sort of data flow analysis that some static analysis tools can
help with. The analysis of data definition-use pairs can become very complex,
even for relatively simple units. Consider what the definition-use pairs would be
for the successive approximation version of square root!
It is possible to split data definition-use tests into two categories: uses which
affect control flow (predicate uses) and uses which are purely computational.
Refer to "Software Testing Techniques" 2nd Edition, B Beizer,Van Nostrand
Reinhold, New York 1990, for a more detailed description of predicate and
computational uses.
C.8. Internal Boundary Value Testing
In many cases, partitions and their boundaries can be identified from a functional
specification for a unit, as described under equivalence partitioning and boundary
value analysis above. However, a unit may also have internal boundary values
which can only be identified from a structural specification. Consider a fragment
of the successive approximation version of the square root unit specification, as
shown in figure 3.5 (derived from figure 3.3(d) - Specification 4).
The calculated error can be in one of two partitions about the desired accuracy, a
feature of the structural design for the unit which is not apparent from a purely
functional specification. An analysis of internal boundary values yields three
conditions for which test cases need to be designed.
Test Case 1: Error just greater than the desired accuracy
Test Case 2: Error equal to the desired accuracy
Test Case 3: Error just less than the desired accuracy
Internal boundary value testing can help to bring out some elusive bugs. For
example, suppose "<=" had been coded instead of the specified "<".
Nevertheless, internal boundary value testing is a luxury to be applied only as a
final supplement to other test case design techniques.
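The "<=" for "<" slip can be made concrete. In the sketch below, only the
boundary test case (error exactly equal to the desired accuracy) distinguishes
the specified exit test from the miscoded one; the two neighbouring cases pass
either way:

#include <cassert>

bool SpecifiedExit(double error, double accuracy) { return error <  accuracy; }
bool MiscodedExit (double error, double accuracy) { return error <= accuracy; }

int main()
{
    const double accuracy = 0.001;
    // Test Case 1: error just greater than the desired accuracy - both agree.
    assert(SpecifiedExit(0.0011, accuracy) == MiscodedExit(0.0011, accuracy));
    // Test Case 2: error equal to the desired accuracy - only this case differs.
    assert(SpecifiedExit(0.001, accuracy) != MiscodedExit(0.001, accuracy));
    // Test Case 3: error just less than the desired accuracy - both agree.
    assert(SpecifiedExit(0.0009, accuracy) == MiscodedExit(0.0009, accuracy));
    return 0;
}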
C.9. Error Guessing
Error guessing is based mostly upon experience, with some assistance from
other techniques such as boundary value analysis. Based on experience, the test
designer guesses the types of errors that could occur in a particular type of
software and designs test cases to uncover them. For example, if any type of
resource is allocated dynamically, a good place to look for errors is in the
deallocation of resources. Are all resources correctly deallocated, or are some
lost as the software executes?
To make the maximum use of available experience and to add some structure to
this test case design technique, it is a good idea to build a check list of types of
errors. This check list can then be used to help "guess" where errors may occur
within a unit. The check list should be maintained with the benefit of experience
gained in earlier unit tests, helping to improve the overall effectiveness of error
guessing.
D. Conclusion
Experience has shown that a conscientious approach to unit testing will detect
many bugs at a stage of the software development where they can be corrected
economically. A rigorous approach to unit testing requires a disciplined process
and careful test case design.
The process for developing unit test specifications presented in this paper is
generic, in that it can be applied to any level of testing. Nevertheless, there will
be circumstances where it has to be tailored to specific situations. Tailoring of the
process and the use of test case design techniques should be documented in the
overall test strategy.
2.1 Introduction
Other work on the evaluation of diagrams and graphs is also reviewed for
possible theoretical models that could be used in the current research. Human-
Computer Interaction (HCI) is an Information Systems area that has drawn
extensively on cognitive science to develop and evaluate Graphical User
Interfaces (GUIs). A brief overview of cognitive-based approaches utilized in HCI
is presented. One of these approaches, the Human Information Processing
System model, in which the human mind is treated as an information-processing
system, provides the cognitive theoretical model for this research and is
discussed separately because of its importance. Work on attention and the
comprehension of graphics is also briefly reviewed.
Two further areas are identified as necessary for the development of the
research task and tools: (1) types of diagrammatic models and (2) types of
software defects. Relevant work in each of these areas is briefly reviewed and,
since typologies appropriate to this research were not located, appropriate
typologies are developed.
Following Tjahjono [1996, 2], Formal Technical Review may be defined as any
"evaluation technique that involves the bringing together of a group of technical
[and sometimes non-technical] personnel to analyze a software artifact, typically
with the goal of discovering errors or other anomalies." As such, FTR has the
following distinguishing characteristics:
1. Formal process.
2. Use of groups or teams. Most FTR techniques involve real groups, but
nominal groups are used as well.
3. Review by knowledgeable individuals or practitioners.
4. Focus on detection of defects.
1.Desk Checking, or reading over a program by hand while sitting at one's desk,
is the oldest software review technique [Adrion et al. 1982]. Strictly speaking,
desk checking is not a form of FTR since it does not involve a formal process or
a group. Moreover, desk checking is generally perceived as ineffective and
unproductive due to (a) its lack of discipline and (b) the general ineffectiveness of
people in detecting their own errors. To correct for the second problem,
programmers often swap programs and check each other's work. Since desk
checking is an individual process not involving group dynamics, research in this
area would be relevant but none applicable to the current research was found.
It should be noted that Humphrey [1995] has developed a review method, called
Personal Review (PR), which is similar to desk checking. In PR, each
programmer examines his own products to find as many defects as possible
utilizing a disciplined process in conjunction with Humphrey's Personal Software
Process (PSP) to improve his own work. The review strategy includes the use of
checklists to guide the review process, review metrics to improve the process,
and defect causal analysis to prevent the same defects from recurring in the
future. The approach taken in developing the Personal Review process is an
engineering one; no reference is made in Humphrey [1995] to cognitive theory.
2. Peer Rating is a technique in which anonymous programs are evaluated in
terms of their overall quality, maintainability, extensibility, usability and clarity
by selected programmers who have similar backgrounds [Myers 1979].
Shneiderman [1980] suggests that peer ratings of programs are productive,
enjoyable, and non-threatening experiences. The technique is often referred
to as Peer Reviews [Shneiderman 1980], but some authors use the term peer
reviews for generic review methods involving peers [Paulk et al 1993;
Humphrey 1989].
advance preparation on the part of reviewers and with the meeting focus on
education of participants [Fagan 1976].
A. Small vs. Large Team Reviews. Siy [1996] classifies reviews into those
conducted by small (1-4 reviewers) [Bisant and Lyle 1996] and large (more
than 4 reviewers) [Fagan 1976, 1986] teams. If each reviewer depends on
different expertise and experiences, a large team should allow a wider
variety of defects to be detected and thus better coverage. However, a
large team requires more effort due to more individuals inspecting the
artifact, generally involves greater scheduling problems [Ballman and
Votta 1994], and may make it more difficult for all participants to
participate fully.
B. No vs. Single vs. Multiple Session Reviews. The traditional Fagan
Inspection provided for one session to inspect the software artifact, with
the possibility of a follow-up session to inspect corrections. However,
variants have been suggested.
On the other hand, some authors [Knight and Myers 1993; Schneider et
al. 1992] have argued for multiple sessions, conducted either in series or
parallel. Gilb and Graham [1993] do not use multiple inspection sessions
but add a root cause analysis session immediately after the inspection
meeting.
C. Nonsystematic vs. Systematic Defect-Detection Technique Reviews.
The most frequently used detection methods (ad hoc and checklist) rely on
nonsystematic techniques, and reviewer responsibilities are general and not
differentiated for single session reviews [Siy 1996]. However, some methods
employ more prescriptive techniques, such as questionnaires [Parnas and
Weiss 1987] and correctness proofs [Britcher 1988].
D. Single Site vs. Multiple Site Reviews. The traditional FTR techniques
have assumed that the group-meeting component would occur face-to-face at
a single site. However, with improved telecommunications, and especially
with computer support (see item F below), it has become increasingly feasible
to conduct even the group meeting from multiple sites.
E. Synchronous vs. Asynchronous Reviews. The traditional FTR
techniques have also assumed that the group meeting component would
occur in real-time; i.e., synchronously. However, some newer techniques
that eliminate the group meeting or are based on computer support utilize
asynchronous reviews.
F. Manual vs. Computer-supported Reviews. In recent years, several
computer supported review systems have been developed [Brothers et al.
1990; Johnson and Tjahjono 1993; Gintell et al. 1993; Mashayekhi et al
1994]. The type of support varies from simple augmentation of the manual
practices [Brothers et al. 1990; Gintell et al. 1993] to totally new review
methods [Johnson and Tjahjono 1993].
Wheeler et al. [1996], after reviewing a number of studies that support the
economic benefit of FTR, conclude that inspections reduce the number of defects
throughout development, cause defects to be found earlier in the development
process where they are less expensive to correct, and uncover defects that
would be difficult or impossible to discover by testing. They also note "these
benefits are not without their costs, however. Inspections require an investment
of approximately 15 percent of the total development cost early in the process [p.
11]."
In discussing overall economic effects, Wheeler et al. cite Fagan [1986] to the
effect that investment in inspections has been reported to yield a 25-to-35
percent overall increase in productivity. They also reproduce a graphical analysis
from Boehm [1987] that indicates inspections reduce total development cost by
approximately 30%.
The Wheeler et al. [1996] analysis does not specify the relative value of
Practitioner Evaluation to FTR, but two recent economic analyses provide
indications.
Siy [1996]. In his analysis of the factors driving inspection costs and benefits,
Siy reports that changes in FTR structural elements, such as group size,
number of sessions, and coordination of multiple sessions, were largely
ineffective in improving the effectiveness of inspections. Instead, inputs into
the process (reviewers and code units) accounted for more outcome variation
than structural factors. He concludes by stating "better techniques by which
reviewers detect defects, not better process structures, are the key to
improving inspection effectiveness [Abstract, p. 2]." (emphasis added)
Votta's analysis effectively attributes most of the economic benefit of FTR to PE,
and Siy's analysis explicitly states that better PE techniques "are the key to improving
inspection effectiveness." These findings, if supported by additional research,
would further support the contention that a better understanding of Practitioner
Evaluation is necessary.
2.2.3 Psychological Aspects of FTR
Work on the psychological aspects of FTR can be categorized into four groups.
1. Egoless Programming. Gerald Weinberg [1971] began the examination of
psychological issues associated with software review in his work on egoless
programming. According to Weinberg, programmers are often reluctant to
allow their programs to be read by other programmers because the programs
are often considered to be an extension of the self and errors discovered in
the programs to be a challenge to one's self-image. Two implications of this
theory are as follows:
i. The ability of a programmer to find errors in his own work tends to be
impaired since he tends to justify his own actions, and it is therefore more
effective to have other people check his work.
ii. Each programmer should detach himself from his own work. The work
should be considered a public property where other people can freely
criticize, and thus, improve its quality; otherwise, one tends to become
defensive, and reluctant to expose one's own failures.
These two concepts have led to the justification of FTR groups, as well as the
establishment of independent quality assurance groups that specialize in
finding software defects in many software organizations [Humphrey 1989].
4. Group Process. Most FTR methods are implemented using small groups.
Therefore, several key issues from small group theory apply to FTR, such as
groupthink (tendency to suppress dissent in the interests of group harmony),
group deviants (influence by minority), and domination of the group by a single
member. Other key issues include social facilitation (presence of others boosts
one's performance) and social loafing (one member free rides on the group's
effort) [Myers 1990]. The issue of moderator domination in inspections is also
documented in the literature [Tjahjono 1996].
Perhaps the most interesting research from the perspective of the current
study is that of Sauer et al. [2000]. This research is unusual in that it has an
explicit theoretical basis and outlines a behaviorally motivated program of
research into the effectiveness of software development technical reviews.
The finding that most of the variation in effectiveness of software
development technical reviews is the result of variations in expertise among
the participants provides additional motivation for developing a solid
understanding of Formal Technical Review at the individual level.
It should be noted that all of this work, while based on psychological theory, does
not address the issue of how practitioners actually evaluate software artifacts.
2.3 Approaches to the Evaluation of Diagrammatic Models
The focus of this dissertation is the exploration of how practitioners as individuals
evaluate diagrammatic models for semantic errors that would cause the resulting
system not to meet the functionality, performance, security, usability,
maintainability, testability or other requirements necessary to the purposes of the
system [Bass et al. 1998; Boehm et al. 1978].
1. Computer Aided Design (CAD). Since CAD uses diagrams to specify the
design and construction of physical entities [Yoshikawa and Warman 1987], it
seemed reasonable to assume that techniques developed to evaluate CAD
diagrams might be adapted for the evaluation of diagrams used to specify
software systems. However, a review of the literature found relatively little
literature on the evaluation of CAD diagrams, and that which was found
pertained to the formal (i.e., "mathematical") evaluation of circuit designs.
Discussion with William Miller of the University of South Florida Engineering
faculty supported this conclusion [Miller 2000], and this approach was
abandoned.
2. Radiological Images. While x-rays are not technically diagrams and do not
specify a system, they are visual artifacts and do convey information. Therefore,
it was reasoned that rules for reading radiological images might provide insights
into the evaluation of software diagrammatic models. Review of the literature
found nothing appropriate. More importantly, as further conceptual work was
done regarding the purposes of evaluating software diagrammatic models, it
became apparent that the reading of x-rays was not an appropriate analog. This
approach was therefore also abandoned.
The language, concepts, and purposes of HCI are very similar to those of
information systems, and it is arguable that HCI is a part of information
systems. (See, for example, the Huber [1983] and Robey [1983] debate
on cognitive style and DSS design.)
HCI is solidly rooted in psychology, a traditional information systems
reference discipline.
Computer user-interfaces almost always have a visual component and are
increasingly diagrammatic in design.
User-interfaces can be and are evaluated in terms of the semantic error
criteria described above; i.e., defects in functionality, performance,
efficiency, etc.
Based on these facts, a decision was made to attempt to identify an HCI
evaluation technique that could be adapted for evaluation of software
diagrammatic models.
8. Feature Inspections. In feature inspections the focus is on the functionality
provided by the software system being inspected; i.e., whether the function as
designed meets the needs of the intended end users.
These HCI evaluation techniques are clearly similar to FTR in that they involve
the use of knowledgeable individuals to detect defects in a software artifact; most
also involve a formal process and a group.
Human Factors/Actors. Bannon [1991, 28] argues that the term human
factors should be replaced with the term human actors to indicate "emphasis
is placed on the person as an autonomous agent that has the capacity to
regulate and coordinate his or her behavior, rather than being a simple
passive element in a human-machine system." The change is supposed to
facilitate focusing on the way people act in real work settings instead of
viewing them as information processors.
Distributed Cognition. An emerging theoretical framework is distributed
cognition. The goal of distributed cognition is to conceptualize cognitive
activities as embodied and situated within the work context in which they
occur [Hutchins 1990; Hutchins and Klausen 1992].
The human factors/actors and distributed cognition models are not
appropriate to the current study. The connectionist models show great promise
but are not yet sufficiently developed to be useful for this research. The
information processor models are however appropriate and sufficiently mature;
they provide the primary cognitive theoretical base for the dissertation.
Computational approaches are also utilized in that the study analyzes the
cognitive system in terms of the task planning involved in task performance.
Figure 2.2 Extended Stages of the Information Processing Model (adapted
from Barber [1988])
HIPS models, such as Anderson's ACT-R [1993], continue to be developed and
are useful. Further, the information processing approach has recently been
described as the primary metatheory of cognitive psychology [Ashcraft 1994].
2.4.2 Coping with Attention as a Limited Resource
One of the earliest psychological definitions of attention is that of William James
[1890, vol. 1, 403-404]:
Everyone knows what attention is. It is the taking possession of the
mind, in clear and vivid form, of one out of what seem several
simultaneously possible objects or trains of thought. Focalization,
concentration of consciousness are of its essence. It implies withdrawal
from some things in order to deal more effectively with others . . .
(emphasis added)
An example of this rethinking is the work of Broadbent [1952] and Cherry [1953].
They used a technique to study attention in which different spoken messages are
presented to a subject's two ears at the same time. Their research shows that
subjects are able to attend to one message if the messages are distinguished by
physical (rather than merely semantic) cues, but recall almost nothing of the
nonattended channel. In 1956, Miller reviewed a series of experiments that
utilized a different methodology and noted that, across many domains, subjects
could keep in mind no more than about seven "chunks" simultaneously. These
findings were among the first experimental evidence that attentional capacity is a
limited resource.
More recent experimental work continues to indicate that attention is a
limited resource [Cowan 1995]. Even those cognitive psychologists who
have recently challenged the very concept of attention assume their
"attention" analog is limited. One example of this would be Allport [1980] and
Wickens [1984], who argue that the concept of attention should be replaced
with the concept of multiple limited processing resources.
Based on an examination of the exhaustive review by Cowan [1995] of the
intersection of memory and attention, the Shiffrin [1988, 739] definition appears
to be representative of contemporary thought.
Since human cognitive resources are limited, cognitively complex tasks may
overload these resources and decrease the quality and/or quantity of outputs.
Various approaches to measuring the cognitive complexity of tasks have been
developed. In HCI, an informal view of complexity is often utilized. For example,
Grant [1990, sec. 1.3] defines a complex task as one for which there are a large
number of potential practical strategies. This definition is not inconsistent with
the measure assumed by Simon [1962] in his paper on the use of hierarchical
decomposition to decrease the complexity of problem solving.
Simon [1990] argues that humans develop mechanisms to enable them to deal
with complex, real-life situations despite their limited cognitive resources. One
such mechanism is task planning. According to Fredericksen and Breuleaux
[1990], task planning is a cognitive bargain in which the time and effort spent
working with an abstract, and therefore smaller, problem space during planning
minimizes actual work on the task in the original, detailed problem space.
Earley and Perry [1987, 279] define a task plan as "a cognitively based routine
for attaining a particular objective and consists of multiple steps." Newell and
Simon [1972] identify planning from verbal protocols as those passages in which:
Two further items should be noted regarding planning:
2. Planning is not complete before action. Both theory and analysis of verbal
protocols indicate that periods of planning are interleaved with action
[McDermott 1978; Newell and Simon 1972]. In other words, practitioners will
often plan a response to part of a task, complete some or all of the actions
specified in that plan, plan a new response incorporating information acquired
during prior action period(s), complete the new actions, etc.
In the HIPS model, the nature and amount of stimuli impact both information
processing and output. This research uses a key concept of the HIPS model,
attention, in two ways:
Larkin and Simon [1987] consider why diagrams can be superior to a verbal
description for solving problems, and suggest the following reasons:
- Diagrams can group together all information that is used together, thus
avoiding large amounts of search for the elements needed to make a
problem-solving inference.
- Diagrams typically use location to group information about a single element,
avoiding the need to match symbolic labels.
- Diagrams automatically support a large number of perceptual inferences,
which are extremely easy for humans.
As noted in Chapter 1, two of these depend on spatial patterns.
Winn [1994] presents an overview of how the symbol system of graphics
interacts with the viewers' perceptual and cognitive processes, which is
summarized in figure 2.3. In his description, the graphical symbol system
consists of two elements: (1) Symbols that bear an unambiguous one-to-one
relationship to objects in the domain of reference, and (2) The spatial relations of
the symbols to each other. Thus, how symbols are configured spatially will affect
the way viewers understand how the associated objects are related and interact.
For the purposes of this dissertation, a particularly interesting finding is that
biases based on reading direction (left-to-right for English) affect the
interpretation of graphics.
2.6.1 Wieringa 1998
2. Data Model. Shows the data entities of an application and the relationships
between the entities. Entities and relationships can be selected in subsets to
produce views of the data model. The diagramming technique normally used
to depict the data model graphically is the Entity Relationship Diagram (ERD)
and the model is sometimes referred to as the Entity-Relationship Model.
3. Process Model. Shows how things occur in the organization via a sequence
of processes, actions, stores, inputs and outputs. Processes are decomposed
into more detail, producing a layered hierarchical structure. The diagramming
technique used for process modeling in structured analysis is the Data Flow
Diagram (DFD). Several notations are available for representing process
models, the most widely used being Yourdon/DeMarco and Gane & Sarson.
5. State Transition Model (Real Time Model). Shows how objects transition to
and from various states or conditions and the events or triggers that cause
them to change between the different states.
In evaluating these two typologies for this research, two problems were noted:
The first step in the development process was to consult several systems
analysis and design and structured techniques texts for classification insights and
to derive lists of commonly used diagrammatic models. These included Fertuck
[1995], Hoffer et al. [1998], Kendall and Kendall [1995], and Martin and McClure
[1985].
Martin and McClure make a major distinction between hierarchical diagrams (i.e.,
those having one overall node or root and which do not remerge) and mesh or
network diagrams (i.e., those not having a single overall node or root or which do
remerge). For the purposes of this research, this distinction is operationalized as
the categorical variable hierarchical/not hierarchical.
Martin and McClure also make a major distinction between diagrams showing
sequence and those that do not. Sequence usually implies temporal
directionality; for this dissertation, the distinction is broadened to include the
possibility of logical and other forms of directionality and is operationalized as the
categorical variable directional/not directional.
As a test of the feasibility of the classification scheme, twenty diagram types from
Martin and McClure, UML diagrams from Harmon and Watson [1998], and a
model of a "typical" GUI were then categorized. The results of this categorization
are shown in table 2.1.
[Table 2.1 Categorization of Diagram Types. The twelve columns, numbered I
through XII, represent the 2 x 2 x 3 combinations of hierarchical/not
hierarchical, directional/not directional, and data/hybrid/process. Diagram
types placed in the table include Functional Decomposition I and II, Data Flow
Analysis, a typical GUI, HIPO (Detail), UML Sequence, Nassi-Shneiderman
Charts, UML Activity, and Action I and II.]
Inspection of table 2.1 shows that only seven of the twelve (2 x 2 x 3) possible
categories are actually populated. Table 2.2 shows the categorization of the
diagram types after collapsing unpopulated categories.
[Table 2.2 Categorization after Collapsing Unpopulated Categories. Entries
include HIPO (Overview), HIPO (VTC), Data Navigation, Inverted-L, UML
Class, Nassi-Shneiderman Charts, UML Activity, and Action I and II.]
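To make the scheme concrete, here is a minimal Python sketch (illustrative
only, not part of the original study; the names are mine) that enumerates the
twelve categories and, using three diagram types whose placements are given
in tables 2.1 and 2.4, shows how collapsing to populated categories works:

```python
from itertools import product

# Two categorical variables from Martin and McClure [1985] plus the
# data/hybrid/process dimension give 2 x 2 x 3 = 12 possible categories.
HIERARCHY = ("Hierarchical", "Not Hierarchical")
DIRECTION = ("Directional", "Not Directional")
CONTENT = ("Data", "Hybrid", "Process")

ROMAN = ("I", "II", "III", "IV", "V", "VI",
         "VII", "VIII", "IX", "X", "XI", "XII")
categories = dict(zip(ROMAN, product(HIERARCHY, DIRECTION, CONTENT)))

# Three of the categorized diagram types, with the category assignments
# given in table 2.4; the full set of assignments is in table 2.1.
assignments = {
    "Data Flow Diagram": ("Not Hierarchical", "Directional", "Hybrid"),            # VIII
    "Entity Relationship Diagram": ("Not Hierarchical", "Not Directional", "Data"),  # X
    "Typical GUI": ("Not Hierarchical", "Not Directional", "Hybrid"),              # XI
}

# Collapsing: retain only categories populated by at least one diagram type.
populated = {n: c for n, c in categories.items() if c in assignments.values()}
print(f"{len(populated)} of {len(categories)} categories populated")  # 3 of 12 here
```

Applied to all of the categorized diagram types rather than these three, the
same filtering yields the seven populated categories reported above.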
Boehm et al. [1978] and Bass et al. [1998] develop typologies of software qualities, and the definition
in Grady [1992, 122] of a defect as "any flaw in the specification, design, or
implementation of a product" inherently includes software qualities. Therefore,
the primary focus of the first section below is on typologies of software qualities.
The second section reviews other software defect typologies, and the third
section discusses the development of the typology used in this research.
Figure 2.4 Boehm et al. [1978] Software Quality Characteristics Tree
(adapted)
The Grady [1992] software defect model is shown below in figure 2.5. It is also a
hierarchical model (with the root at the bottom) that classifies defects according
to origin, type, and mode. Grady describes six types of software defects that
correspond to the five modes plus a residual "Other" category:
6. Other.
Figure 2.5 Grady [1992] Software Defect Model
Bass et al. [1998] discuss ten technical qualities of software, dividing them into
those that are discernible at runtime (DR) and those not discernible at runtime
(NDR). The following is a brief discussion of the software qualities in their
typology:
1. Functionality (DR) is the ability of the system to do the work for which it was
intended; it is the basic statement of the system's capabilities, services, and
behavior.
Bass et al. [1998, 79] note that "For most of the history of software
engineering, performance has been the driving factor in software architecture,
and this has frequently compromised the achievement of other qualities."
6. Maintainability (NDR). Bass et al. [1998] use the terms modifiability and
maintainability interchangeably and define modifiability as the ability to make
changes to a system quickly and cost-effectively. According to them,
modifications to a system can be broadly categorized as follows:
10. Software testability (NDR) refers to the ease with which software can be
made to demonstrate its faults through (typically execution-based) testing.
This research uses Bass et al. [1998] as the basis for the qualities dimension of
the software defects typology.
2.7.2 Other Defect Dimensions
Review of the literature yields three other dimensions for the classification of
software defects.
2.7.2.1 Class
Class refers to whether the defect results from required logic or other
structure being missing (M), incorrect (I), or extra (E) [Ebenau and Strauss
1994].
2.7.2.2 Severity
The defect severity categories generally listed are major (J), minor (N), and
(sometimes) trivial (T) [Ebenau and Strauss 1994; Gilb and Graham 1993; Kelly
et al. 1992].
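As a concrete illustration (the record structure and field names are
assumptions of this sketch, not drawn from the cited inspection literature), a
defect annotated on these two dimensions might be represented as:

```python
from dataclasses import dataclass
from enum import Enum

class DefectClass(Enum):
    MISSING = "M"    # required logic or structure is absent
    INCORRECT = "I"  # present but wrong
    EXTRA = "E"      # present but not required

class Severity(Enum):
    MAJOR = "J"
    MINOR = "N"
    TRIVIAL = "T"    # only sometimes used

@dataclass
class DefectRecord:
    description: str
    defect_class: DefectClass
    severity: Severity

# Example: a missing relationship in a data model, judged major.
d = DefectRecord("Order entity lacks a relationship to Customer",
                 DefectClass.MISSING, Severity.MAJOR)
```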
2.7.2.3 Cause
Humphrey [1995], following Gale [1990], lists five categories of basic defect
causes:
Further simplification is achieved by ignoring defects of class extra (E) on the
class dimension. The rationale for this reduction is that, while defects associated
with extra functionality may increase storage requirements or otherwise decrease
efficiency, the impact on functionality is generally less severe than that caused by
missing and incorrect defects.
Change is also necessary on the qualities dimension. Six of the Bass et al.
[1998] qualities are not readily discernible from diagrammatic models and are
consequently not appropriate to the typology. However, according to Boehm et al.
[1978], the primitive quality Structuredness partially determines three of the six.
Similarly, Fenton and Neil [2001] list Structuredness as an internal attribute
associated with the external attributes reliability (or availability), maintainability,
and reusability. The six non-discernible qualities are listed below; a "B" indicates
a Boehm quality, and an "F" indicates a Fenton attribute.
Availability F
Maintainability B,F
Portability
Reusability B,F
Integrability
Testability B
During the early development of the research task, several subjects noted that
the scope of the diagrammatic models was not consistent. From a theoretical
perspective, lack of Scope Consistency is an instance of a general consistency
problem. In the structured approach to IS development, data and process models
are supposed to model the same system but are fundamentally separate. This
separateness leads to multiple problems including lack of consistency [Repa
2001]. Consideration was given to adding the broader quality consistency to the
typology, but this was rejected because (1) some subjects perceived lack of
Scope Consistency to be a separate issue and (2) lack of Scope Consistency is
different in that it can generally be readily discerned by comparing data and
process models, while other consistency problems become apparent only after
significant functional analysis. Lack of Scope Consistency would be expected to
impact negatively on the integrability and maintainability of the specified system.
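This reduction can be expressed as a short sketch. The quality lists below are
transcribed from the text, but the derivation itself is this sketch's assumption
about how the dimensions were combined:

```python
# The ten technical qualities of Bass et al. [1998].
BASS_QUALITIES = [
    "Functionality", "Performance", "Security", "Usability", "Availability",
    "Maintainability", "Portability", "Reusability", "Integrability",
    "Testability",
]

# The six qualities not readily discernible from diagrammatic models.
NOT_DISCERNIBLE = {
    "Availability", "Maintainability", "Portability",
    "Reusability", "Integrability", "Testability",
}

# Qualities added for the reasons given above.
ADDED = ["Scope Consistency", "Structuredness"]

qualities = ADDED + [q for q in BASS_QUALITIES if q not in NOT_DISCERNIBLE]
# -> ['Scope Consistency', 'Structuredness', 'Functionality',
#     'Performance', 'Security', 'Usability']

# Class dimension reduced to missing/incorrect; extra (E) is ignored.
CLASSES = ["Missing", "Incorrect"]

# The cells of table 2.3.
defect_matrix = [(q, c) for q in qualities for c in CLASSES]
```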
[Table 2.3 Software Defect Matrix: Qualities vs. Class. Rows are the six
qualities (Scope Consistency, Structuredness, Functionality, Performance,
Usability, Security); columns are the two defect classes (Missing, Incorrect).]
Table 2.4 shows the matrix resulting from combining the Diagrammatic Model
Type and Software Defect Type typologies.
[Table 2.4 Diagrammatic Model Type vs. Software Defect Type. Columns pair
each quality (Scope Consistency, Structuredness, Functionality, Performance,
Usability, Security) with the defect classes missing (M) and incorrect (I). Rows
are the seven populated model categories, each with an exemplar diagram type:
Hierarchical-Directional-Data (I): W-O D; Hierarchical-Directional-Hybrid (II):
StrC; Hierarchical-Directional-Process (III): W-O P; Not Hierarchical-
Directional-Hybrid (VIII): DFD; Not Hierarchical-Directional-Process (IX):
FlowC; Not Hierarchical-Not Directional-Data (X): ERD; Not Hierarchical-Not
Directional-Hybrid (XI): GUI.]
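Combining the two typologies then amounts to a cross-product. Continuing the
previous sketch (again illustrative; the exemplar abbreviations are carried over
verbatim from table 2.4):

```python
# Populated model categories and their exemplar diagram types, as in
# table 2.4 (abbreviations taken verbatim from the table).
MODEL_CATEGORIES = {
    "I":    ("Hierarchical-Directional-Data", "W-O D"),
    "II":   ("Hierarchical-Directional-Hybrid", "StrC"),
    "III":  ("Hierarchical-Directional-Process", "W-O P"),
    "VIII": ("Not Hierarchical-Directional-Hybrid", "DFD"),
    "IX":   ("Not Hierarchical-Directional-Process", "FlowC"),
    "X":    ("Not Hierarchical-Not Directional-Data", "ERD"),
    "XI":   ("Not Hierarchical-Not Directional-Hybrid", "GUI"),
}

# Each cell of table 2.4 pairs a model category with a (quality, class)
# defect category from the previous sketch's defect_matrix.
cells = [(n, q, c) for n in MODEL_CATEGORIES for (q, c) in defect_matrix]
print(len(cells))  # 7 categories x 6 qualities x 2 classes = 84 cells
```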
2.8 Summary and Conclusions
Prior theory and research that might inform the dissertation are reviewed. A large
body of research exists concerning Formal Technical Review, but review of this
work shows that it is not based on theory and therefore cannot inform this
research effort. The first part of the literature review therefore provides context
rather than explicating applicable theory.
Two other areas are identified as necessary for the development of the research
task and tools: (1) types of diagrammatic models and (2) types of software
defects. The literature is reviewed and new typologies are developed.