
Automatic Identification of Bug-Introducing Changes: Sunghun Kim, Thomas Zimmermann, Kai Pan, E. James Whitehead, JR


Automatic Identification of Bug-Introducing Changes

Sunghun Kim¹, Thomas Zimmermann², Kai Pan¹, E. James Whitehead, Jr.¹

¹ University of California, Santa Cruz, CA, USA, {hunkim, pankai, ejw}@cs.ucsc.edu
² Saarland University, Saarbrücken, Germany, tz@acm.org

Abstract

Bug-fixes are widely used for predicting bugs or finding risky parts of software. However, a bug-fix does not contain information about the change that initially introduced a bug. Such bug-introducing changes can help identify important properties of software bugs such as correlated factors or causalities. For example, they reveal which developers or what kinds of source code changes introduce more bugs. In contrast to bug-fixes that are relatively easy to obtain, the extraction of bug-introducing changes is challenging.
In this paper, we present algorithms to automatically and accurately identify bug-introducing changes. We remove false positives and false negatives by using annotation graphs, by ignoring non-semantic source code changes, and outlier fixes. Additionally, we validated that the fixes we used are true fixes by a manual inspection. Altogether, our algorithms can remove about 38%~51% of false positives and 14%~15% of false negatives compared to the previous algorithm. Finally, we show applications of bug-introducing changes that demonstrate their value for research.

1. Introduction
Today, software bugs remain a constant and costly fixture of industrial and open source software development. To manage the flow of bugs, software projects carefully control their changes using software configuration management (SCM) systems, capture bug reports using bug tracking software (such as Bugzilla), and then record which change in the SCM system fixes a specific bug in the change tracking system.
The progression of a single bug is as follows. A programmer makes a change to a software system, either to add new functionality, restructure the code, or to repair an existing bug. In the process of making this change, they inadvertently introduce a bug into the software. We call this a bug-introducing change, the modification in which a bug was injected into the software. At some later time, this bug manifests itself in some undesired external behavior, which is recorded in a bug tracking system. Subsequently, a developer modifies the project’s source code, possibly changing multiple files, and repairs the bug. They commit this change to the SCM system, permanently recording the change. As part of the commit, developers commonly (but not always) record in the SCM system change log the identifier of the bug report that was just fixed. We call this modification a bug-fix change.
Software evolution research leverages the history of changes and bug reports that accretes over time in SCM systems and bug tracking systems to improve our understanding of how a project has grown. It offers the possibility that by examining the history of changes made to a software project, we might better understand patterns of bug introduction, and raise developer awareness that they are working on risky (that is, bug-prone) sections of a project. For example, if we can find rules that associate bug-introducing changes with certain source code change patterns (such as signature changes that involve parameter addition [11]), it may be possible to identify source code change patterns that are bug-prone.
Due to the widespread use of bug tracking and SCM systems, the most readily available data concerning bugs are the bug-fix changes. It is easy to mine an SCM repository to find those changes that have repaired a bug. To do so, one examines change log messages in two ways: searching for keywords such as "Fixed" or "Bug" [12] and searching for references to bug reports like “#42233” [2, 4, 16]. With bug-fix information, researchers can determine the location of a bug. This permits useful analysis, such as determining per-file bug counts, predicting bugs, finding risky parts of software [7, 13, 14], or visually revealing the relationship between bugs and software evolution [3].
The major problem with bug-fix data is that it sheds no light on when a bug was injected into the code and who injected it. The person fixing a bug is often not the person who first made the bug, and the bug-fix must, by definition, occur after the bug was first injected. Bug-fix data also provides imprecise data on where a bug occurred. Since functions and methods change their names over time, the fact that a fix was made to function “foo” does not mean the function still had that name when the bug was injected; it could have been named “bar” then. In order to deeply understand the phenomena surrounding the introduction of bugs into code, such as correlated factors and causalities, we need access to the actual moment and point the bug was introduced. This is tricky, and the focus of our paper.
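The two change-log heuristics mentioned in the introduction (keyword search and bug-report references) can be sketched in a few lines. The class name, the keyword list, and the "#digits" reference format below are illustrative assumptions, not the exact patterns used in the cited studies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch: flags a change log message as a likely bug-fix. */
final class BugFixLogMatcher {
    // Assumed keyword list: "fix", "fixes", "fixed", "bug", "bugs" as whole words.
    private static final Pattern KEYWORDS =
        Pattern.compile("\\b(fix(e[sd])?|bugs?)\\b", Pattern.CASE_INSENSITIVE);

    // Assumed reference format: '#' followed by digits, e.g. "#42233".
    private static final Pattern BUG_ID = Pattern.compile("#(\\d+)");

    /** True if the message contains a fix keyword or a bug report reference. */
    static boolean looksLikeBugFix(String logMessage) {
        return KEYWORDS.matcher(logMessage).find()
            || BUG_ID.matcher(logMessage).find();
    }

    /** Returns the first referenced bug number, or -1 if there is none. */
    static long firstBugId(String logMessage) {
        Matcher m = BUG_ID.matcher(logMessage);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }
}
```

The word boundaries keep "prefix" or "debugging" from matching. A real project would likely tune both patterns to its own logging conventions.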
Revision 1 (by kim, bug-introducing)
1 kim 1: public void bar() {
1 kim 2:   // print report
1 kim 3:   if (report == null) {
1 kim 4:     println(report);
1 kim 5:
1 kim 6:   }
1 kim 7: }

Revision 2 (by ejw)
2 ejw 1: public void foo() {
1 kim 2:   // print report
2 ejw 3:   if (report == null)
2 ejw 4:   {
1 kim 5:     println(report);
1 kim 6:

Revision 3 (by kai, bug-fix)
2 ejw 1: public void foo() {
3 kai 2:   // print out report
3 kai 3:   if (report != null)
1 kim 4:   {
1 kim 5:     println(report);
1 kim 6:   }

Figure 1. Example bug-fix and source code changes. A null-value checking bug is injected in revision 1, and fixed in revision 3.
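To make the annotate columns of Figure 1 concrete, the following sketch encodes the annotate output of revision 2 and looks up the last-modifying revision of the lines that the fix in revision 3 touches (lines 2, 3, and 6 of revision 2, as the paper states). The class and method names are hypothetical. Only the six lines of revision 2 visible in the figure are encoded.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch of an annotate-only origin lookup on the Figure 1 data. */
final class AnnotateLookupSketch {
    /** Annotate output for revision 2: line number -> last-modifying revision. */
    static Map<Integer, Integer> revision2Annotate() {
        Map<Integer, Integer> annotate = new LinkedHashMap<>();
        annotate.put(1, 2); // public void foo() {   (renamed by ejw)
        annotate.put(2, 1); // // print report
        annotate.put(3, 2); // if (report == null)   (brace moved by ejw)
        annotate.put(4, 2); // {
        annotate.put(5, 1); // println(report);
        annotate.put(6, 1); // blank line
        return annotate;
    }

    /** Revisions that annotate reports for the given line numbers. */
    static Set<Integer> blamedRevisions(Map<Integer, Integer> annotate,
                                        List<Integer> lines) {
        Set<Integer> revisions = new TreeSet<>();
        for (int line : lines) {
            Integer revision = annotate.get(line);
            if (revision != null) {
                revisions.add(revision);
            }
        }
        return revisions;
    }
}
```

Looking up line 3 alone yields revision 2, although the faulty condition was written in revision 1. Lines 2 and 6 do point to revision 1, but only because a comment and a blank line happen to date from it.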

2. Background
Previous work by the second author developed what was, prior to the current paper, the only approach for identifying bug-introducing changes from bug-fix changes [16]. For convenience, we call this previous approach the SZZ algorithm, after the first letters of the authors’ last names. To identify bug-introducing changes, SZZ first finds bug-fix changes by locating bug identifiers or relevant keywords in change log text, or following an explicitly recorded linkage between a bug tracking system and a specific SCM commit. SZZ then runs a diff tool to determine what changed in the bug-fixes. The diff tool returns a list of regions that differ in the two files; each region is called a hunk. It observes each hunk in the bug-fix and assumes that the deleted or modified source code in each hunk is the location of a bug. Finally, SZZ tracks down the origins of the deleted or modified source code in the hunks using the built-in annotate feature of SCM systems. The annotate feature computes, for each line in the source code, the most recent revision in which the line was changed, and the developer who made the change. The discovered origins are identified as bug-introducing changes.
Figure 1 shows an example of the history of development of a single function over three revisions.
• Revision 1 shows the initial creation of function bar, and the injection of a bug into the software, the line ‘if (report == null) {’ which should be ‘!=’ instead. The leftmost column of each revision shows the output of the SCM annotate command, identifying the most recent revision and the developer who made the revision. Since this is the first revision, all lines were first modified at revision 1 by the initial developer ‘kim.’ The second column of numbers in revision 1 lists line numbers within that revision.
• In the second revision, two changes were made. The function bar was renamed to foo, and a cosmetic change was made where the opening brace at the end of line 3 in revision 1 was moved down to its own line (4) in revision 2. As a result, the annotate output shows lines 1, 3, and 4 as having been most recently modified in revision 2 by ‘ejw.’
• Revision 3 shows three changes, a modification to the comment in line 2, deleting the blank line after the println, and the actual bug-fix, changing line 3 from ‘==’ to ‘!=’.
Let us consider what happens when the SZZ algorithm tries to identify the fix-inducing change associated with the bug-fix in revision 3. SZZ starts by computing the delta between revisions 3 and 2, yielding the lines 2, 3, and 6 (these are highlighted in the figure). SZZ then uses SCM annotate data to determine the initial origin of these three lines. The first problem we encounter is that SZZ seeks the origin of the comment line (2) and the blank line (6); clearly neither contains the injected bug, since these lines are not executable. The next problem comes when SZZ tries to find the origin of line 3. Since revision 2 modified this line to make a cosmetic change (moving the opening brace), the SCM annotate data indicates that this line was most recently modified at revision 2. SZZ stops there, claiming that revision 2 is the bug-introducing change. This is incorrect, since revision 1 was the point at which the bug was initially entered into the code. The cosmetic change threw off the algorithm.
A final problem is that, using just SCM annotate information, it is impossible to determine that the name of the function containing the bug changed from bar to foo. The annotate information only contains triples of (current revision line #, most recent modification revision, developer who made modification). There is no information here that states that a given line in one revision maps to a specific line in a previous (or following) revision. It is certainly possible to compute this information (indeed, we do so in the approach we outline in this paper), but to do so requires more information than is provided solely by SCM annotate capabilities.
We can now summarize the main two limitations of the SZZ algorithm:

SCM annotation information is insufficient: there is not enough information to identify bug-introducing changes. The previous example demonstrates how a simple formatting change (moving the brace) modifies SCM annotate data so an incorrect bug-introducing revision is chosen. It also highlights the need to trace the evolution of individual lines across revisions, so function/method containment can be determined.

Not all modifications are fixes: Even if a file change is defined as a bug-fix by developers, not all hunks in the change are bug-fixes. As we saw above, changes to
comments, blank lines, and formatting are not bug-fixes, yet are flagged as such.
These two limitations result in the SZZ algorithm inaccurately identifying bug-introducing changes. To address these issues, in this paper we present an improved approach for achieving accurate bug-introducing change identification by extending SZZ. In the new approach, we employ annotation graphs, which contain information on the cross-revision mappings of individual lines. This is an improvement over SCM annotate data, and permits a bug to be associated with its containing function or method. We additionally remove false bug-fixes caused by comments, blank lines, and format changes.
An important aspect of this new approach is that it is automated. Since revision histories for large projects can contain thousands of revisions and thousands of files, automated approaches are the only ones that scale to this size. As an automated approach, the bug-introducing identification algorithm we describe can be employed in a wide range of software evolution analyses as an initial clean-up step to obtain high quality data sets for further analysis on the causes and patterns of bug formation.
To determine the accuracy of the automatic approach, we use a manual approach as well. Two human judges manually verified all hunks in a series of bug-fix changes to ensure the corresponding hunks are real bug-fixes.
We applied our automatic and manual approach to identify bug-introducing changes at the method level for two Java open source projects, Columba and Eclipse (jdt.core). We propose the following steps, as shown in Figure 2, to remove false positives and false negatives in identifying bug-introducing changes.

1. Use annotation graphs to provide more detailed annotation information
2. Ignore comment and blank line changes
3. Ignore format changes
4. Ignore outlier bug-fix revisions in which too many files were changed
5. Manually verify all hunks in the bug-fix changes
Figure 2. Summary of approach

In overview, applying this new approach (steps 1-5) removes 38%~51% of false positives and 14%~15% of false negatives as compared to the original SZZ algorithm. Using only the automated algorithms (steps 1-4), we can remove 36~48% of false positives and 14% of false negatives. The manual fix verification does not scale, but highlights the low residual error remaining at the end of the automated steps, since it removes only 2~3% of false positives and 1% of false negatives.
In the remainder of the paper, we begin by describing our experimental setup (Section 3). Following are results from our experiments (Section 4), along with discussion of the results (Section 5). Rounding off the paper, we end with some existing applications of bug-introducing changes (Section 6) and conclusions (Section 7).

3. Experimental Setup
In this section, we describe how we extract the change history from an SCM system for our two projects of interest. We also explain the accuracy measures we use for assessing the performance of each stage in our improved algorithm for identifying bug-introducing changes.

3.1. History Extraction
Kenyon is a system that extracts source code change histories from SCM systems such as CVS and Subversion [1]. Kenyon automatically checks out the source code for each revision and extracts change information such as the change log, author, change date, source code, change delta, and change metadata. We used Kenyon to extract the histories of two open source projects, as shown in Table 1.

3.2. Accuracy Measures
A bug-introducing change set is all of the changes within a specific range of project revisions that have been identified as bug-introducing. Suppose we identify a bug-introducing change set, P, using a bug-introducing identification algorithm such as SZZ [16]. We then apply the algorithm described in this paper, and derive another bug-introducing change set, R, as shown in Figure 3. The common elements of the two sets are $P \cap R$.

Figure 3. Bug-introducing change sets identified using SZZ (P) and with the new algorithm (R)

Assuming R is the more accurate bug-introducing change set, we compute false positives and false negatives for the set P as follows:

False positive (FP) = $\frac{|P \setminus R|}{|P|}$
False negative (FN) = $\frac{|R \setminus P|}{|R|}$

4. Algorithms and Experiments
In this section, we explain our approach in detail and present our results from using the improved algorithm to identify bug-introducing changes.
Table 1. Analyzed projects. "# of revisions" is the number of revisions we analyzed. "# of fix revisions" is the number of revisions that were identified as bug-fix revisions. "Average LOC" is the average lines of code of the project in the given period.

Project            | Software type | Period            | # of revisions | # of fix revisions | % of fix revisions | Average LOC
Columba            | Email Client  | 11/2002 ~ 06/2003 | 500            | 143                | 29%                | 48,135
Eclipse (jdt.core) | IDE           | 06/2001 ~ 03/2002 | 1000           | 158                | 16%                | 111,059
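The accuracy measures of Section 3.2 amount to two set differences, each normalized by the size of one set. A minimal sketch follows, reading FP as the share of P that is absent from R and FN as the share of R that is absent from P. The class name and the choice to return 0 for an empty denominator are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch of the FP and FN measures for change sets P and R. */
final class AccuracyMeasures {
    /** FP = |P \ R| / |P|: fraction of P that the reference set R does not contain. */
    static <T> double falsePositiveRate(Set<T> p, Set<T> r) {
        if (p.isEmpty()) {
            return 0.0; // assumption: treat an empty P as having no false positives
        }
        Set<T> onlyInP = new HashSet<>(p);
        onlyInP.removeAll(r);
        return (double) onlyInP.size() / p.size();
    }

    /** FN = |R \ P| / |R|: fraction of R that P failed to identify. */
    static <T> double falseNegativeRate(Set<T> p, Set<T> r) {
        if (r.isEmpty()) {
            return 0.0; // assumption: treat an empty R as having no false negatives
        }
        Set<T> onlyInR = new HashSet<>(r);
        onlyInR.removeAll(p);
        return (double) onlyInR.size() / r.size();
    }
}
```

For example, with P = {a, b, c, d} and R = {b, c, d, e, f}, one of four elements of P is missing from R (FP = 0.25) and two of five elements of R are missing from P (FN = 0.4).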

4.1. Using Annotation Graph
The SZZ algorithm for the identification of bug-introducing changes for fine-grained entities such as functions or methods uses SCM annotation data. In this section, we show that this information is insufficient, and may introduce false positives and negatives.
Assume a bug-fix change occurs at revision 20, and involves the deletion of three lines (see Figure 4). Since they were deleted, the three lines are likely to contain a bug. In the SZZ approach, SCM annotate data is used to obtain the revisions in which these lines were initially added. The first two lines were added at revision 3, and the third line was added at revision 9. Thus, we identify the changes between revisions 2 and 3 and between revisions 8 and 9 as bug-introducing changes at the file level.

Figure 4. Finding bug-introducing changes at the function level.

A problem occurs when we try to locate bug-introducing changes for entities such as functions or methods. Suppose the deleted source code at revision 20 was part of the 'foo()' function (see Figure 4). Note that SCM annotation data for CVS or Subversion includes only revision and author information. This means we only know that the first two lines in Figure 4 were added at revision 3 by 'hunkim', but we do not know the actual line numbers of the deleted code at revision 3. In past research, it was assumed that the lines at revision 3 are part of the 'foo()' function, which is marked as a bug-introducing change, even though there is no guarantee that the function 'foo()' existed at revision 3.
Suppose at revision 3 that 'foo()' does not exist and the 'bar()' function does exist, as shown in Figure 4. One explanation for how this could occur is the ‘bar()’ function changes its name to ‘foo()’ at some later revision. One consequence is the above assumption is wrong and the 'foo()' function at revision 3 does not contain the bug-introducing change (false positive). We also miss a real bug-introducing change, ‘bar()’ at revision 3 (false negative). Since SCM annotations do not provide the line numbers for the annotated lines at revision 3, it is not possible to identify the function where the bug-introducing lines were inserted.
We address this problem by using annotation graphs [18], a representation for origin analysis [6, 10] at the line level, as shown in Figure 5. In an annotation graph, every line of a revision is represented as a node; edges connect lines (nodes) that evolved from each other: either by modification of the line itself or by moving the line in the file. In Figure 5 two regions were changed between revisions r1 and r2: lines 10 to 12 were inserted and lines 19 to 23 were modified. The annotation graph captures these changes as follows: line 1 in r2 corresponds to line 1 in r1 and was not changed (the edge is not marked in bold); the same holds for lines 2 to 9. Lines 10 to 12 were inserted in r2, thus they have no origin in r1. Line 13 in r2 was unchanged but has a different line number (10) in r1; this is indicated by the edge (same for 14 to 18 in r2). Lines 19 to 23 were modified in r2 and originated from lines 16 to 20 (edges are marked in bold). Note that we approximate origin conservatively, i.e., for modifications we need to connect all lines affected in r1 (lines 16 to 20) with every line affected in r2 (lines 19 to 23).

Figure 5. An annotation graph shows line changes of a file for three revisions [18]. A single node represents each line in a revision; edges between nodes indicate that one line originates from another, either by modification or by movement.

The annotation graph improves identification of bug-introducing code by providing for each line in the bug-fix change the line number in the bug-introducing revision. This is computed by performing a backward directed depth-first search. The resulting line number is then used to identify the correct function name in the bug-fix revision. For the above example, the annotation graph would annotate the deleted lines with the line numbers in revision 3, which are then used to identify function ‘bar’.
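The backward search over an annotation graph can be sketched as follows. This is a minimal illustration and not the implementation of [18]: the node encoding ("revision:line" strings), the class name, and the helper methods are assumptions. The graph built in `figure5Step` encodes only the r1 to r2 step described for Figure 5.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative line-level annotation graph with a backward depth-first search. */
final class AnnotationGraphSketch {
    // Maps a node "revision:line" to the nodes of the preceding revision it evolved from.
    private final Map<String, Set<String>> predecessors = new HashMap<>();

    private static String key(int revision, int line) {
        return revision + ":" + line;
    }

    /** Adds an edge: (revision, line) evolved from (prevRevision, prevLine). */
    void addEdge(int revision, int line, int prevRevision, int prevLine) {
        predecessors
            .computeIfAbsent(key(revision, line), k -> new HashSet<>())
            .add(key(prevRevision, prevLine));
    }

    /** Unchanged or moved block: line (from + i) maps to line (prevFrom + i). */
    void addUnchangedBlock(int revision, int from, int to, int prevRevision, int prevFrom) {
        for (int i = 0; from + i <= to; i++) {
            addEdge(revision, from + i, prevRevision, prevFrom + i);
        }
    }

    /** Modified block: conservatively connect every new line to every old line. */
    void addModifiedBlock(int revision, int from, int to,
                          int prevRevision, int prevFrom, int prevTo) {
        for (int line = from; line <= to; line++) {
            for (int prev = prevFrom; prev <= prevTo; prev++) {
                addEdge(revision, line, prevRevision, prev);
            }
        }
    }

    /** Backward DFS: lines of targetRevision from which (revision, line) originates. */
    Set<Integer> originLines(int revision, int line, int targetRevision) {
        Set<Integer> origins = new TreeSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(key(revision, line));
        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (!visited.add(node)) {
                continue; // already explored
            }
            String[] parts = node.split(":");
            if (Integer.parseInt(parts[0]) == targetRevision) {
                origins.add(Integer.parseInt(parts[1]));
                continue;
            }
            Set<String> prev =
                predecessors.getOrDefault(node, Collections.<String>emptySet());
            for (String p : prev) {
                stack.push(p);
            }
        }
        return origins;
    }

    /** The r1 -> r2 step described for Figure 5. */
    static AnnotationGraphSketch figure5Step() {
        AnnotationGraphSketch g = new AnnotationGraphSketch();
        g.addUnchangedBlock(2, 1, 9, 1, 1);       // lines 1-9 unchanged
        // lines 10-12 of r2 were inserted, so they get no edges
        g.addUnchangedBlock(2, 13, 18, 1, 10);    // lines 13-18 came from 10-15
        g.addModifiedBlock(2, 19, 23, 1, 16, 20); // lines 19-23 modified from 16-20
        return g;
    }
}
```

With this graph, line 13 of r2 resolves to line 10 of r1, an inserted line such as 11 resolves to nothing, and a modified line such as 21 resolves to all of lines 16 to 20. The resolved line numbers are what allow the enclosing function in the older revision to be looked up.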
To demonstrate the usefulness of annotation graphs for locating bug-introducing changes, we identify bug-introducing changes at the method level for our two projects with and without the use of annotation graphs. The left circle in Figure 6 (a) shows the count of bug-introducing changes at method level identified without using the annotation graph; the right circle shows the count when using the annotation graphs. Without the annotation graph we have about 2% false positives and 1~4% false negatives (total 3~6% errors) in identifying bug-introducing changes. Thus, annotation graphs provide information for more accurate bug-introducing change identification at the method level.

Figure 6. Bug-introducing change sets with and without annotation graph.

4.2. Non Behavior Changes
Software bugs involve incorrect behavior of the software [8], and hence are not located in the formatting of the source code, or in comments. Changes to source code format or comments, or the addition/removal of blank lines, do not affect software’s behavior. For example, Figure 7 shows a change in which one blank line was deleted and an ‘if condition’ was added to fix a bug. If we just apply SZZ, we identify the blank line as a problematic line and search for the origin of the blank line. We identify the revision and corresponding method of the blank line as a bug-introducing change, which is a false positive.
To remove such false positives, we ignore blank lines and comment changes in the bug-fix hunks.

  public void notifySourceElementRequestor()
  {
-
+   if (reportReferenceInfo) {
+     notifyAllUnknownReferences();
+   }
    // collect the top level ast nodes
    int length = 0;
Figure 7. Blank line deletion example in Eclipse (compiler/org/eclipse/jdt/internal/compiler/SourceElementParser.java)

Figure 8 shows the difference in identified bug-introducing change sets by ignoring comment and blank line changes. This approach removes 14%~20% of false positives.

Figure 8. Identified bug-introducing change sets by ignoring comment and blank line changes.

4.3. Format Changes
Similar to the comment and blank line changes, source code format changes do not affect software behavior. So if the source code’s format was changed during a bug-fix, as is shown in Figure 9, the source code format change should be ignored when we identify bug-introducing changes.

- if ( folder == null ) return;
+ if (folder == null)
+   return;
Figure 9. Format change example in Columba (mail/core/org/columba/mail/gui/table/FilterToolbar.java)

Unlike the comment and blank line changes, format changes affect the SCM annotation information. For example, consider the ‘foo’ function changes shown in Figure 10. Revision 10 is a bug-fix change, involving repair to a faulty ‘if’. To identify the corresponding bug-introducing changes, we need to find the origin of the ‘if’ at revision 10. Revision 5 only involves a formatting change to the code. If we do not ignore source code format changes, when we examine the SCM annotation information, we identify that ‘foo’ at revision 5 is a bug-introducing change (a false positive). In fact, the problematic line was originally created at revision 3 (this was missed, hence a false negative). Due to inaccurate annotation information, source code format changes lead to significant amounts of false positives and false negatives. Ignoring software format changes is an important process in the accurate identification of bug-introducing changes.

Revision 3
  if ( a == true ) return;
Revision 5
  if (a == true)
    return;
Revision 10 (bug-fix)
  if (a == false)
    return;
Figure 10. False positive and false negative example caused by format changes.

Figure 11 compares the results of the SZZ approach with the improved approach that identifies bug-introducing changes by ignoring format changes in bug-fix hunks. Overall, ignoring source code format changes
removes 18%~25% of false positives and 13%~14% of false negatives.

Figure 11. Bug-introducing change sets identified by ignoring source code format changes.

4.4. Remove Fix Revision Outliers
It is questionable if all the file changes in a bug-fix revision are bug-fixes, especially if a bug-fix revision contains large numbers of file changes. It seems very improbable that in a bug-fix change containing hundreds of file changes every one would have some bearing on the fixed bug. We observed the number of files changed in each bug-fix revision for our two projects, as shown in Figure 12. Most bug-fix revisions contain changes to just one or two files. All 50% of file change numbers per revision (between 25% and 75% quartiles) are about 1-3. A typical approach for removing outliers from data is if a data item is 1.5 times greater than the 50% quartile, it is assumed to be an outlier. In our experiment, we adopt a very conservative approach, and use as our definition of outlier file change counts that are greater than 5 times the 50% quartile. This ensures that any changes we note as outliers truly have a large number of file changes. Changes identified as outliers for our two projects are shown as ‘+’ in Figure 12.

Figure 12. Box plots for the number of file changes per revision.

To ensure we were not incorrectly labeling these changes as outliers, we manually inspected each file change in the outlier revisions. We observed that most of the changes are method name and parameter name changes. For example, one parameter type changed from ‘TypeDeclaration’ to ‘LocalTypeDeclaration’, and hence the revision contains 7 file changes related to this change, as shown in Figure 13.

- public boolean visit(TypeDeclaration
-   typeDeclaration, BlockScope scope){
+ public boolean visit(LocalTypeDeclaration
+   typeDeclaration, BlockScope scope){
Figure 13. Object type change example in Eclipse (search/org/eclipse/jdt/internal/core/search/matching/MatchSet.java)

As shown in Figure 14, ignoring outlier revisions removes 7%~16% of false positives. Even though most changes in the outlier revisions contain method name changes or parameter changes, it is possible that these changes are real bug-fixes. A determination of whether they are truly ignorable outliers will depend on the individual project. As a result, ignoring outlier revisions is an optional aspect of our approach for identifying bug-introducing changes.

Figure 14. Bug-introducing change sets identified by ignoring outlier revisions.

4.5. Manual Fix Hunk Verification
We identify bug-fix revisions by mining change logs, and bug-fix revision data is used to identify bug-introducing changes. If a change log indicates the revision is a bug-fix, we assume the revision is a bug-fix and all hunks in the revision are bug-fixes. Then how many of them are true bug-fixes? It depends on the quality of the change log and understanding the degree of the bug-fixes. One developer may think a change is a bug-fix, while others think it is only a source code cleanup or a new feature addition. To check how many bug-fix hunks are true bug-fixes, we manually verified all bug-fix hunks and marked them as bug-fix or non-bug-fix. Two human judges, graduate students who have multiple years of Java development experience, performed the manual verification. A judge marks each bug-fix hunk of two projects (see Table 1) and another judge reviews the marks. Judges use a GUI-based bug-fix hunk verification tool. The tool shows individual hunks in the bug-fix revision. Judges read the change logs and source code carefully and decide if the hunk is a bug-fix. The total time spent is shown in Table 2.
Table 2. Manual fix hunk validation time of two human judges.

Judges  | Columba   | Eclipse
Judge 1 | 3.5 hours | 4 hours
Judge 2 | 4.5 hours | 5 hours
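In contrast to the manual effort recorded in Table 2, the checks of Sections 4.2 and 4.3 (comments, blank lines, and format changes) are cheap to automate. One possible sketch compares the deleted and added text of a hunk after stripping comments and whitespace. The class name and the normalization rules are assumptions. The comment stripping is deliberately naive and would, for instance, mistake "//" inside a string literal for a comment.

```java
/** Illustrative sketch: detects hunks whose change cannot affect behavior. */
final class HunkFilterSketch {
    /** Removes block comments, line comments, and all whitespace (naive). */
    static String normalize(String code) {
        String withoutBlockComments = code.replaceAll("(?s)/\\*.*?\\*/", "");
        String withoutLineComments = withoutBlockComments.replaceAll("//[^\\n]*", "");
        return withoutLineComments.replaceAll("\\s+", "");
    }

    /** True if deleted and added text are identical once comments and whitespace are ignored. */
    static boolean isNonBehavioral(String deleted, String added) {
        return normalize(deleted).equals(normalize(added));
    }
}
```

Applied to the Columba format change of Figure 9, both sides normalize to the same string, so the hunk would be ignored. A change from `==` to `!=`, as in the fix of Figure 1, is still reported. Removing all whitespace is a simplification that a token-based comparison would avoid.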
The most common kind of non-bug-fix hunks in the bug-fix revision involves variable renaming, as shown in Figure 15. This kind of variable renaming does not affect software behavior, but it is not easy to automatically detect this kind of change without performing deep static or dynamic analysis.

  deleteResources(actualNonJavaResources,fForce);
- IResource[] remaingFiles;
+ IResource[] remainingFiles;
  try {
-   remaingFiles=((IFolder)res).members();
+   remainingFiles=((IFolder)res).members();
  }
Figure 15. Variable Renaming example in Eclipse (model/org/eclipse/jdt/internal/core/DeleteResourceElementsOperation)

We identify bug-introducing changes after the manual fix hunk validation, as shown in Figure 16. Manual verification removes 4~5% false positives. Unfortunately, the manual validation requires domain knowledge and does not scale. However, the amount of false positives removed by manual verification was not substantial. We believe it is possible to skip the manual validation for bug-introducing change identification. We compare the overall false positives and false negatives using the automatic algorithms with manual validation in the next section.

Figure 16. Bug-introducing change sets after manual fix hunk validation.

4.6. Summary
We applied the steps described in Figure 2 to remove false positive and false negative bug-introducing changes. In this section we compare the identified bug-introducing change sets gathered using the original SZZ algorithm [16] and those from our new algorithm (steps 1-5 in Figure 2). Overall, Figure 17 shows that applying our algorithms removes about 38%~51% of false positives and 14~15% of false negatives, a substantial error reduction.

Figure 17. Bug-introducing changes identified by the original SZZ algorithm [16] (P) and by the approach (steps 1-5) proposed in this paper (R).

The manual bug-fix hunk verification gives us a good sense of how many hunks in bug-fix revisions are true bug-fixes. There is no doubt that manual bug-fix hunk verification leads to more accurate bug-introducing changes. Unfortunately, manual fix hunk verification does not scale. The reason that we examined only the first 500~1000 revisions (Table 1) is the high cost of the manual verification. Figure 18 shows the false positives and false negatives removed by applying only automatic algorithms (steps 1-4 in Figure 2). Automatic algorithms remove about 36~48% of false positives and 14% of false negatives, yielding only 1~3% difference as compared to applying all algorithms (steps 1-5 in Figure 2). Since the errors removed by manual verification are not significant, manual fix hunk verification can be skipped when identifying bug-introducing changes.

Figure 18. Bug-introducing changes identified by the original SZZ algorithm [16] (P) and by the automatable steps (1-4) described in this paper (R).

5. Discussion
In this section, we discuss the relationship between identified bug-fixes and true bug-fixes. We also discuss the relationship between identified bug-introducing changes and true bugs.

5.1. Are All Identified Fixes True Fixes?
We used two approaches to identify bug-fixes: searching for keywords such as "Fixed" or "Bug" [12] and searching for references to bug reports like “#42233” [2, 4, 16]. The accuracy of bug-fix identification depends on the quality of change logs and linkages between SCM and bug tracking systems. The two open source projects we examined have, to the best of our knowledge, the highest quality change log and linkage information of any open source project. In addition, two human judges manually validated all bug-fix hunks. We believe the identified bug-fix hunks are, in almost all cases, real fixes. Still there might be false negatives. For example, even though
a change log does not indicate a given change is a fix, it is possible that the change includes a fix. To measure false negative fix changes, we need to manually inspect all hunks in all revisions, a daunting task. This remains future work.

5.2. Are Bug-Introducing Changes True Bugs?
Are all identified bug-introducing changes real bugs? It may depend on the definition of ‘bug’. IEEE defines anomaly, which is a synonym of fault, bug, or error, as: “any condition that departs from the expected [8].” Verifying whether all identified bug-introducing changes meet a given definition of bug remains future work.
More importantly, we propose algorithms to remove false positives and false negatives in the identified bugs. As shown in Figure 19, even though we do not know the exact set of real bugs, our algorithms can identify a set that is closer to the real bug set than the set identified by the original SZZ algorithm [16]. Even if not perfect, our approach is better than the current state of the art.

Figure 19. False positives and false negatives of each bug-introducing identification process.

5.3. Threats to Validity
There are four major threats to the validity of this work.
Systems examined might not be representative. We examined 2 systems, so it is possible that we accidentally chose systems that have better (or worse) than average false positive and negative bug-introducing changes. Since we intentionally only chose systems that had some degree of linkage between bug tracking systems and the change log (so we could determine bug-fixes), we have a project selection bias. It certainly would be nice to have a larger dataset.
Systems are all open source. The systems examined in this paper all use an open source development methodology, and hence might not be representative of all development contexts. It is possible that the stronger deadline pressure of commercial development could lead

Manual fix hunk verification may include errors. Even though we selected two human judges who have multiple years of Java programming experience, their manual fix hunk validation may contain errors.

6. Applications
In the first part of this paper, we presented an approach for identifying bug-introducing changes more accurately than SZZ. In this section, we discuss possible applications for these bug-introducing changes.

6.1. Bug-Introduction Statistics
Information about bug-introducing changes can be used to help understand software bugs. Unlike bug-fix information, bug-introducing changes provide the exact time a bug occurs. For example, it is possible to determine the day in which bugs are most introduced. We can also now determine the most bug-prone authors. When combined with bug-fix information, we can determine how long it took to fix a bug after it was introduced.
Sliwerski et al. performed an experiment to find out the most bug-prone day by computing bug-introducing change rates over all changes [16]. They found that Friday is the most bug-prone day in the projects examined.

Figure 20. Eclipse author bug-fix and bug-introducing change contributions.

Figure 21. Columba author bug-fix and bug-introducing
change contributions.
to different results.
Bug-fix data is incomplete. Even though we selected
projects that have change logs with good quality, we still In the two projects we examined, we computed the bug-
are only able to extract a subset of the total number of introducing change rates and bug-fix change rates per
bug-fixes. For projects with a poor change log quality, the author, shown in Figure 20 and Figure 21. The figures
false negatives of bug-introducing change identification show that rates of bug-introduction and bug-fixing are
will be higher. different. For example, in Eclipse, author a1 makes about
40% of all fixes, but introduces about 75% of all bugs. In contrast,
author a2 fixes far more bugs than they introduce. These numbers do not
allow conclusions on the performance of individual developers: in many
projects the most skillful developers are assigned to the most difficult
parts; thus they are likely to introduce more bugs.
Using the bug-introducing change information, we can determine the exact
bug residency time, the elapsed time between initial injection of a bug
and its eventual fix. The bug residency time provides a good
understanding of the entire life cycle of a bug, starting with the
injection of the bug in a bug-introducing change, appearance of the bug
in a bug report, and the end of the bug in a bug-fix change. Previous
research tries to measure the time it takes to fix a bug after a bug
report has been entered, but without the bug-introducing changes, it is
not possible to determine the entire life cycle of a bug. Figure 22
shows the average bug residency time using box-plots for Columba and
Eclipse. For example, the box-plot for Columba shows that the average
bug residency time is around 40 days, the 25% quartile is around 10 days
and 75% quartile is around 100 days.

Figure 22. Average bug residency time of Columba and Eclipse.

6.2. Bug Prone Change Patterns
Since we can determine bug-introducing changes, it is possible to
analyze the source code for any patterns that might exist in bug prone
code. Signature changes [11] and micro pattern changes [5] are examples
of source code change patterns. Suppose we identify bug-introducing
changes and function signature changes as shown in Figure 23. We can
then try to find correlations between signature and bug-introducing
changes [11].
We analyzed micro pattern changes in Java source code using
bug-introducing changes to determine what kinds of micro pattern changes
introduce more/less bugs [9]. Micro patterns capture non-trivial idioms
of Java programming languages [5]. This work did identify some bug prone
micro patterns such as Box, CompoundBox, Sampler, Pool, Outline, and
CommonState [9].
The bug prone change pattern analysis depends on having access to
bug-introducing changes, since otherwise we do not know when a bug was
introduced.

Figure 23. Bug-introducing changes and signature changes.

6.3. Change Classification
In the previous section, we provided one example of finding bug-prone
source code change patterns. If a source code change pattern is
consistent with bug-introducing changes, then we can use such factors to
predict unknown changes as buggy or clean. Suppose we observe various
change factors between 1 to n revisions as shown in Figure 24. We know
which changes are bug-introducing changes and which changes are not.
This permits us to train a model using labeled change factors, where the
changes are labeled as being bug-introducing or clean. Using the trained
model, we can predict whether future unknown changes are buggy or clean.

Figure 24. Predicting future changes using identified bug-introducing
changes.

There are many machine learning algorithms [17] that take pre-labeled
instances, train a model, and predict unknown instances using the model.
Finding consistent bug-prone factors might be challenging, but it is
possible to label changes and make a training data set using
bug-introducing changes. Such change classification is not possible
without the bug-introducing change data. Hence, one key benefit of ready
access to bug-introducing changes is the ability to apply machine
learning techniques to bug prediction.

6.4. Awareness Tool: HATARI
Every programmer knows that there are locations in the code where it is
difficult to get things right. The HATARI tool [15] identifies the
individual risk for all code locations by examining, for each location,
whether earlier changes caused problems. To identify such changes HATARI
mines bug-introducing changes automatically from version archives and
bug databases. The risk of a location L is then estimated as the
percentage of “bad” changes at that location:

$$\mathrm{risk}(L) = \frac{\text{number of bug-introducing changes at } L}{\text{number of changes at } L}$$
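
The HATARI risk estimate for a location L, the share of all changes at L
that were bug-introducing, can be illustrated with a short sketch. This
is not HATARI's implementation: the function name location_risk, the
(location, is_bug_introducing) input format, and the sample location
names are all hypothetical, and the granularity of a "location" is left
open. The sketch returns a fraction between 0 and 1 rather than a
percentage.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def location_risk(
    changes: Iterable[Tuple[Hashable, bool]],
) -> Dict[Hashable, float]:
    """Return risk(L) for every location L that appears in `changes`.

    Each item is a (location, is_bug_introducing) pair. Both fields are
    hypothetical inputs, assumed to come from an earlier analysis that
    has already labeled each change as bug-introducing or not.
    """
    total = Counter()  # number of changes at each location
    bad = Counter()    # number of bug-introducing changes at each location
    for location, is_bug_introducing in changes:
        total[location] += 1
        if is_bug_introducing:
            bad[location] += 1
    # Every key in `total` has at least one change, so no division by zero.
    return {loc: bad[loc] / total[loc] for loc in total}


# Hypothetical change history: four changes at one location, one at another.
history = [
    ("Parser.java:parse", True),
    ("Parser.java:parse", False),
    ("Parser.java:parse", True),
    ("Parser.java:parse", False),
    ("Lexer.java:next", False),
]

risk = location_risk(history)
# List the riskiest locations first.
for loc, value in sorted(risk.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{loc}: {value:.2f}")
```

With this made-up history, two of the four changes at Parser.java:parse
are bug-introducing, so its risk is 0.5, while Lexer.java:next has risk
0. A location with very few changes gets a coarse estimate, so a real
tool would probably need to account for the number of changes as well.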

Figure 25. Source code highlights of HATARI.

Risky locations are important for maintenance, such as adding extra
documentation or restructuring, and for quality assurance, because
changes that occur at risky locations should get more attention. In
order to support developers during these tasks, HATARI highlights such
locations (see Figure 25) and provides views to browse the most risky
locations and to analyze the risk history of particular locations.
HATARI depends strongly on the quality of bug-introducing changes. By
reducing false positives and negatives, its annotations will be
improved.

7. Conclusions
Bug-introducing changes are important information for understanding
properties of bugs, mining bug prone change patterns, and predicting
future bugs. In this paper we describe a new approach for more
accurately identifying bug-introducing changes from bug-fix data. The
approach in this paper removes many false positives and false negatives
as compared to the prior SZZ algorithm. Our experiments show that our
approach, including manual validation, can remove 38~51% of false
positives and 14% of false negatives as compared to SZZ. Omitting the
manual validation and using only automatable processes, we can still
remove 36%~48% of false positives and 14% of false negatives. Using our
approach, we can identify bug-introducing changes more accurately than
the prior SZZ algorithm, which is the current state of the art. We also
showed various applications of the bug-introducing changes. We believe
that software bug related research should use bug-introducing change
information.

8. References
[1] J. Bevan, E. J. Whitehead, Jr., S. Kim, and M. Godfrey,
    "Facilitating Software Evolution with Kenyon," Proc. of the 2005
    European Software Engineering Conference and 2005 Foundations of
    Software Engineering (ESEC/FSE 2005), Lisbon, Portugal,
    pp. 177-186, 2005.
[2] D. Cubranic and G. C. Murphy, "Hipikat: Recommending pertinent
    software development artifacts," Proc. of 25th International
    Conference on Software Engineering (ICSE 2003), Portland, Oregon,
    pp. 408-418, 2003.
[3] M. D'Ambros and M. Lanza, "Software Bugs and Evolution: A Visual
    Approach to Uncover Their Relationships," Proc. of 10th European
    Conference on Software Maintenance and Reengineering (CSMR 2006),
    Bari, Italy, pp. 227-236, 2006.
[4] M. Fischer, M. Pinzger, and H. Gall, "Populating a Release History
    Database from Version Control and Bug Tracking Systems," Proc. of
    19th International Conference on Software Maintenance (ICSM 2003),
    pp. 23-32, 2003.
[5] J. Y. Gil and I. Maman, "Micro Patterns in Java Code," Proc. of the
    20th Object Oriented Programming Systems Languages and Applications
    (OOPSLA '05), San Diego, CA, USA, pp. 97-116, 2005.
[6] M. W. Godfrey and L. Zou, "Using Origin Analysis to Detect Merging
    and Splitting of Source Code Entities," IEEE Trans. on Software
    Engineering, vol. 31, pp. 166-181, 2005.
[7] A. E. Hassan and R. C. Holt, "The Top Ten List: Dynamic Fault
    Prediction," Proc. of 21st International Conference on Software
    Maintenance (ICSM 2005), Budapest, Hungary, pp. 263-272, 2005.
[8] IEEE, "IEEE Standard Classification for Software Anomalies," IEEE
    Std 1044-1993, Dec 1993.
[9] S. Kim, K. Pan, and E. J. Whitehead, Jr., "Micro Pattern Evolution,"
    Proc. of Int'l Workshop on Mining Software Repositories (MSR 2006),
    Shanghai, China, pp. 40-46, 2006.
[10] S. Kim, K. Pan, and E. J. Whitehead, Jr., "When Functions Change
    Their Names: Automatic Detection of Origin Relationships," Proc. of
    12th Working Conference on Reverse Engineering (WCRE 2005),
    Pennsylvania, USA, pp. 143-152, 2005.
[11] S. Kim, E. J. Whitehead, Jr., and J. Bevan, "Properties of
    Signature Change Patterns," Proc. of 22nd International Conference
    on Software Maintenance (ICSM 2006), Philadelphia, Pennsylvania,
    2006.
[12] A. Mockus and L. G. Votta, "Identifying Reasons for Software
    Changes Using Historic Databases," Proc. of 16th International
    Conference on Software Maintenance (ICSM 2000), San Jose,
    California, USA, pp. 120-130, 2000.
[13] N. Nagappan and T. Ball, "Use of Relative Code Churn Measures to
    Predict System Defect Density," Proc. of 27th International
    Conference on Software Engineering (ICSE 2005), Saint Louis,
    Missouri, USA, pp. 284-292, 2005.
[14] T. J. Ostrand, E. J. Weyuker, and R. M. Bell, "Where the Bugs Are,"
    Proc. of 2004 ACM SIGSOFT International Symposium on Software
    Testing and Analysis, Boston, Massachusetts, USA, pp. 86-96, 2004.
[15] J. Śliwerski, T. Zimmermann, and A. Zeller, "HATARI: Raising Risk
    Awareness. Research Demonstration," Proc. of the 2005 European
    Software Engineering Conference and 2005 Foundations of Software
    Engineering (ESEC/FSE 2005), Lisbon, Portugal, pp. 107-110, 2005.
[16] J. Śliwerski, T. Zimmermann, and A. Zeller, "When Do Changes Induce
    Fixes?" Proc. of Int'l Workshop on Mining Software Repositories
    (MSR 2005), Saint Louis, Missouri, USA, pp. 24-28, 2005.
[17] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning
    Tools and Techniques (Second Edition): Morgan Kaufmann, 2005.
[18] T. Zimmermann, S. Kim, A. Zeller, and E. J. Whitehead, Jr., "Mining
    Version Archives for Co-changed Lines," Proc. of Int'l Workshop on
    Mining Software Repositories (MSR 2006), Shanghai, China,
    pp. 72-75, 2006.
