
Automatic Mining of Source Code Repositories to Improve Bug Finding Techniques

Published: 01 June 2005

    Abstract

    We describe a method that uses the source code change history of a software project to drive and refine the search for bugs. Using data retrieved from the source code repository, we implement a static source code checker that searches for a commonly fixed bug and uses information automatically mined from the repository to refine its results. Applying our tool, we identified 178 warnings that are likely bugs in the Apache Web server source code and 546 warnings that are likely bugs in Wine, an open-source implementation of the Windows API. We show that our technique is more effective than the same static analysis without historical data from the source code repository.
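
The abstract does not say which bug pattern is checked or how the mined history refines the results, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each checker warning names the function involved, and that mining the repository yields a list of function names, one entry per past bug-fix commit. `CheckerWarning`, `rank_warnings`, and the sample data are hypothetical names introduced for illustration, not the authors' tool or ranking scheme.

```python
from collections import Counter
from typing import NamedTuple


class CheckerWarning(NamedTuple):
    """One warning emitted by a (hypothetical) static checker."""
    file: str
    line: int
    callee: str  # function the warning is about (assumed field)


def rank_warnings(warnings, fix_history):
    """Order warnings so that functions fixed most often in the past come first.

    fix_history is a list of function names, one entry per past bug-fix
    commit involving that function (assumed output of repository mining).
    """
    fix_counts = Counter(fix_history)
    # sorted() is stable, so warnings with equal counts keep their input order.
    return sorted(warnings, key=lambda w: fix_counts[w.callee], reverse=True)


# Made-up example data, for illustration only.
warnings = [
    CheckerWarning("a.c", 10, "strlen"),
    CheckerWarning("b.c", 20, "malloc"),
    CheckerWarning("c.c", 30, "fopen"),
]
history = ["malloc", "malloc", "fopen"]
ranked = rank_warnings(warnings, history)
```

Under these assumptions, a warning about a function that was frequently fixed in the past is reviewed before one about a function with no fix history, which is one way historical data could refine the output of an otherwise unchanged static analysis.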

    Published In

    IEEE Transactions on Software Engineering, Volume 31, Issue 6
    June 2005
    102 pages

    Publisher

    IEEE Press

    Author Tags

    1. Testing tools
    2. Configuration control
    3. Debugging aids
    4. Version control

    Qualifiers

    • Research-article
