DOI: 10.1145/2970276.2970356

How good are the specs? A study of the bug-finding effectiveness of existing Java API specifications

Published: 25 August 2016
Abstract

    Runtime verification can be used to find bugs early, during software development, by monitoring test executions against formal specifications (specs). The quality of runtime verification depends on the quality of the specs. While previous research has produced many specs for the Java API, manually or through automatic mining, there has been no large-scale study of their bug-finding effectiveness.
    We present the first in-depth study of the bug-finding effectiveness of previously proposed specs. We used JavaMOP to monitor 182 manually written and 17 automatically mined specs against more than 18K manually written and 2.1M automatically generated tests in 200 open-source projects. The average runtime overhead was under 4.3x. We inspected 652 violations of manually written specs and (randomly sampled) 200 violations of automatically mined specs. We reported 95 bugs, out of which developers already fixed 74. However, most violations, 82.81% of 652 and 97.89% of 200, were false alarms.
    Our empirical results show that (1) runtime verification technology has matured enough to incur tolerable runtime overhead during testing, and (2) the existing API specifications can find many bugs that developers are willing to fix; however, (3) the false alarm rates are worrisome and suggest that substantial effort needs to be spent on engineering better specs and properly evaluating their effectiveness.
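
    To make the kind of property being monitored concrete, the following is a minimal Java illustration (ours, not taken from the paper) of the usage rule behind one of the manually written specs in the FSL database that the study monitors, Collections_SynchronizedCollection: the JDK Javadoc requires clients to synchronize manually on a collection returned by Collections.synchronizedList while iterating over it, so a runtime monitor for this property would report the unsynchronized loop below as a violation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SynchronizedIterationExample {
    public static void main(String[] args) {
        // The wrapper synchronizes individual calls, but the Javadoc
        // requires clients to hold the wrapper's lock while iterating.
        List<String> list = Collections.synchronizedList(new ArrayList<>());
        list.add("alpha");
        list.add("beta");

        // API misuse: iterating without holding the wrapper's lock.
        // A monitor for this spec would report a violation here.
        for (String s : list) {
            System.out.println(s);
        }

        // Documented correct usage: synchronize on the wrapper for the
        // whole traversal.
        synchronized (list) {
            for (String s : list) {
                System.out.println(s);
            }
        }
    }
}
```

    Whether such a report is a true bug or a false alarm depends on context, for example whether the collection can ever be accessed from multiple threads; making that call for each report is exactly the manual inspection effort the study quantifies.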




      Published In

      ASE '16: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering
      August 2016, 899 pages
      ISBN: 9781450338455
      DOI: 10.1145/2970276
      General Chair: David Lo
      Program Chairs: Sven Apel, Sarfraz Khurshid


      Publisher

      Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. empirical study
      2. runtime verification
      3. specification quality

      Qualifiers

      • Research-article


      Conference

      ASE '16

      Acceptance Rates

      Overall acceptance rate: 82 of 337 submissions (24%)


