DOI: 10.1145/2568225.2568295
research-article

Mining billions of AST nodes to study actual and potential usage of Java language features

Published: 31 May 2014

    Abstract

    Programming languages evolve over time, adding additional language features to simplify common tasks and make the language easier to use. For example, the Java Language Specification has four editions and is currently drafting a fifth. While the addition of language features is driven by an assumed need by the community (often with direct requests for such features), there is little empirical evidence demonstrating how these new features are adopted by developers once released. In this paper, we analyze over 31k open-source Java projects representing over 9 million Java files, which when parsed contain over 18 billion AST nodes. We analyze this corpus to find uses of new Java language features over time. Our study gives interesting insights, such as: there are millions of places features could potentially be used but weren't; developers convert existing code to use new features; and we found thousands of instances of potential resource handling bugs.
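
As a hypothetical illustration of what a "potential use" of a newer feature might look like, the sketch below contrasts manual resource cleanup with the Java 7 try-with-resources statement. The abstract does not list which features the study covers, so the choice of feature, the class name ResourceHandlingSketch, and both method names are assumptions made for this example, not code from the paper.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical example class; not taken from the paper or its corpus.
class ResourceHandlingSketch {

    // Older style: the reader is closed by hand in a finally block.
    // Omitting the finally block would leave the reader open on an
    // exception, which is one way a resource handling bug can arise.
    public static String firstLineManualClose(String text) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        try {
            return reader.readLine();
        } finally {
            reader.close();
        }
    }

    // Java 7 style: try-with-resources closes the reader automatically,
    // whether readLine() returns normally or throws.
    public static String firstLineTryWithResources(String text) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "first\nsecond";
        System.out.println(firstLineManualClose(text));
        System.out.println(firstLineTryWithResources(text));
    }
}
```

Both methods return the same value for the same input; only the cleanup idiom differs. Under the assumption made here, the first method would count as a place where the newer feature could have been used but was not.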

      Published In

      ICSE 2014: Proceedings of the 36th International Conference on Software Engineering
      May 2014
      1139 pages
      ISBN:9781450327565
      DOI:10.1145/2568225
      In-Cooperation

      • TCSE: IEEE Computer Society's Technical Council on Software Engineering

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Java
      2. empirical study
      3. language feature use
      4. software mining

      Qualifiers

      • Research-article

      Conference

      ICSE '14
      Acceptance Rates

      Overall Acceptance Rate 276 of 1,856 submissions, 15%

      Cited By

      • (2024) On the Anatomy of Real-World R Code for Static Analysis. Proceedings of the 21st International Conference on Mining Software Repositories, pages 619-630. DOI: 10.1145/3643991.3644911. Online publication date: 15-Apr-2024
      • (2024) Unprecedented Code Change Automation: The Fusion of LLMs and Transformation by Example. Proceedings of the ACM on Software Engineering, 1:FSE, pages 631-653. DOI: 10.1145/3643755. Online publication date: 12-Jul-2024
      • (2024) Data-Driven Evidence-Based Syntactic Sugar Design. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pages 1-12. DOI: 10.1145/3597503.3639580. Online publication date: 20-May-2024
      • (2024) Generic Sensitivity: Generics-Guided Context Sensitivity for Pointer Analysis. IEEE Transactions on Software Engineering, 50:5, pages 1144-1162. DOI: 10.1109/TSE.2024.3377645. Online publication date: May-2024
      • (2024) Deriving modernity signatures of codebases with static analysis. Journal of Systems and Software, 211:C. DOI: 10.1016/j.jss.2024.111973. Online publication date: 2-Jul-2024
      • (2023) Bash in the Wild: Language Usage, Code Smells, and Bugs. ACM Transactions on Software Engineering and Methodology, 32:1, pages 1-22. DOI: 10.1145/3517193. Online publication date: 13-Feb-2023
      • (2023) Learning the Relation Between Code Features and Code Transforms With Structured Prediction. IEEE Transactions on Software Engineering, 49:7, pages 3872-3900. DOI: 10.1109/TSE.2023.3275380. Online publication date: Jul-2023
      • (2023) Method Chaining Redux: An Empirical Study of Method Chaining in Java, Kotlin, and Python. 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), pages 546-557. DOI: 10.1109/MSR59073.2023.00080. Online publication date: May-2023
      • (2023) TypeScript’s Evolution: An Analysis of Feature Adoption Over Time. 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), pages 109-114. DOI: 10.1109/MSR59073.2023.00027. Online publication date: May-2023
      • (2022) Generic sensitivity: customizing context-sensitive pointer analysis for generics. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 1110-1121. DOI: 10.1145/3540250.3549122. Online publication date: 7-Nov-2022
