Mining billions of AST nodes to study actual and potential usage of Java language features

Authors:

Hoan Anh Nguyen, and

Tien N. Nguyen

ICSE 2014: Proceedings of the 36th International Conference on Software Engineering

May 2014

Pages 779 - 790

https://doi.org/10.1145/2568225.2568295

Published: 31 May 2014

Abstract

Programming languages evolve over time, adding language features to simplify common tasks and make the language easier to use. For example, the Java Language Specification has four editions, and a fifth is currently being drafted. While the addition of language features is driven by an assumed need in the community (often with direct requests for such features), there is little empirical evidence demonstrating how developers adopt these new features once they are released. In this paper, we analyze over 31k open-source Java projects representing over 9 million Java files, which when parsed contain over 18 billion AST nodes. We analyze this corpus to find uses of new Java language features over time. Our study yields several insights: there are millions of places where features could have been used but were not; developers convert existing code to use new features; and we found thousands of instances of potential resource handling bugs.
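
The abstract does not name the features studied or show what a "potential" use looks like. As a minimal sketch, assuming try-with-resources (added in Java SE 7) as a representative feature, the hypothetical class below contrasts manual cleanup in a finally block with the newer form. The class and method names are illustrative and are not taken from the paper.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration; not code from the paper or its corpus.
class ResourceHandlingSketch {

    // Older style: the reader is closed by hand in a finally block.
    // Omitting the finally block would leave the reader open on an
    // exception, which is the kind of mistake a resource handling
    // bug can come from.
    static String firstLineManualClose(String text) {
        BufferedReader reader = null;
        try {
            reader = new BufferedReader(new StringReader(text));
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException ignored) {
                    // Nothing useful to do if close itself fails here.
                }
            }
        }
    }

    // Java SE 7 style: try-with-resources closes the reader
    // automatically, whether or not readLine throws.
    static String firstLineTryWithResources(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "first\nsecond";
        System.out.println(firstLineManualClose(text));
        System.out.println(firstLineTryWithResources(text));
    }
}
```

Both methods return the same result; the first is an example of code where the newer feature could be used but is not.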

Index Terms

Software and its engineering > Software notations and tools > General programming languages > Language features

Published In

ICSE 2014: Proceedings of the 36th International Conference on Software Engineering

May 2014, 1139 pages

ISBN: 9781450327565

DOI: 10.1145/2568225

General Chair: Pankaj Jalote (IIIT-Delhi, India)

Program Chairs: Lionel Briand (University of Luxembourg, Luxembourg) and André van der Hoek (University of California, Irvine, USA)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

In-Cooperation

TCSE: IEEE Computer Society's Tech. Council on Software Engin.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICSE '14

Sponsor:

SIGSOFT

ICSE '14: 36th International Conference on Software Engineering

May 31 - June 7, 2014

Hyderabad, India

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%

Bibliometrics

Total Citations: 76
Total Downloads: 774
Downloads (Last 12 months): 60
Downloads (Last 6 weeks): 7

Cited By

Sihler F, Pietzschmann L, Straub R, Tichy M, Diera A, Dahou A, Spinellis D, Constantinou E, Bacchelli A (2024). On the Anatomy of Real-World R Code for Static Analysis. Proceedings of the 21st International Conference on Mining Software Repositories, pages 619-630. Online publication date: 15-Apr-2024.
https://dl.acm.org/doi/10.1145/3643991.3644911
Dilhara M, Bellur A, Bryksin T, Dig D (2024). Unprecedented Code Change Automation: The Fusion of LLMs and Transformation by Example. Proceedings of the ACM on Software Engineering, 1:FSE, pages 631-653. Online publication date: 12-Jul-2024.
https://dl.acm.org/doi/10.1145/3643755
OBrien D, Dyer R, Nguyen T, Rajan H, Roychoudhury A, Paiva A, Abreu R, Storey M (2024). Data-Driven Evidence-Based Syntactic Sugar Design. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pages 1-12. Online publication date: 20-May-2024.
https://dl.acm.org/doi/10.1145/3597503.3639580
Li H, Tan T, Li Y, Lu J, Meng H, Cao L, Huang Y, Li L, Gao L, Di P, Lin L, Cui C (2024). Generic Sensitivity: Generics-Guided Context Sensitivity for Pointer Analysis. IEEE Transactions on Software Engineering, 50:5, pages 1144-1162. Online publication date: May-2024.
https://doi.org/10.1109/TSE.2024.3377645
Admiraal C, van den Brink W, Gerhold M, Zaytsev V, Zubcu C (2024). Deriving modernity signatures of codebases with static analysis. Journal of Systems and Software, 211:C. Online publication date: 2-Jul-2024.
https://dl.acm.org/doi/10.1016/j.jss.2024.111973
Dong Y, Li Z, Tian Y, Sun C, Godfrey M, Nagappan M (2023). Bash in the Wild: Language Usage, Code Smells, and Bugs. ACM Transactions on Software Engineering and Methodology, 32:1, pages 1-22. Online publication date: 13-Feb-2023.
https://dl.acm.org/doi/10.1145/3517193
Yu Z, Martinez M, Chen Z, Bissyandé T, Monperrus M (2023). Learning the Relation Between Code Features and Code Transforms With Structured Prediction. IEEE Transactions on Software Engineering, 49:7, pages 3872-3900. Online publication date: Jul-2023.
https://doi.org/10.1109/TSE.2023.3275380
Keshk A, Dyer R (2023). Method Chaining Redux: An Empirical Study of Method Chaining in Java, Kotlin, and Python. 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), pages 546-557. Online publication date: May-2023.
https://doi.org/10.1109/MSR59073.2023.00080
Scarsbrook J, Utting M, Ko R (2023). TypeScript’s Evolution: An Analysis of Feature Adoption Over Time. 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), pages 109-114. Online publication date: May-2023.
https://doi.org/10.1109/MSR59073.2023.00027
Li H, Lu J, Meng H, Cao L, Huang Y, Li L, Gao L, Roychoudhury A, Cadar C, Kim M (2022). Generic sensitivity: customizing context-sensitive pointer analysis for generics. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 1110-1121. Online publication date: 7-Nov-2022.
https://dl.acm.org/doi/10.1145/3540250.3549122
