DOI: 10.1145/3597503.3639135
Research article

Streamlining Java Programming: Uncovering Well-Formed Idioms with IdioMine

Published: 12 April 2024

Abstract

Code idioms are commonly used patterns, techniques, or practices that help solve particular problems or specific tasks across multiple software projects. They can improve code quality, performance, and maintainability, and they promote standardization and reuse across projects. However, identifying code idioms remains challenging, and existing approaches suffer from three main limitations. First, they struggle to recognize idioms that span non-contiguous lines of code. Second, they have difficulty identifying idioms with intricate data flow and code structure. Third, they extract only dataset-specific idioms, so common idioms and well-established code or design patterns that rarely appear in the datasets cannot be identified.
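To make the first limitation concrete, the following sketch shows a familiar Java idiom whose lines are not contiguous. It is purely illustrative and is not drawn from IdioMine's extracted idioms; the class name `GuardedCounter` and its method are hypothetical. The idiom consists of the `lock()` call, the `try`/`finally` frame, and the `unlock()` call, while the statements between them are project-specific.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example class, used only to illustrate a non-contiguous idiom.
final class GuardedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int counter = 0;

    int incrementAndGet() {
        lock.lock();          // idiom: acquire the lock
        try {                 // idiom: guard the critical section
            counter++;        // project-specific work
            return counter;   // project-specific work
        } finally {
            lock.unlock();    // idiom: always release, even on exceptions
        }
    }
}
```

A miner that only considers runs of adjacent lines would see the idiom lines interleaved with unrelated statements, which is one way to picture why non-contiguous idioms are hard to recover.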
To overcome these limitations, we propose IdioMine, a novel approach that automatically extracts generic and specific idioms from both Java projects and libraries. We perform program analysis on Java functions to transform them into concise program dependence graphs (PDGs), which integrate the data flow and control flow of code fragments. We then develop a novel chain structure, the Data-driven Control Chain (DCC), to extract sub-idioms with contiguous semantic meaning from the PDGs. Next, we use GraphCodeBERT to generate code embeddings of these sub-idioms and apply density-based clustering to obtain frequent sub-idioms. We use heuristic rules to identify interrelated sub-idioms among the frequent ones. Finally, we employ ChatGPT to synthesize interrelated sub-idioms into potential code idioms and infer real idioms from them.
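The clustering step can be pictured with a minimal sketch. The abstract does not specify the clustering algorithm or its hyperparameters, so everything here is an assumption: DBSCAN stands in as one well-known density-based algorithm, plain Euclidean distance stands in for whatever metric is used on the embeddings, and the class name `SubIdiomClustering`, the parameters `eps` and `minPts`, and the toy vectors are hypothetical. Real GraphCodeBERT embeddings would be high-dimensional vectors produced by the model, not hand-written points.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: DBSCAN over embedding vectors (not IdioMine's code).
final class SubIdiomClustering {
    static final int NOISE = -1;      // label for points in no dense region
    static final int UNVISITED = -2;  // internal marker before assignment

    // Euclidean distance between two vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Indices of all points within eps of point p (including p itself).
    static List<Integer> neighbors(double[][] points, int p, double eps) {
        List<Integer> result = new ArrayList<>();
        for (int q = 0; q < points.length; q++) {
            if (distance(points[p], points[q]) <= eps) {
                result.add(q);
            }
        }
        return result;
    }

    // Returns one label per point: a cluster id starting at 0, or NOISE.
    static int[] dbscan(double[][] points, double eps, int minPts) {
        int[] labels = new int[points.length];
        Arrays.fill(labels, UNVISITED);
        int cluster = 0;
        for (int p = 0; p < points.length; p++) {
            if (labels[p] != UNVISITED) {
                continue;
            }
            List<Integer> seeds = neighbors(points, p, eps);
            if (seeds.size() < minPts) {
                labels[p] = NOISE;  // may later become a border point
                continue;
            }
            labels[p] = cluster;
            Deque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int q = queue.poll();
                if (labels[q] == NOISE) {
                    labels[q] = cluster;  // border point joins the cluster
                }
                if (labels[q] != UNVISITED) {
                    continue;
                }
                labels[q] = cluster;
                List<Integer> qNeighbors = neighbors(points, q, eps);
                if (qNeighbors.size() >= minPts) {
                    queue.addAll(qNeighbors);  // q is a core point: expand
                }
            }
            cluster++;
        }
        return labels;
    }
}
```

In this picture, each dense cluster would correspond to a group of similar sub-idioms that occurs frequently, while points labeled as noise would correspond to one-off fragments that are discarded.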
We conduct experiments and a user study to evaluate IdioMine's correctness and the practical value of the extracted idioms. Our experimental results show that IdioMine extracts more idioms and achieves better performance on most metrics. Compared with Haggis and ChatGPT, IdioMine improves Idiom Set Precision (ISP) by 22.8% and 35.5%, respectively, and Idiom Coverage (IC) by 9.7% and 22.9%, respectively, when extracting idioms from libraries. The idioms IdioMine extracts are also almost twice the size of those extracted by the baselines, which reflects its ability to identify complete idioms. Our user study indicates that idioms extracted by IdioMine are well-formed and semantically clear. Moreover, we conduct a qualitative and quantitative analysis to investigate the primary functionalities of the idioms that IdioMine extracts from various projects and libraries.

    Published In

    ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering
    May 2024
    2942 pages
    ISBN:9798400702174
    DOI:10.1145/3597503

    In-Cooperation

    • Faculty of Engineering of University of Porto

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. code idiom mining
    2. code pattern
    3. large language model (LLM)
    4. clustering

    Qualifiers

    • Research-article

    Conference

    ICSE '24

    Acceptance Rates

    Overall Acceptance Rate 276 of 1,856 submissions, 15%
