An Accurate Identifier Renaming Prediction and Suggestion Approach

Published: 29 September 2023

Abstract

Identifiers play an important role in helping developers analyze and comprehend source code. However, many identifiers are inconsistent with the applicable code conventions or with the semantics of the code they name, which makes them flawed. Hence, identifiers need to be renamed regularly. Researchers have proposed several approaches that detect identifiers in need of renaming and suggest correct names for them, but these approaches have two limitations: they focus on a single granularity or a limited number of granularities of identifiers rather than considering all granularities, and they suggest a series of sub-tokens for composing identifiers rather than generating complete new identifiers. In this article, we propose a novel identifier renaming prediction and suggestion approach. Specifically, given a set of training source code, we first extract all the identifiers at multiple granularities. Then, we design and extract five groups of features that capture the inherent properties of the identifiers themselves and the relationships between identifiers and code conventions, other related code entities, enclosing files, and change history. By parsing the change history of identifiers, we determine whether specific identifiers have been renamed. These identifier features and their renaming history are used to train a Random Forest classifier, which can then predict whether a given new identifier needs to be renamed. Subsequently, for the identifiers that need renaming, we extract all the related code entities and their renaming change history. Based on the intuition that identifiers co-evolve with their related code entities, following similar patterns and renaming sequences, we suggest a series of new identifiers for those identifiers. We conduct extensive experiments to validate our approach on both Java projects and Android projects.
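As a rough illustration of the feature-extraction step described above, the following sketch splits an identifier into sub-tokens and derives a few numeric features of the kind a classifier such as a Random Forest could consume. The abstract does not enumerate the five feature groups, so the function names (`split_identifier`, `identifier_features`) and the specific features (length, sub-token count, a lowerCamelCase check, and a ratio of renamed related entities) are assumptions chosen for illustration, not the authors' feature set.

```python
import re

# Hypothetical tokenizer: matches acronyms, capitalized or lowercase words, and digit runs.
_SUBTOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")

# Assumed convention check: Java-style lowerCamelCase for variables and methods.
_LOWER_CAMEL = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")


def split_identifier(name):
    """Return the lower-cased sub-tokens of an identifier."""
    return [tok.lower() for tok in _SUBTOKEN.findall(name)]


def identifier_features(name, related_renamed):
    """Map an identifier to a small dict of numeric features.

    related_renamed is an assumed input: one boolean per related code
    entity, True if that entity was renamed in the change history.
    """
    tokens = split_identifier(name)
    if related_renamed:
        renamed_ratio = sum(related_renamed) / len(related_renamed)
    else:
        renamed_ratio = 0.0
    return {
        "length": len(name),
        "num_subtokens": len(tokens),
        "follows_lower_camel": int(bool(_LOWER_CAMEL.match(name))),
        "related_renamed_ratio": renamed_ratio,
    }
```

Feature dicts like these, paired with a renamed/not-renamed label recovered from the change history, would form the training rows for the classifier; the labeling and training steps are omitted here because the abstract gives no further detail on them.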
Experimental results demonstrate that our approach identifies identifiers that need renaming with an average F-measure of more than 89%, outperforming the state-of-the-art approach by 8.30% on the Java projects and 21.38% on the Android projects. In addition, when suggesting correct identifiers, our approach achieves a Hit@10 of 48.58% on the Java projects and 40.97% on the Android projects, outperforming the state-of-the-art approach by 29.62% and 15.75%, respectively.
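The two metrics reported above can be sketched as follows. This is a minimal sketch that assumes the common definitions: F-measure as the harmonic mean of precision and recall for the "needs renaming" class, and Hit@k as the fraction of identifiers whose actual new name appears among the top k suggestions. The abstract does not spell out either definition, and the function names `f_measure` and `hit_at_k` are illustrative.

```python
def f_measure(predicted, actual):
    """Harmonic mean of precision and recall for the positive class.

    predicted and actual are equal-length sequences of booleans, where
    True means "needs renaming".
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def hit_at_k(suggestions, truths, k=10):
    """Fraction of cases whose true new name is in the top-k suggestions.

    suggestions holds one ranked list of candidate names per identifier;
    truths holds the name each identifier was actually renamed to.
    """
    if not truths:
        return 0.0
    hits = sum(1 for ranked, truth in zip(suggestions, truths)
               if truth in ranked[:k])
    return hits / len(truths)
```

For example, with two identifiers whose ranked suggestions are `["a", "b"]` and `["c", "d"]` and whose true new names are `"b"` and `"x"`, `hit_at_k(..., k=2)` counts one hit out of two.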

Information

Published In

ACM Transactions on Software Engineering and Methodology, Volume 32, Issue 6
November 2023
949 pages
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/3625557
Editor: Mauro Pezzè

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 September 2023
Online AM: 29 May 2023
Accepted: 27 April 2023
Revised: 12 February 2023
Received: 01 April 2022
Published in TOSEM Volume 32, Issue 6

Author Tags

  1. Identifier renaming
  2. source code analysis
  3. code refactoring
  4. mining code repository

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • Fund of Prospective Layout of Scientific Research for NUAA

Article Metrics

  • Downloads (last 12 months): 328
  • Downloads (last 6 weeks): 16
Reflects downloads up to 21 Sep 2024

Cited By

  • (2024) StructuredFuzzer: Fuzzing Structured Text-Based Control Logic Applications. Electronics 13:13 (2475). DOI: 10.3390/electronics13132475. Online publication date: 25-Jun-2024
  • (2024) Advanced White-Box Heuristics for Search-Based Fuzzing of REST APIs. ACM Transactions on Software Engineering and Methodology 33:6 (1-36). DOI: 10.1145/3652157. Online publication date: 27-Jun-2024
  • (2024) RAPID: Zero-Shot Domain Adaptation for Code Search with Pre-Trained Models. ACM Transactions on Software Engineering and Methodology 33:5 (1-35). DOI: 10.1145/3641542. Online publication date: 3-Jun-2024
  • (2024) Semantic GUI Scene Learning and Video Alignment for Detecting Duplicate Video-based Bug Reports. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-13). DOI: 10.1145/3597503.3639163. Online publication date: 20-May-2024
