DOI: 10.1145/3453483.3454045

Learning to find naming issues with big code and small supervision

Published: 18 June 2021


We introduce a new approach for finding and fixing naming issues in source code. The method is based on a careful combination of unsupervised and supervised procedures: (i) unsupervised mining of patterns from Big Code that express common naming idioms, where program fragments violating such idioms indicate likely naming issues, and (ii) supervised learning of a classifier on a small labeled dataset, which filters potential false positives from the violations.
We implemented our method in a system called Namer and evaluated it on a large number of Python and Java programs. We demonstrate that Namer is effective in finding naming mistakes in real-world repositories with high precision (~70%). Perhaps surprisingly, we also show that existing deep learning methods are not practically effective and achieve low precision in finding naming issues (up to ~16%).
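
The two-stage idea in the abstract can be illustrated with a toy sketch. This is not Namer's implementation: the `(context, name)` representation, the helper names `mine_patterns`, `find_violations`, and `fit_threshold`, the thresholds, and the sample data are all assumptions made for illustration, and a one-feature threshold stands in for the paper's classifier.

```python
from collections import Counter, defaultdict


def mine_patterns(corpus, min_support=3, min_confidence=0.8):
    """Unsupervised step (toy): for each context, keep the dominant name
    if the context is frequent enough and the name dominates it."""
    by_context = defaultdict(Counter)
    for context, name in corpus:
        by_context[context][name] += 1
    patterns = {}
    for context, names in by_context.items():
        total = sum(names.values())
        name, count = names.most_common(1)[0]
        if total >= min_support and count / total >= min_confidence:
            patterns[context] = (name, count / total)
    return patterns


def find_violations(fragments, patterns):
    """Report fragments whose name differs from the mined idiom."""
    violations = []
    for context, name in fragments:
        if context in patterns and patterns[context][0] != name:
            expected, confidence = patterns[context]
            violations.append((context, name, expected, confidence))
    return violations


def fit_threshold(labeled):
    """Supervised step (toy): choose the confidence threshold that best
    separates true issues from false positives on a small labeled set."""
    best_t, best_acc = None, -1.0
    for t in sorted({conf for conf, _ in labeled}):
        acc = sum((conf >= t) == is_issue for conf, is_issue in labeled) / len(labeled)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


# Hypothetical data: nine fragments follow one idiom, one deviates.
corpus = [("two_arg_assert", "assertEqual")] * 9 + [("two_arg_assert", "assertTrue")]
patterns = mine_patterns(corpus)
violations = find_violations(
    [("two_arg_assert", "assertTrue"), ("two_arg_assert", "assertEqual")], patterns
)

# Hypothetical labels: (pattern confidence, was it a real naming issue?)
labeled = [(0.95, True), (0.90, True), (0.85, False), (0.82, False)]
threshold = fit_threshold(labeled)
reported = [v for v in violations if v[3] >= threshold]
```

In this sketch the mining step needs no labels, while the filter is fit on only four labeled examples, which mirrors the "big code, small supervision" split described above.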



Published In

PLDI 2021: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation
June 2021
1341 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].



Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

  1. Anomaly detection
  2. Bug detection
  3. Machine learning
  4. Name-based program analysis
  5. Static analysis


  • Research-article


PLDI '21

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%


Article Metrics

  • Downloads (Last 12 months): 30
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 10 Feb 2025

Cited By

  • (2025) Detecting and Explaining Python Name Errors. Information and Software Technology 178 (107592). https://doi.org/10.1016/j.infsof.2024.107592. Online publication date: Feb-2025
  • (2024) DAInfer: Inferring API Aliasing Specifications from Library Documentation via Neurosymbolic Optimization. Proceedings of the ACM on Software Engineering 1:FSE (2469-2492). https://doi.org/10.1145/3660816. Online publication date: 12-Jul-2024
  • (2023) Pre-implementation Method Name Prediction for Object-oriented Programming. ACM Transactions on Software Engineering and Methodology 32:6 (1-35). https://doi.org/10.1145/3597203. Online publication date: 29-Sep-2023
  • (2023) CombTransformers: Statement-Wise Transformers for Statement-Wise Representations. IEEE Transactions on Software Engineering 49:10 (4677-4690). https://doi.org/10.1109/TSE.2023.3310793. Online publication date: 6-Sep-2023
  • (2022) Path-sensitive code embedding via contrastive learning for software vulnerability detection. Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (519-531). https://doi.org/10.1145/3533767.3534371. Online publication date: 18-Jul-2022
  • (2022) Nalin. Proceedings of the 44th International Conference on Software Engineering (1469-1481). https://doi.org/10.1145/3510003.3510144. Online publication date: 21-May-2022
