Research article, SBES '20 Conference Proceedings, DOI: 10.1145/3422392.3422427

Applying Machine Learning to Customized Smell Detection: A Multi-Project Study

Published: 21 December 2020

Abstract

Code smells are considered symptoms of poor implementation choices, which may hamper software maintainability. Hence, code smells should be detected as early as possible to avoid software quality degradation. Unfortunately, detecting code smells is not a trivial task. Some preliminary studies concluded that machine learning (ML) techniques are a promising way to better support smell detection. However, these techniques are hard to customize so as to promote early and accurate detection of specific smell types. Moreover, ML techniques usually require numerous code examples for training (i.e., a relevant dataset) to achieve satisfactory accuracy. Such a dependency on a large validated dataset is impractical and leads to late detection of code smells. Thus, a prevailing challenge is the early, customized detection of code smells given the typically limited training data. In this direction, this paper reports a study in which we collected code smells from ten active projects that developers actually refactored, unlike studies that rely on code smells inferred by researchers. We used these smells to evaluate the accuracy of seven ML techniques in the early detection of code smells. Because we consider smells that developers deemed important, the ML techniques are able to customize the detection to focus on the smells observed as relevant in the investigated systems. The results show that all the analyzed techniques are sensitive to the smell type and achieve good results for most smell types, especially JRip and Random Forest. We also observe that the ML techniques did not need many examples to reach their best accuracy. This finding implies that ML techniques can be successfully used for early detection of smells without depending on the curation of a large dataset.
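The claim that the techniques reach their best accuracy without many training examples is the kind of result a learning curve measures: train on the first n labelled examples for increasing n and score each model on a fixed held-out set. The Python sketch below illustrates only that evaluation idea; it is not the authors' pipeline. It uses a OneR-style single-feature rule learner purely for brevity (the abstract names JRip and Random Forest among the seven techniques), and the two metrics (LOC and WMC), the bin cut points, and the synthetic "smelly when LOC > 300" labels are all hypothetical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical metric bins: cut points for LOC (feature 0) and WMC (feature 1).
CUTS = [[100, 200, 300, 400, 500], [10, 20, 30, 40]]


def discretize(value, cut_points):
    """Map a numeric metric value to a bin index using sorted cut points."""
    for i, cut in enumerate(cut_points):
        if value <= cut:
            return i
    return len(cut_points)


def train_one_r(examples, cuts):
    """OneR-style learner: build a bin -> majority-label rule per feature and
    keep the feature whose rule makes the fewest errors on the training data."""
    best = None
    for f in range(len(examples[0][0])):
        counts = defaultdict(Counter)
        for features, label in examples:
            counts[discretize(features[f], cuts[f])][label] += 1
        rule = {b: c.most_common(1)[0][0] for b, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[b]] for b, c in counts.items())
        if best is None or errors < best[0]:
            best = (errors, f, rule)
    # Bins never seen in training fall back to the overall majority label.
    default = Counter(label for _, label in examples).most_common(1)[0][0]
    return best[1], best[2], default


def accuracy(model, examples, cuts):
    """Fraction of examples whose predicted label matches the true label."""
    feature, rule, default = model
    hits = sum(
        rule.get(discretize(x[feature], cuts[feature]), default) == y
        for x, y in examples
    )
    return hits / len(examples)


def learning_curve(pool, test_set, sizes, cuts):
    """Accuracy on a fixed test set after training on the first n pool examples."""
    return {n: accuracy(train_one_r(pool[:n], cuts), test_set, cuts) for n in sizes}


def synthetic_dataset(n, seed):
    """Hypothetical data: (LOC, WMC) pairs, labelled smelly when LOC > 300."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        loc, wmc = rng.randint(0, 600), rng.randint(0, 50)
        data.append(((loc, wmc), loc > 300))
    return data


if __name__ == "__main__":
    pool = synthetic_dataset(200, seed=1)
    test_set = synthetic_dataset(100, seed=2)
    for n, acc in learning_curve(pool, test_set, [5, 10, 20, 50, 200], CUTS).items():
        print(f"{n:>3} training examples -> accuracy {acc:.2f}")
```

Taking prefixes of one fixed pool, rather than drawing a fresh sample for each size, keeps every smaller training set nested inside the larger ones, which makes the curve easier to read; that choice is also an assumption of this sketch.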



    Published In

    SBES '20: Proceedings of the XXXIV Brazilian Symposium on Software Engineering
    October 2020
    901 pages
    ISBN:9781450387538
    DOI:10.1145/3422392
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    In-Cooperation

    • SBC: Brazilian Computer Society

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. code smell
    2. code smell detection
    3. software quality

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
    • Conselho Nacional de Desenvolvimento Científico e Tecnológico
    • Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro

    Conference

    SBES '20

    Acceptance Rates

    Overall Acceptance Rate 147 of 427 submissions, 34%


    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 13
    • Downloads (last 6 weeks): 2
    Download counts reflect data up to 04 Oct 2024.


    Cited By

    • (2024) Code smell detection based on supervised learning models. Neurocomputing 565:C. DOI: 10.1016/j.neucom.2023.127014. Online publication date: 27-Feb-2024.
    • (2024) A survey on machine learning techniques applied to source code. Journal of Systems and Software 209:C. DOI: 10.1016/j.jss.2023.111934. Online publication date: 14-Mar-2024.
    • (2024) Improving accuracy of code smells detection using machine learning with data balancing techniques. The Journal of Supercomputing 80:14, 21048-21093. DOI: 10.1007/s11227-024-06265-9. Online publication date: 5-Jun-2024.
    • (2023) Exploring the Intersection between Software Maintenance and Machine Learning—A Systematic Mapping Study. Applied Sciences 13:3, 1710. DOI: 10.3390/app13031710. Online publication date: 29-Jan-2023.
    • (2022) Developers' perception matters: machine learning to detect developer-sensitive smells. Empirical Software Engineering 27:7. DOI: 10.1007/s10664-022-10234-2. Online publication date: 1-Dec-2022.
