research-article

Mining software repositories for adaptive change commits using machine learning techniques

Published: 01 May 2019

Abstract

Context

Version control systems, such as Subversion, are standard repositories that preserve every maintenance change made to source code artifacts during the evolution of a software system. The version history is organized as commits; however, commits carry no tag identifying the purpose of the change. As a result, there is rarely enough detail to direct developers to the changes associated with a specific type of maintenance.

Objective

This work examines the version histories of open source systems to automatically classify commits into one of two categories: adaptive and non-adaptive.

Method

We collected the commits from the version histories of three open source systems and extracted eight code change metrics covering, for example, the number of changed statements, methods, hunks, and files. Based on these metrics, we built a machine learning classifier that decides whether a commit is adaptive.
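
As a rough illustration of this kind of method, the sketch below labels a commit from four change metrics using a k-nearest-neighbour vote. The abstract does not list all eight metrics or name the learning algorithm, so the feature tuple (statements, methods, hunks, files), the toy training data, the function name `classify_commit`, and the choice of k-NN are all assumptions made for this example rather than the authors' setup.

```python
from collections import Counter
from math import dist

# Hypothetical feature order:
# (changed statements, changed methods, changed hunks, changed files).
# The values and labels are invented toy data, not taken from the paper.
TRAINING = [
    ((120, 14, 30, 12), "adaptive"),
    ((95, 10, 22, 9), "adaptive"),
    ((150, 18, 41, 15), "adaptive"),
    ((4, 1, 1, 1), "non-adaptive"),
    ((9, 2, 3, 1), "non-adaptive"),
    ((15, 2, 4, 2), "non-adaptive"),
]


def classify_commit(metrics, training=TRAINING, k=3):
    """Label a commit by majority vote among its k nearest labeled commits."""
    # Rank labeled commits by Euclidean distance to the query commit's metrics.
    nearest = sorted(training, key=lambda item: dist(item[0], metrics))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(classify_commit((110, 12, 25, 10)))
    print(classify_commit((6, 1, 2, 1)))
```

With raw counts, the metric with the largest range dominates the distance, so a real pipeline would likely scale the features before training.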

Results

Code change metrics can be indicative of adaptive maintenance activities. The resulting machine learning classifier achieves approximately 75% prediction accuracy on labeled change histories.

Conclusion

The proposed method automates the examination of a software system's version history and identifies which commits relate to an adaptive maintenance task. The evaluation supports its applicability and efficiency. Although the evaluation on unlabeled change histories shows that the classifier is not much better than random guessing in terms of F-measure, we believe it can serve as a basis for developing more advanced classifiers that predict adaptive commits without manual effort.
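
To make the two evaluation measures mentioned above concrete, the following sketch computes accuracy and the F-measure for the adaptive class from lists of true and predicted labels. The label lists are invented for illustration and the function names are hypothetical; the F-measure itself is the standard harmonic mean of precision and recall.

```python
def accuracy(actual, predicted):
    """Fraction of commits whose predicted label matches the true label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def f_measure(actual, predicted, positive="adaptive"):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented toy labels: 2 adaptive commits among 8.
    actual = ["adaptive", "adaptive"] + ["non-adaptive"] * 6
    predicted = ["adaptive", "non-adaptive", "adaptive"] + ["non-adaptive"] * 5
    print(accuracy(actual, predicted))
    print(f_measure(actual, predicted))
```

On this toy data the accuracy is 0.75 while the F-measure is 0.5, which shows in general terms how the two measures can diverge when one class is rare; it says nothing about the class balance in the paper's datasets.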

Published In

Information and Software Technology, Volume 109, Issue C
May 2019
126 pages

Publisher

Butterworth-Heinemann
United States

Author Tags

1. Code change metrics
2. Adaptive maintenance
3. Commit types
4. Maintenance classification
5. Machine learning
