Inter project defect classification based on word embedding

  • Original Article
International Journal of System Assurance Engineering and Management

Abstract

Defect classification is the process of assigning defects to predefined categories. Done manually, it is a time-consuming process. Many automatic defect classification methods have been proposed to speed it up; however, these methods have not exploited the interrelations among defect reports. The approaches proposed in the defect classification literature are based on Term Frequency-Inverse Document Frequency (TF-IDF) and bag of words. In this paper, we propose a word-embedding-based model for defect classification that performs better than the existing methods in our experiments. We also propose models for inter-project defect classification that combine different datasets from the same domain. We evaluated the proposed approach on 4096 defect reports using K-nearest neighbor, random forest, decision tree, support vector machine, stochastic gradient descent and AdaBoost classifiers, measuring accuracy, precision, recall and F1-score. Experimental results show that the decision tree achieves the highest accuracy, 98.21%, when trained and tested on GloVe word embeddings. We also generated new word embeddings from the bug report corpus. Further, we compare the proposed model with Lopes et al. (2020), and the results show that our model outperforms it.
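The abstract does not state how a defect report is turned into a feature vector for the classifiers. A common word-embedding approach, assumed here rather than taken from the paper, is to average the vectors of the words in each report. The minimal Python sketch that follows illustrates that idea in an inter-project setting: a k-nearest-neighbour classifier (one of the six classifiers listed) is fitted on reports from one hypothetical project, evaluated on reports from another, and scored with accuracy and macro-averaged precision, recall and F1. The names `TOY_EMBEDDINGS`, `embed_report`, `knn_predict` and `macro_prf`, the three-dimensional vectors, the category labels and the report texts are all hypothetical stand-ins; real GloVe vectors have 50 to 300 dimensions and would be loaded from a pretrained file.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional word vectors standing in for pretrained GloVe
# embeddings (real GloVe vectors have 50-300 dimensions).
TOY_EMBEDDINGS = {
    "crash": [0.9, 0.1, 0.0],
    "null": [0.8, 0.2, 0.1],
    "pointer": [0.7, 0.1, 0.2],
    "button": [0.1, 0.9, 0.1],
    "label": [0.0, 0.8, 0.2],
    "layout": [0.1, 0.7, 0.3],
    "slow": [0.1, 0.1, 0.9],
    "timeout": [0.2, 0.0, 0.8],
}
DIM = 3


def embed_report(text, embeddings=TOY_EMBEDDINGS, dim=DIM):
    """Average the vectors of in-vocabulary tokens (an assumed pooling scheme).

    Returns a zero vector when no token of the report has an embedding.
    """
    vectors = [embeddings[tok] for tok in text.lower().split() if tok in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_predict(train_vecs, train_labels, query, k=3):
    """Majority vote among the k training reports most similar to `query`.

    Ties go to the label seen first, i.e. the label of the closer neighbour.
    """
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(train_vecs[i], query), reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def macro_prf(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 over all labels."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    n = len(labels)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical inter-project split: train on "project A", test on "project B".
project_a = [
    ("null pointer crash", "crash"),
    ("crash on null", "crash"),
    ("button label misaligned", "ui"),
    ("layout of button broken", "ui"),
    ("slow startup timeout", "performance"),
    ("request timeout", "performance"),
]
project_b = [
    ("pointer crash in parser", "crash"),
    ("label layout wrong", "ui"),
    ("query is slow", "performance"),
]

train_vecs = [embed_report(text) for text, _ in project_a]
train_labels = [label for _, label in project_a]
predictions = [knn_predict(train_vecs, train_labels, embed_report(text))
               for text, _ in project_b]
scores = macro_prf([label for _, label in project_b], predictions)
print(predictions, scores)
```

On this toy data all three project-B reports are classified correctly, so every score is 1.0; the numbers say nothing about the 98.21% accuracy reported in the paper. Replacing `knn_predict` with a decision tree or SVM from a library such as scikit-learn would leave the feature-extraction step unchanged.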

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

References

  • Aizawa A (2000) The feature quantity: an information theoretic perspective of tfidf-like measures. In: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval (pp 104–111)

  • Altman NS (1992) An introduction to kernel and nearest-neighbor nonparametric regression. Am Stat 46(3):175–185

  • Al-Yousef A, Eloqayli H, Obiedat M, Almoustafa A (2021) Predicting treatment outcome of spinal musculoskeletal pain using artificial neural networks: a pilot study. Int J Med Eng Inform 13(3):237–253

  • Amar D, Abboud S (2016) P-wave morphology in focal atrial tachycardia using a 3D numerical model of the heart. Int J Med Eng Inform 8(3):263–274

  • Bansal B, Srivastava S (2019) Hybrid attribute based sentiment classification of online reviews for consumer intelligence. Appl Intell 49(1):137–149

  • Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140

  • Breiman L (2001) Random forests. Mach Learn 45:5–32

  • Bridge N, Miller C (1998) Orthogonal defect classification using defect data to improve software development. Softw Qual 3(1):1–8

  • Card DN (1998) Learning from our mistakes with defect causal analysis. IEEE Softw 15(1):56–63

  • Chen YS, Chiang SW, Wu ML (2021) A few-shot transfer learning approach using text-label embedding with legal attributes for law article prediction. Appl Intell 52:1–19

  • Chillarege R (1996) Orthogonal defect classification. In: Handbook of software reliability engineering (pp 359–399)

  • Chillarege R, Bhandari IS, Chaar JK, Halliday MJ, Moebus DS, Ray BK, Wong MY (1992) Orthogonal defect classification-a concept for in-process measurements. IEEE Trans Softw Eng 18(11):943–956

  • Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297

  • Duan KB, Keerthi SS (2005) Which is the best multiclass SVM method? An empirical study. In: International workshop on multiple classifier systems. Springer, Berlin, Heidelberg (pp 278–285)

  • Endres A (1975) An analysis of errors and their causes in system programs. IEEE Trans Softw Eng 2:140–149

  • Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1):119–139

  • Grady RB (1992) Practical software metrics for project management and process improvement. Prentice-Hall, Inc., Hoboken

  • Gupta V, Mittal M (2019a) A comparison of ECG signal pre-processing using FrFT, FrWT and IPCA for improved analysis. IRBM 40(3):145–156

  • Gupta V, Mittal M (2019b) QRS complex detection using STFT, chaos analysis, and PCA in standard and real-time ECG databases. J Inst Eng (India) Series B 100(5):489–497

  • Gupta V, Mittal M (2019c) R-Peak detection in ECG signal using Yule–Walker and principal component analysis. IETE J Res, 1–14

  • Gupta V, Mittal M (2020) A novel method of cardiac arrhythmia detection in electrocardiogram signal. Int J Med Eng Inform 12(5):489–499

  • Gupta V, Mittal M (2021) R-peak detection for improved analysis in health informatics. Int J Med Eng Inform 13(3):213–223

  • Gupta V, Mittal M, Mittal V (2019) R-peak detection using chaos analysis in standard and real time ECG databases. IRBM 40(6):341–354

  • Gupta V, Mittal M, Mittal V (2020a) Chaos theory: an emerging tool for arrhythmia detection. Sens Imaging 21(1):1–22

  • Gupta V, Mittal M, Mittal V (2020b) Performance evaluation of various pre-processing techniques for R-peak detection in ECG signal. IETE J Res, 1–16

  • Gupta V, Mittal M, Mittal V (2020c) R-peak detection based chaos analysis of ECG signal. Analog Integr Circ Sig Process 102(3):479–490

  • Gupta V, Mittal M, Mittal V (2021a) Chaos theory and ARTFA: emerging tools for interpreting ECG signals to diagnose cardiac arrhythmias. Wireless Pers Commun 118(4):3615–3646

  • Gupta V, Mittal M, Mittal V (2021b) FrWT-PPCA-based R-peak detection for improved management of healthcare system. IETE J Res, 1–15

  • Gupta V, Mittal M, Mittal V, Gupta A (2021c) ECG signal analysis using CWT, spectrogram and autoregressive technique. Iran J Comput Sci, 1–16

  • Gupta V, Mittal M, Mittal V, Gupta A (2022) An efficient AR modelling-based electrocardiogram signal analysis for health informatics. Int J Med Eng Inform 14(1):74–89

  • Gupta V, Mittal M, Mittal V, Saxena NK (2021d) A critical review of feature extraction techniques for ECG signal analysis. J Inst Eng (India) Series B 102:1–12

  • Gupta V, Mittal M, Mittal V, Saxena NK (2021e) BP signal analysis using emerging techniques and its validation using ECG signal. Sens Imaging 22(1):1–19

  • Hernández-González J, Rodriguez D, Inza I, Harrison R, Lozano JA (2018) Learning to classify software defects from crowds: a novel approach. Appl Soft Comput 62:579–591

  • Huang L, Ng V, Persing I, Chen M, Li Z, Geng R, Tian J (2015) AutoODC: automated generation of orthogonal defect classifications. Autom Softw Eng 22(1):3–46

  • Jiang C, Xue X (2021) A uniform compact genetic algorithm for matching bibliographic ontologies. Appl Intell 51:1–16

  • Kanchinadam T, Meng Z, Bockhorst J, Singh V, Fung G (2021) Graph neural networks to predict customer satisfaction following interactions with a corporate call center. arXiv preprint arXiv:2102.00420

  • Khalil M, Ayad H, Adib A (2021) MR-brain image classification system based on SWT-LBP and ensemble of SVMs. Int J Med Eng Inform 13(2):129–142

  • Kim S, Whitehead Jr EJ (2006) How long did it take to fix bugs? In: Proceedings of the 2006 international workshop on Mining software repositories (pp 173–174)

  • Kumar L, Kumar M, Murthy LB, Misra S, Kocher V, Padmanabhuni S (2021) An empirical study on application of word embedding techniques for prediction of software defect severity level. In: 2021 16th conference on computer science and intelligence systems (FedCSIS). IEEE (pp 477–484)

  • Singh VB, Misra S, Sharma M (2017) Bug severity assessment in cross project context and identifying training candidates. J Inf Knowl Manag 16(01):1750005

  • Li M, Chen L, Zhao J, Li Q (2021a) Sentiment analysis of Chinese stock reviews based on BERT model. Appl Intell 51(7):5016–5024

  • Li X, Li D, Deng Y, Xing J (2021b) Intelligent mining algorithm for complex medical data based on deep learning. J Ambient Intell Humaniz Comput 12(2):1667–1678

  • Liu C, Zhao Y, Yang Y, Lu H, Zhou Y, Xu B (2015) An AST-based approach to classifying defects. In: 2015 IEEE international conference on software quality, reliability and security-companion. IEEE (pp 14–21)

  • Lopes F, Agnelo J, Teixeira CA, Laranjeiro N, Bernardino J (2020) Automating orthogonal defect classification using machine learning algorithms. Futur Gener Comput Syst 102:932–947

  • López-Sánchez D, Herrero JR, Arrieta AG, Corchado JM (2018) Hybridizing metric learning and case-based reasoning for adaptable clickbait detection. Appl Intell 48(9):2967–2982

  • Lu X, Deng Y, Sun T, Gao Y, Feng J, Sun X, Sutcliffe R (2021) MKPM: multi keyword-pair matching for natural language sentences. Appl Intell, 1–15

  • Lyu D, Chen L, Xu Z, Yu S (2020) Weighted multi-information constrained matrix factorization for personalized travel location recommendation based on geo-tagged photos. Appl Intell 50(3):924–938

  • Ma Y, Zhang H (2021) Deep mining of communication information association based on discrete Fourier transform. J Ambient Intell Human Comput, 1–12

  • Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781

  • Naser A, Tantawi M, Shedeed HA, Tolba MF (2020) Automated EEG-based epilepsy detection using BA_SVM classifiers. Int J Med Eng Inform 12(6):620–625

  • Ng AY (2004) Feature selection, L1 vs. L2 regularization, and rotational invariance. In: Proceedings of the twenty-first international conference on Machine learning (p 78)

  • Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. EMNLP 14:1532–1543

  • Pitsilis GK, Ramampiaro H, Langseth H (2018) Effective hate-speech detection in Twitter data using recurrent neural networks. Appl Intell 48(12):4730–4742

  • Rahimi Z, Homayounpour MM (2021) TensSent: a tensor based sentimental word embedding method. Appl Intell 51:1–16

  • Ramachandran SK, Manikandan P (2021) An efficient ALO-based ensemble classification algorithm for medical big data processing. Int J Med Eng Inform 13(1):54–63

  • Sahoo S, Das P, Biswal P, Sabut S (2018) Classification of heart rhythm disorders using instructive features and artificial neural networks. Int J Med Eng Inform 10(4):359–381

  • Seki K, Ikuta Y, Matsubayashi Y (2022) News-based business sentiment and its properties as an economic index. Inf Process Manage 59(2):102795

  • Sheela J, Janet B (2021) An abstractive summary generation system for customer reviews and news article using deep learning. J Ambient Intell Humaniz Comput 12(7):7363–7373

  • Silas S, Rajsingh EB (2019) A novel patient friendly IT enabled framework for selection of desired healthcare provider. Int J Med Eng Inform 11(1):14–40

  • Tarapiah S, Daadoo M, Atalla S (2017) Android-based real-time healthcare system. Int J Med Eng Inform 9(3):253–268

  • Thung F, Le XBD, Lo D (2015) Active semi-supervised defect categorization. In: 2015 IEEE 23rd international conference on program comprehension. IEEE (pp 60–70)

  • Thung F, Lo D, Jiang L (2012) Automatic defect categorization. In: 2012 19th working conference on reverse engineering. IEEE (pp 205–214)

  • Vo AD, Nguyen QP, Ock CY (2020) Semantic and syntactic analysis in learning representation based on a sentiment analysis model. Appl Intell 50(3):663–680

  • Wagner S (2008) Defect classification and defect types revisited. In: Proceedings of the 2008 workshop on Defects in large software systems (pp 39–40)

  • Xie J, Li Y, Sun Q, Lin Y (2019) Enhancing sentence embedding with dynamic interaction. Appl Intell 49(9):3283–3292

  • Yue C, Cao H, Xu G, Dong Y (2021) Collaborative attention neural network for multi-domain sentiment classification. Appl Intell 51(6):3174–3188

  • Zhang T (2004) Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the twenty-first international conference on Machine learning (p 116)

  • Zhang X, Yao Y, Wang Y, Xu F, Lu J (2017) Exploring metadata in bug reports for bug localization. In: 24th Asia-Pacific software engineering conference (APSEC 2017), Nanjing, China. IEEE Computer Society (pp 328–337). https://doi.org/10.1109/APSEC.2017.39

Funding

This study was not funded by any agency.

Author information

Corresponding author

Correspondence to V. B. Singh.

Ethics declarations

Conflict of interest

The authors report no conflicts of interest.

Ethical approval

All the procedures performed in the study involving human participants are in accordance with ethical standards.

Informed consent

Informed consent was obtained from all authors included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kumar, S., Sharma, M., Muttoo, S.K. et al. Inter project defect classification based on word embedding. Int J Syst Assur Eng Manag 15, 621–634 (2024). https://doi.org/10.1007/s13198-022-01686-2

  • DOI: https://doi.org/10.1007/s13198-022-01686-2

Keywords