An investigation on the effect of cross project data for prediction accuracy

  • Original Article
International Journal of System Assurance Engineering and Management

Abstract

Quality assurance activities include software testing, verification, validation, fault-proneness assessment and fault prediction. Most existing models predict faults by analysing historical data from the same project (within-project prediction). This paper focuses on defect prediction with cross-project and mixed-project data, because analogy-based approaches select software attributes unpredictably and therefore deliver imprecise and ambiguous solutions. Cross-project prediction delivers significant results when attention is paid to the selection of training data. Combining cross-project data with regression techniques improves the effectiveness of prediction by extracting similar features shared across all datasets. Feature extraction with a similarity-based approach creates a training model that gives more accurate predictions. We conducted two experiments, with method-level and class-level datasets from 23 open source projects under different scenarios; they show improvements in accuracy, probability of prediction and probability of false alarm when cross-project and mixed-project combinations are involved. The main outcome of our experiments is that cross-project prediction performs better when the training dataset is selected either from iterative versions of historical information or by adding a limited amount of within-project data to the cross-project data.
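The three evaluation measures named in the abstract can be illustrated with a small sketch. The abstract does not give formulas, so the definitions below are an assumption: they follow the confusion-matrix conventions common in defect prediction, with "probability of prediction" read as the probability of detection (recall). The function names and the sample labels are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn), where 1 marks a defective module and 0 a clean one."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1      # defective module correctly flagged
        elif p:
            fp += 1      # clean module wrongly flagged (false alarm)
        elif a:
            fn += 1      # defective module missed
        else:
            tn += 1      # clean module correctly passed
    return tp, fp, tn, fn


def prediction_metrics(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Assumed definitions: accuracy, pd = tp/(tp+fn), pf = fp/(fp+tn)."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "pd": tp / (tp + fn) if (tp + fn) else 0.0,
        "pf": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Hypothetical labels for eight modules of a target project.
actual = [1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 1, 0, 0, 0]
metrics = prediction_metrics(actual, predicted)
print(metrics)
```

Under these assumed definitions, a better model raises accuracy and pd while lowering pf, which is the direction of improvement the abstract reports for cross-project and mixed-project training data.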

Corresponding author

Correspondence to Aarti.

Cite this article

Aarti, Sikka, G. & Dhir, R. An investigation on the effect of cross project data for prediction accuracy. Int J Syst Assur Eng Manag 8, 352–377 (2017). https://doi.org/10.1007/s13198-016-0439-x

  • DOI: https://doi.org/10.1007/s13198-016-0439-x
