An investigation on the effect of cross project data for prediction accuracy

Aarti; Sikka, Geeta; Dhir, Renu

doi:10.1007/s13198-016-0439-x

Original Article
Published: 19 March 2016

Volume 8, pages 352–377, (2017)

International Journal of System Assurance Engineering and Management

Aarti¹,
Geeta Sikka² &
Renu Dhir²

Abstract

Quality assurance activities include software testing, verification, validation, fault-proneness analysis and fault prediction. Most existing models predict faults from historical data of the same project (within-project prediction) through intellectual analysis. This paper focuses on defect prediction using cross-project and mixed-project data, motivated by the unpredictability of selecting software attributes with analogy-based approaches, which deliver imprecise and ambiguous solutions. Cross-project prediction delivers significant results when it focuses on the selection of training data. Combining cross-project data with regression techniques improves prediction effectiveness by extracting similar features shared across all datasets. Feature extraction with a similarity-based approach produces a training model that yields more accurate predictions. We conducted two experiments with method-level and class-level datasets from 23 open-source projects under different scenarios; the results show improvements in accuracy, probability of prediction and probability of false alarm when cross-project and mixed-project combinations are involved. The major outcome of our experiments is that cross-project prediction performs better when the training dataset is selected either from iterative versions of historical information or by adding limited within-project data to cross-project data.
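The abstract names two ingredients without detailing them: similarity-based selection of cross-project training data, and evaluation by accuracy, probability of prediction and probability of false alarm. What follows is a minimal sketch under stated assumptions, not the authors' method. It swaps in a nearest-neighbour relevancy filter in the style of Turhan et al. (2009), where each target-project module keeps its k closest cross-project modules by Euclidean distance over metric vectors. A k-nearest-neighbour majority vote stands in for the paper's regression learners. "Probability of prediction" is assumed to mean the usual probability of detection, TP/(TP+FN), and false alarm is taken as FP/(FP+TN). The function names `nn_filter`, `knn_predict` and `pd_pf_accuracy` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A labelled module: (static code metric vector, 1 if defective else 0).
Row = Tuple[List[float], int]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two metric vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nn_filter(cross_rows: List[Row], target_x: List[List[float]], k: int = 10) -> List[Row]:
    """Hypothetical NN relevancy filter: keep only cross-project rows that are
    among the k nearest neighbours of at least one target instance (deduplicated)."""
    selected = set()
    for t in target_x:
        ranked = sorted(range(len(cross_rows)), key=lambda i: euclidean(cross_rows[i][0], t))
        selected.update(ranked[:k])
    return [cross_rows[i] for i in sorted(selected)]


def knn_predict(train: List[Row], x: Sequence[float], k: int = 3) -> int:
    """Stand-in learner (assumption): majority vote of the k nearest training rows."""
    nearest = sorted(train, key=lambda r: euclidean(r[0], x))[:k]
    defective_votes = sum(label for _, label in nearest)
    return 1 if 2 * defective_votes > len(nearest) else 0


def pd_pf_accuracy(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float, float]:
    """pd = TP/(TP+FN), pf = FP/(FP+TN), accuracy = (TP+TN)/N."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    pd = tp / (tp + fn) if tp + fn else 0.0
    pf = fp / (fp + tn) if fp + tn else 0.0
    acc = (tp + tn) / len(pairs) if pairs else 0.0
    return pd, pf, acc


if __name__ == "__main__":
    # Toy cross-project data; the last row is a distant outlier.
    cross = [([0.0, 0.0], 0), ([0.1, 0.0], 0), ([5.0, 5.0], 1), ([5.1, 4.9], 1), ([100.0, 100.0], 1)]
    target_x = [[0.05, 0.0], [5.0, 5.1]]
    target_y = [0, 1]
    train = nn_filter(cross, target_x, k=2)  # the outlier is not selected
    # Mixed-project variant: append a few labelled within-project rows.
    mixed = train + [([0.2, 0.1], 0)]
    preds = [knn_predict(mixed, x, k=3) for x in target_x]
    print(pd_pf_accuracy(target_y, preds))  # -> (1.0, 0.0, 1.0)
```

In practice the metric vectors would usually be normalised (for example log-transformed or min-max scaled) before computing distances, since raw static code metrics differ widely in scale; the sketch omits this step for brevity.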

Author information

Authors and Affiliations

CSE Department, DAV University, Jalandhar, India
Aarti
CSE Department, Dr. B.R. Ambedkar NIT Jalandhar, Jalandhar, India
Geeta Sikka & Renu Dhir

Corresponding author

Correspondence to Aarti.

About this article

Cite this article

Aarti, Sikka, G. & Dhir, R. An investigation on the effect of cross project data for prediction accuracy. Int J Syst Assur Eng Manag 8, 352–377 (2017). https://doi.org/10.1007/s13198-016-0439-x

Received: 27 September 2015
Revised: 04 January 2016
Published: 19 March 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s13198-016-0439-x

Similar content being viewed by others

Cross project defect prediction for open source software

Empirical validation of feature selection techniques for cross-project defect prediction

Cross project defect prediction: a comprehensive survey with its SWOT analysis
