DOI: 10.1145/3416508.3417118

An exploratory study on applicability of cross project defect prediction approaches to cross-company effort estimation

Published: 08 November 2020

Abstract

BACKGROUND: Research on software effort estimation has been active for decades, especially on the development of effort estimation models. Such models require a dataset collected from completed projects that are similar to the project to be estimated. This similarity suffers from dataset shift, which has made cross-company software effort estimation (CCSEE) an attractive research topic. A recent study on the dataset shift problem examined the applicability and effectiveness of cross-project defect prediction (CPDP) approaches, but the number of approaches it examined was too small to support a conclusion. AIMS: To investigate which characteristics make CPDP approaches applicable and effective for the dataset shift problem in effort estimation. METHOD: We first reviewed the characteristics of 24 CPDP approaches to identify applicable ones. Next, we investigated their effect on effort estimation performance across ten dataset configurations. RESULTS: 16 of the 24 CPDP approaches implemented in the CrossPare framework were found to be applicable to CCSEE. However, only one approach improved effort estimation performance; most of the others degraded it and were thus harmful. CONCLUSIONS: Most of the CPDP approaches we examined were unhelpful for CCSEE.
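To make concrete what "applying a CPDP approach to CCSEE" involves, the sketch below adapts one representative family of CPDP techniques, a nearest-neighbour relevancy filter in the style of Turhan et al., to cross-company effort data. It is a minimal illustration under stated assumptions, not the paper's implementation: the synthetic data, the choice of k = 10, and the plain least-squares estimator are all placeholders for a real CCSEE pipeline.

```python
import numpy as np

def nn_filter(cc_X, tgt_X, k=10):
    """Relevancy filter in the spirit of CPDP NN-filtering:
    keep the union of the k nearest cross-company projects
    (Euclidean distance on z-scored features) for each target project."""
    mu = cc_X.mean(axis=0)
    sigma = cc_X.std(axis=0) + 1e-12
    cc_z = (cc_X - mu) / sigma
    tgt_z = (tgt_X - mu) / sigma
    keep = set()
    for t in tgt_z:
        d = np.linalg.norm(cc_z - t, axis=1)
        keep.update(np.argsort(d)[:k].tolist())
    return sorted(keep)

def fit_predict(train_X, train_y, test_X):
    """Ordinary least squares with an intercept, via numpy."""
    A = np.c_[np.ones(len(train_X)), train_X]
    coef, *_ = np.linalg.lstsq(A, train_y, rcond=None)
    return np.c_[np.ones(len(test_X)), test_X] @ coef

# Synthetic stand-ins: a cross-company pool and a (shifted) target company.
rng = np.random.default_rng(0)
cc_X = rng.normal(size=(200, 4))                  # cross-company features
cc_y = 3.0 * cc_X[:, 0] + rng.normal(size=200)    # cross-company effort
tgt_X = rng.normal(loc=0.5, size=(20, 4))         # target features (dataset shift)
tgt_y = 3.0 * tgt_X[:, 0] + rng.normal(size=20)

mae_all = np.mean(np.abs(fit_predict(cc_X, cc_y, tgt_X) - tgt_y))
idx = nn_filter(cc_X, tgt_X, k=10)
mae_filtered = np.mean(np.abs(fit_predict(cc_X[idx], cc_y[idx], tgt_X) - tgt_y))
print(f"MAE, all cross-company data:    {mae_all:.3f}")
print(f"MAE, NN-filtered subset (k=10): {mae_filtered:.3f}")
```

Whether such transferred filters actually help is exactly the empirical question here; per the results above, only one of the sixteen applicable approaches improved estimation performance.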


Cited By

  • (2022) An extended study on applicability and performance of homogeneous cross-project defect prediction approaches under homogeneous cross-company effort estimation situation. Empirical Software Engineering 27(2). DOI: 10.1007/s10664-021-10103-4. Online publication date: 1 March 2022.


      Published In

      PROMISE 2020: Proceedings of the 16th ACM International Conference on Predictive Models and Data Analytics in Software Engineering
      November 2020
      80 pages
      ISBN:9781450381277
      DOI:10.1145/3416508

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. cross-company effort estimation
      2. cross-project defect prediction
      3. empirical evaluation

      Qualifiers

      • Research-article

Conference

PROMISE '20

      Acceptance Rates

      Overall Acceptance Rate 98 of 213 submissions, 46%
