
Assessing the Early Bird Heuristic (for Predicting Project Quality)

Published: 24 July 2023

Abstract

Before researchers rush to reason across all available data or try complex methods, perhaps it is prudent to first check for simpler alternatives. Specifically, if the historical data has the most information in some small region, then perhaps a model learned from that region would suffice for the rest of the project.
To support this claim, we offer a case study with 240 projects, where we find that the information in those projects "clumps" towards the earliest parts of the project. A quality prediction model learned from just the first 150 commits works as well as, or better than, state-of-the-art alternatives. Using just this "early bird" data, we can build models very quickly and very early in the project life cycle. Moreover, using this early bird method, we have shown that a simple model (with just a few features) generalizes to hundreds of projects.
Based on this experience, we suspect that prior work on generalizing quality models may have needlessly complicated an inherently simple process. Further, prior work that focused on later life-cycle data needs to be revisited, since its conclusions were drawn from relatively uninformative regions.
Replication note: All our data and scripts are available here: https://github.com/snaraya7/early-bird.
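
To make the heuristic concrete, here is a minimal Python sketch of the early-bird idea as the abstract describes it: fit a simple model on only the first 150 commits, then score all later commits with that frozen model. The column names (commit_date, la, ld, nf, buggy) and the synthetic data are hypothetical stand-ins for mined process features, not the paper's exact feature set or pipeline; the authors' actual scripts are in the linked repository.

```python
# Sketch of the "early bird" heuristic: learn from the first 150 commits only.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

EARLY_WINDOW = 150  # the "early bird" region: the project's first 150 commits

def train_early_bird(commits: pd.DataFrame, features: list) -> LogisticRegression:
    """Fit a quality predictor from the earliest commits only."""
    early = commits.sort_values("commit_date").head(EARLY_WINDOW)
    model = LogisticRegression(max_iter=1000)
    model.fit(early[features], early["buggy"])
    return model

# Synthetic demo data standing in for a mined commit history (hypothetical).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "commit_date": pd.date_range("2020-01-01", periods=n, freq="D"),
    "la": rng.poisson(20, n),   # lines added per commit
    "ld": rng.poisson(10, n),   # lines deleted per commit
    "nf": rng.poisson(3, n),    # files touched per commit
})
df["buggy"] = (df["la"] + df["ld"] > 35).astype(int)  # toy defect label

features = ["la", "ld", "nf"]  # "just a few features", per the abstract
model = train_early_bird(df, features)

# The early model is applied, unchanged, to every later commit:
later = df.iloc[EARLY_WINDOW:]
print("accuracy on later commits:", model.score(later[features], later["buggy"]))
```

The design point the sketch illustrates is that nothing after commit 150 is ever used for training: if the informative region really is at the start of the project's history, a cheap model built there can serve for the remainder of the life cycle.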


Cited By

  • Model Review: A PROMISEing Opportunity. In Proceedings of the 19th International Conference on Predictive Models and Data Analytics in Software Engineering. 64–68. DOI: 10.1145/3617555.3617876. Online publication date: 8 December 2023.


    Published In

    ACM Transactions on Software Engineering and Methodology, Volume 32, Issue 5 (September 2023), 905 pages
    ISSN: 1049-331X
    EISSN: 1557-7392
    DOI: 10.1145/3610417
    Editor: Mauro Pezzè

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 July 2023
    Online AM: 08 February 2023
    Accepted: 10 January 2023
    Revised: 03 December 2022
    Received: 22 July 2022
    Published in TOSEM Volume 32, Issue 5


    Author Tags

    1. Quality prediction
    2. defects
    3. early
    4. data-lite

    Qualifiers

    • Research-article

    Funding Sources

    • NSF-CISE

