
Assessing the Early Bird Heuristic (for Predicting Project Quality)

Published: 24 July 2023

Abstract

Before researchers rush to reason across all available data or try complex methods, perhaps it is prudent to first check for simpler alternatives. Specifically, if the historical data has the most information in some small region, then perhaps a model learned from that region would suffice for the rest of the project.
To support this claim, we offer a case study with 240 projects, where we find that the information in those projects "clumps" towards the earliest parts of the project. A quality prediction model learned from just the first 150 commits works as well as, or better than, state-of-the-art alternatives. Using just this "early bird" data, we can build models very quickly and very early in the project life cycle. Moreover, using this early bird method, we have shown that a simple model (with just a few features) generalizes to hundreds of projects.
Based on this experience, we suspect that prior work on generalizing quality models may have needlessly complicated an inherently simple process. Further, prior work that focused on later life-cycle data needs to be revisited, since its conclusions were drawn from relatively uninformative regions.
Replication note: All our data and scripts are available here: https://github.com/snaraya7/early-bird.
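
To make the heuristic concrete, here is a minimal Python sketch of the early-bird idea as the abstract describes it: fit a simple model on only the first 150 commits, then score all later commits with that frozen model. The column names (commit_date, la, ld, nf, buggy) and the synthetic data are hypothetical stand-ins for mined process features, not the paper's exact feature set or pipeline; the authors' actual scripts are in the linked repository.

```python
# Sketch of the "early bird" heuristic: learn from the first 150 commits only.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

EARLY_WINDOW = 150  # the "early bird" region: the project's first 150 commits

def train_early_bird(commits: pd.DataFrame, features: list) -> LogisticRegression:
    """Fit a quality predictor from the earliest commits only."""
    early = commits.sort_values("commit_date").head(EARLY_WINDOW)
    model = LogisticRegression(max_iter=1000)
    model.fit(early[features], early["buggy"])
    return model

# Synthetic demo data standing in for a mined commit history (hypothetical).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "commit_date": pd.date_range("2020-01-01", periods=n, freq="D"),
    "la": rng.poisson(20, n),   # lines added per commit
    "ld": rng.poisson(10, n),   # lines deleted per commit
    "nf": rng.poisson(3, n),    # files touched per commit
})
df["buggy"] = (df["la"] + df["ld"] > 35).astype(int)  # toy defect label

features = ["la", "ld", "nf"]  # "just a few features", per the abstract
model = train_early_bird(df, features)

# The early model is applied, unchanged, to every later commit:
later = df.iloc[EARLY_WINDOW:]
print("accuracy on later commits:", model.score(later[features], later["buggy"]))
```

The design point the sketch illustrates is that nothing after commit 150 is ever used for training: if the informative region really is at the start of the project's history, a cheap model built there can serve for the remainder of the life cycle.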


Cited By

  • Model Review: A PROMISEing Opportunity. In Proceedings of the 19th International Conference on Predictive Models and Data Analytics in Software Engineering. 64–68. DOI: 10.1145/3617555.3617876. Online publication date: 8 December 2023.


    Published In

    ACM Transactions on Software Engineering and Methodology, Volume 32, Issue 5 (September 2023), 905 pages
    ISSN: 1049-331X
    EISSN: 1557-7392
    DOI: 10.1145/3610417
    Editor: Mauro Pezzè

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 July 2023
    Online AM: 08 February 2023
    Accepted: 10 January 2023
    Revised: 03 December 2022
    Received: 22 July 2022
    Published in TOSEM Volume 32, Issue 5


    Author Tags

    1. Quality prediction
    2. defects
    3. early
    4. data-lite

    Qualifiers

    • Research-article

    Funding Sources

    • NSF-CISE

