Abstract
The firm failure rate in the software industry is significantly higher than in other industries. Because software products and services are so widely used, failure in the software industry has implications for the industry itself as well as for the economy at the local, national and global levels. This study compares the classification performance of thirteen approaches in predicting firm failure in the US software industry. Seven measures are used to evaluate the classifiers’ performance. We use the synthetic minority oversampling technique (SMOTE), SMOTEBoost and SMOTEBagging to address the data imbalance issue. To give managers enough time to develop strategies and take the actions needed to reduce the likelihood of failure, we use 20 financial indicators collected 4 years before the last available date for each firm. Our findings show that embedding SMOTE into boosting and bagging algorithms performs better than preprocessing the data with SMOTE before training the classifier. According to the sensitivity analysis, research and development expense is the most significant predictor of firm failure, followed by net sales and total revenue. Managers can use our results as a decision support tool to identify high-risk firms at an early stage and take the necessary actions to prevent them from failing. Early prediction of firm failure will allow software firms to modularize their products or services into specific “features” and offer them as “digital services” under new business models, or to combine these services with partner firms’ services to create new products and address evolving customer expectations. Moreover, early prediction of firm failure in the software industry calls on firms, both new ones and those in the growth stage, to componentize their design for adaptability and to build agility into the way they use their resource mix to address both market gaps and operational gaps.
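The abstract contrasts two ways of using SMOTE: oversampling the data once before training a classifier, versus embedding SMOTE inside an ensemble such as bagging. The Python sketch below is a simplified illustration of the second idea and is not the authors’ implementation. `smote` creates a synthetic minority sample by interpolating between a minority-class sample and one of its k nearest minority-class neighbours (k = 5 is assumed as a default), and `smote_bagging_sets` draws bootstrap samples and applies SMOTE separately inside each bag until that bag is balanced. The function names, the 0/1 label coding (1 = failed firm) and the full per-bag balancing are all assumptions; the published SMOTEBagging algorithm differs in details, for example by varying the resampling rate across bags.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_synthetic, k=5, rng=None):
    """Hypothetical helper: generate n_synthetic minority samples, SMOTE-style.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: _sq_dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor drawn from [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def smote_bagging_sets(X, y, n_bags=10, k=5, seed=0):
    """Hypothetical helper: bootstrap first, then apply SMOTE inside each bag.

    Label 1 is assumed to mark the minority class (failed firms).
    Preprocessing with SMOTE would instead oversample once, before bagging.
    """
    rng = random.Random(seed)
    n = len(X)
    bags = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap indices
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        minority = [x for x, label in zip(Xb, yb) if label == 1]
        deficit = (len(yb) - len(minority)) - len(minority)
        # Balance the bag only if SMOTE has at least two points to work with
        if deficit > 0 and len(minority) >= 2:
            new = smote(minority, deficit, k=k, rng=rng)
            Xb += new
            yb += [1] * len(new)
        bags.append((Xb, yb))
    return bags
```

Each returned bag would then train one base classifier, and the ensemble would combine their votes; the classifier itself is omitted to keep the sketch short.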
Acknowledgements
This research was partially supported by a 2017 Oakland University School of Business Administration Spring/Summer Research Fellowship.
Cite this article
Roumani, Y.F., Nwankpa, J.K. & Tanniru, M. Predicting firm failure in the software industry. Artif Intell Rev 53, 4161–4182 (2020). https://doi.org/10.1007/s10462-019-09789-2
DOI: https://doi.org/10.1007/s10462-019-09789-2