Repair missing data to improve corporate credit risk prediction accuracy with multi-layer perceptron

Yang, Mei; Lim, Ming K.; Qu, Yingchi; Li, Xingzhi; Ni, Du

doi:10.1007/s00500-022-07277-4

Data analytics and machine learning
Published: 07 July 2022

Soft Computing, volume 26, pages 9167–9178 (2022)

Mei Yang¹,
Ming K. Lim⁴,
Yingchi Qu¹,
Xingzhi Li³ &
Du Ni ORCID: orcid.org/0000-0002-7070-5255²

Abstract

Missing data has become unavoidable in corporate credit risk (CCR) prediction. To preserve the integrity of the data for subsequent analysis and prediction, missing values should be repaired as accurately as possible. To address missing data in credit classification, this study proposes a multi-layer perceptron ensemble (MLP–ESM) model that performs data interpolation and CCR prediction simultaneously. The model makes full use of the non-missing information and interpolates more missing columns with fewer missing values. In this way, it extracts not only the data features needed to interpolate the missing values but also the structural relationship features between the prediction target and the existing data, so that interpolation and prediction can be carried out together. The results show that the MLP–ESM model can effectively interpolate a CCR dataset with missing values and predict on it. Its prediction accuracy is 83.11%, which is higher than that of the traditional machine learning models. This indicates that the interpolated dataset supports better prediction performance.
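
As a rough illustration of MLP-based interpolation in general (not the authors' MLP–ESM model, whose architecture is not described in this preview), the sketch below trains a small one-hidden-layer perceptron on the rows where a column is observed and uses it to fill that column's missing values. The synthetic data, the function names `fit_mlp`, `predict_mlp`, and `impute_column`, and all hyperparameters are assumptions made for this example only.

```python
import numpy as np


def fit_mlp(X, y, hidden=8, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer tanh MLP regressor with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # hidden activations, shape (n, hidden)
        err = H @ W2 + b2 - y              # prediction error, shape (n, 1)
        grad_out = 2.0 * err / n           # gradient of the MSE w.r.t. predictions
        dW2 = H.T @ grad_out
        db2 = grad_out.sum(axis=0)
        dZ = (grad_out @ W2.T) * (1.0 - H ** 2)  # backprop through tanh
        dW1 = X.T @ dZ
        db1 = dZ.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2


def predict_mlp(params, X):
    """Apply the trained MLP to new inputs and return a 1-D prediction array."""
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()


def impute_column(data, col):
    """Fill NaNs in column `col` from the other columns.

    Assumption: the other columns are fully observed.
    """
    filled = data.copy()
    missing = np.isnan(data[:, col])
    others = np.delete(data, col, axis=1)
    params = fit_mlp(others[~missing], data[~missing, col])
    filled[missing, col] = predict_mlp(params, others[missing])
    return filled


# Synthetic stand-in for a financial-indicator table (hypothetical data).
rng = np.random.default_rng(42)
n = 400
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.8 * x1 - 0.5 * x2 + 0.1 * rng.normal(size=n)
truth = np.column_stack([x1, x2, x3])

data = truth.copy()
mask = rng.random(n) < 0.3        # hide roughly 30% of the third column
data[mask, 2] = np.nan

filled = impute_column(data, col=2)

# Compare the MLP fill against a plain column-mean fill on the hidden entries.
rmse_mlp = np.sqrt(np.mean((filled[mask, 2] - truth[mask, 2]) ** 2))
rmse_mean = np.sqrt(np.mean((np.nanmean(data[:, 2]) - truth[mask, 2]) ** 2))
print(f"RMSE (MLP fill):  {rmse_mlp:.3f}")
print(f"RMSE (mean fill): {rmse_mean:.3f}")
```

The two RMSE values give a quick check of whether the learned fill beats a column-mean fill on the hidden entries. The model in the paper additionally couples interpolation with the CCR prediction target, which this sketch does not attempt.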

Data availability

The datasets generated and/or analyzed during the current study are not publicly available because the data are private, but they are available from the corresponding author on reasonable request.

Funding

This work was supported by the Graduate Research and Innovation Foundation of Chongqing, China [Grant No. CYS21047] and the 2022 Scientific Research Startup Fund of Chongqing Jiaotong University [Grant No. F1210045].

Author information

Authors and Affiliations

School of Economics and Business Administration, Chongqing University, Chongqing, 400030, People’s Republic of China
Mei Yang & Yingchi Qu
School of Management, Nanjing University of Posts and Telecommunications, Jiangsu, 210003, People’s Republic of China
Du Ni
School of Economics and Management, Chongqing Jiaotong University, Chongqing, 400074, People’s Republic of China
Xingzhi Li
Adam Smith Business School, University of Glasgow, Glasgow, G14 8QQ, UK
Ming K. Lim

Contributions

MY: Conceptualization, Investigation, Writing—original draft, Validation, Visualization. MKL: Conceptualization, Project administration, Validation. YQ: Validation, Visualization. XL: Conceptualization, Investigation, Validation. DN: Conceptualization, Methodology, Project administration.

Corresponding author

Correspondence to Du Ni.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

This work does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yang, M., Lim, M.K., Qu, Y. et al. Repair missing data to improve corporate credit risk prediction accuracy with multi-layer perceptron. Soft Comput 26, 9167–9178 (2022). https://doi.org/10.1007/s00500-022-07277-4

Accepted: 12 May 2022
Published: 07 July 2022
Issue Date: September 2022
DOI: https://doi.org/10.1007/s00500-022-07277-4
