Predicting Credit Risk of Financial Firms in India Using AI-based ML Approaches: A Study of Nifty 50 Firms


Abstract
Artificial Intelligence (AI) is increasingly regarded as a key indicator of a firm’s technological
potential in the digitalized world. Financial institutions like banks, insurance firms and investment
companies often face credit risk problems due to borrowers’ failure to honour their financial
commitments, resulting in poor financial performance and an increase in non-performing
assets for the financial institutions. To mitigate this risk, several credit risk assessment models
are used to predict credit risk. Existing studies have mainly emphasized models with
manual examination or the assessment of single machine learning models rather than a
comparative assessment of different models. In this study, six machine learning models were
used to assess the credit risk of all financial institutions listed on India’s Nifty 50 index: support
vector machine, KNN, logistic regression, naive Bayes, decision tree, and random forest. Their
financial performance was assessed for the period 2011-2022 using indicators such as
debt-to-equity ratio, equity ratio, debt-to-asset ratio, and debt-to-capital ratio. The findings revealed
that the random forest model is the optimal model for the prediction of credit risk for financial
institutions listed on the Nifty 50 index, with an accuracy of 95.76% and a
precision of 97.79%.

1. Introduction
Credit risk assessment refers to the process of estimating the likelihood of loss resulting from the
failure of a borrower to repay a debt or loan. It therefore also encompasses the risk of failing to
realise the interest accrued on the loan (Kavun & Mihail Vorotintcev, 2016). In other words,
credit risks are economic losses that originate from the counterparty's failure to meet its
contractual obligations, or from elevated default risk during the transaction term (Aziz &
Dowling, 2018). These risks must be assessed by financial institutions prior to lending because
the assessment allows the firm to evaluate the creditworthiness or possible business failure of
customers and to make early financial decisions (Chen et al., 2016).
In the past, several approaches have been used for credit risk assessment, such as collateral
assessment, credit scoring, and financial statement analysis. However, these manual approaches
are limited by the size and volume of the data they can handle, by their inability to evaluate the
complex data patterns that lead to better insight and detailed assessment of financial data, and by
their lack of precision and accuracy in prediction (Haghighat et al., 2013). Manual credit risk
management systems were difficult and expensive to maintain, indicating the need for
technological upgradation (Madhav, 2021). Lately, advancements in AI and machine learning (ML)
have led to the generation of efficient predictive models based on historical financial data and credit
risk possibilities (Kawa et al., 2008). Some of these models include random forest modeling,
elastic net, and deep learning. They use historical data of loan applications and predict
default probabilities by evaluating income, employment history, credit scores, debt-to-income
ratio, and high-risk loans. Using these models helps
develop a risk control system that reduces risk, lowers the non-performing
loan rate, and even improves approval speed (Hu, 2022). Furthermore, AI-based ML
algorithms can detect fraudulent transactions and prevent financial crimes. They can evaluate
transaction information in real time and detect suspicious activities easily (Damrongsakmethee
& Neagoe, 2017). Thus, with the growth of technologies, machine learning algorithms have become
an effective strategy for predicting financial risks.
India boasts a diverse financial sector which has been witnessing rapid expansion in terms of
entities and financial service firms in recent decades. However, it has been battling the problem
of rising non-performing assets (NPAs) (IBEF, 2022). Credit growth since 2012 has been
slow due to a large and persistent gap in credit relative to the country’s gross domestic
product (GDP). With the outbreak of COVID-19, this situation worsened with a rise in
non-performing loans (NPLs) and defaults. Real GDP growth was around 6.7% during
2011-2018 and fell to 3.7% in 2019 (Macdonald & Xu, 2022). Together, these factors have
pushed banks towards bankruptcy and hampered India's economic growth. There is therefore a need
to develop a dynamic model for prediction of credit risk. The current study aims to predict the
credit risk of financial institutions listed on the Nifty 50 using AI-based ML approaches.
2. Literature review
Credit risk assessment aims to maximize an organization’s risk-adjusted rate of return by
keeping credit exposure within acceptable limits (Aziz & Dowling, 2018). This section
reviews the use of AI-based ML approaches for better credit risk
prediction.
Application of AI-enabled ML algorithms in predicting financial risk of financial firms
Financial institutions such as banks, insurance companies, finance organizations, and investment
houses primarily provide financial aid to business enterprises in order to meet their capital needs
(Theodorou et al., 2021). Credit risk prediction for these companies is crucial to saving them from
losses. AI-based ML systems offer flexible models that can make predictions, recommendations, or
decisions from available historical data for a given set of human-defined objectives.
AI approaches increasingly draw on massive alternative data sources,
known as big data, which feed ML models the information they need to learn and improve their
predictability and performance automatically through data and experience, without being programmed to
do so by humans (OECD, 2021). A study by Berrada et al. (2022) stated that banks are
becoming smarter and are therefore moving towards using decision tree, logistic regression,
CatBoost, or support vector machine models in-house so that losses from credit
defaulters can be avoided. Although the approach is still largely based on internal credit scoring
systems, the development of internal ML models is now preferred.
Moreover, consumer lending and small and medium enterprise (SME) lending involve large
volumes of data and rely on ML to make improved lending decisions. There is considerable empirical
support for the efficacy of ML. In consumer lending, a study by Khandani et al. (2010) aimed to
construct out-of-sample forecasts that can improve the classification rates of defaults and credit
card holder delinquencies. The study applied ML-based approaches to construct nonlinear
nonparametric forecasting models of consumer credit risk. It suggested that an ML
approach based on decision trees for prediction on lending data could yield cost savings of
up to 25%. Furthermore, Figini et al. (2017) aimed to develop novel strategies to
predict SME default. The study applied multivariate outlier detection approaches based on local
outlier factors. It demonstrated that a multivariate outlier detection ML approach can
improve credit risk prediction for SME lending using information from UniCredit Bank.
Methods of predicting financial risk
Financial markets are a significant part of social and economic organizations in the modern era.
The financial activities performed by these organizations play a significant role in the economic
development of the global business industry. Thus, many of the existing studies have focused on
using different machine learning algorithms and financial parameters for the prediction of credit
risk.
Algorithms/ML approaches
Sohn et al. (2016) aimed to handle fuzzy input and output data and proposed a fuzzy credit
scoring model to predict the default risk of a loan that is approved for an organization based on
its technology. It was established that the application of supervised ML models in credit scoring
shows good predictive accuracy. Addo et al. (2018) examined credit risk for the
prediction of loan default. Using machine learning models such as
logistic regression, random forest, and gradient boosting, along with deep
learning models such as neural networks with 2 and 3 hidden layers, the analysis revealed that
tree-based models are more stable for loan default prediction. Tran et al. (2022) examined data on
listed companies in Vietnam from 2010 to 2021 for the prediction of credit risk, using
logistic regression, support vector machine, decision tree, random forest, artificial
neural network, and extreme gradient boosting models. The
models were assessed using accuracy, recall, F1-score, precision, the area under the curve
(AUC), and the receiver operating characteristic (ROC) curve, and random forest and extreme
gradient boosting were identified as the most optimal models. Similarly,
El-qadi et al. (2022) examined financial credit risk for Tinubu Square from 2008 to 2019
using the autoregressive moving average (ARMA) method and gradient boosting (XGBoost) and
found that, in comparison to the traditional ARMA model, the machine learning algorithm
is the more optimal method for prediction. Thus, various machine learning algorithms
are available to support the prediction of financial transaction risk.
Financial parameters
Chow (2017) examined the bankruptcy of manufacturing companies in Poland and Korea using
different financial measures and expert opinions. The analysis included 64 quantitative features,
e.g. current asset ratio, working capital to assets ratio, profit margin, sales to assets ratio, gross
profit margin ratio, working capital, and many others. Using KNN, linear regression,
logistic regression, AdaBoost, the Gaussian process, and other models, the bankruptcy of the
company was predicted. Petropoulos et al. (2018) included return on equity, debtors ratio,
creditors ratio, stock turnover ratio, total capital to capital employed, operating profit margin,
interest expense, consumer confidence indicator, available working capital, and economic
sentiment as the financial parameters. Using machine learning algorithms such as
XGBoost and logit regression, the study carried out a credit risk analysis for the Greek banking
system.
A study by Moradi & Mokhatab Rafiei (2019) proposed a model that can accommodate different
factors linked with politico-economic crises. Credit risk specialists also approved that model.
The study listed factors such as debt-to-income ratio, deposit amount, a credit score of guarantee,
customer default probability, capital return, earning and intangible assets, and available credit
amount. Further, Khalid et al. (2022) assessed risk prediction for non-financial firms in
Pakistan using machine learning algorithms. Financial ratios such as the debt-to-asset
ratio, debt-to-capital ratio, interest coverage ratio, debt-to-equity ratio, degree of combined
leverage, and the equity ratio were examined from 2006 to 2020 using different machine learning
algorithms. The analysis identified random forest as the most optimal prediction model. Thus,
financial parameters are the measures used for the prediction of credit risk for companies, and credit
risk is computed on the basis of these parameters. Companies with a high debt-to-asset
ratio tend to have higher credit risk. Based on this assessment, the conceptual
framework below is designed, showing the factors that indicate the financial performance of financial
institutions.

Prediction algorithms: Support vector machine, Logistic regression, Naïve Bayes, Random forest, Decision tree, KNN, Artificial neural networks, K-means, Fuzzy credit scoring
Outcome: Prediction of credit risk for financial firms
Financial parameters: Debt-to-capital ratio, Debt-to-asset ratio, Debt-to-equity ratio, Equity ratio, Interest coverage ratio, Gross margin, Profit margin

Figure 1: Conceptual Framework (Source: Own Computed)

3. Research Methodology
To predict the credit risk of the financial firms listed in the Nifty 50, the study uses a
predictive analysis method in which data collected for the listed firms from the Moneycontrol
website is analyzed. To obtain an accurate prediction of credit risk, AI-based ML algorithms are
used. An examination of the Nifty 50 firms showed that 12 are financial
firms, i.e. they belong to sectors such as banking, consumer finance, financial services, or insurance.
Some of these firms are Axis Bank, HDFC Bank, Bajaj Finserv, Bajaj Finance, and State Bank of
India. The data for these firms was collected from 2011 to 2022. The review of existing research
identified that debt-to-capital ratio, debt-to-equity ratio, debt-to-asset ratio, and equity ratio are
some of the main financial parameters which contribute to defining the risk possibility of a
financial firm. Using the collected data on total shareholder funds, liabilities, and
assets, each of the defined financial parameters was computed. Based on these
parameters, the financial firms' performance is examined to predict credit
risk. The analysis is carried out using Python 3.8 and Jupyter Notebook with libraries such as
NumPy, pandas, sklearn, and matplotlib, applying the identified classification
algorithms, i.e. support vector machine, random forest, Naïve Bayes, KNN, logistic regression,
and decision tree.

Simple descriptive analysis
Pre-processing of data
Data splitting with test split of 0.9
Classifier models implementation
Comparison of the models' performance and most optimal model determination

Figure 2: Proposed approach for prediction (Source: Own Computed)


To evaluate the stated classification algorithms and select the optimal prediction model, the
models are compared on the performance metrics accuracy, precision, recall, and
F1-score. Classification reports and confusion matrices are also used to assess the
predictive capability of the models. Thus, the ML models are applied to predict the
credit risk of financial firms.
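For reference, the four metrics named above have the following standard definitions for a binary classifier. The symbols TP, TN, FP, and FN (true positives, true negatives, false positives, false negatives) are introduced here for illustration and do not appear in the study; the study also does not state which class is treated as positive or whether the scores are averaged across classes.

```latex
\begin{align*}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
\text{F1-score}  &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align*}
```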
4. Data analysis
The study focused on predicting credit risk using different ML algorithms, namely support vector
machine, random forest, logistic regression, decision tree, Naïve Bayes, and KNN. To
choose the most optimal method, each algorithm is evaluated, and
the most effective model for credit risk prediction is selected based on its evaluation metric scores.
First, from the initial financial variables collected from the balance sheet and income
statement, the financial ratio values were computed:
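$$\text{Debt-to-capital ratio} = \frac{\text{Total liabilities}}{\text{Total capital}}$$

$$\text{Debt-to-equity ratio} = \frac{\text{Total liabilities}}{\text{Total shareholder funds}}$$

$$\text{Debt-to-asset ratio} = \frac{\text{Total liabilities}}{\text{Assets}}$$

$$\text{Equity ratio} = \frac{\text{Equity capital}}{\text{Assets}}$$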
Each of the ratios, i.e. debt-to-capital ratio, debt-to-equity ratio, and debt-to-asset ratio, helps in
assessing the financial capability of a company by indicating its reliance on
debt. A higher debt-to-asset ratio, i.e. close to 1 or more, means the company depends more
on debt than on its own funds to meet its operational needs,
which represents a poor financial status. Hence, companies with higher
debt-to-asset ratios tend to be risky (Smith et al., 2017). The assumption is therefore
made that companies with a debt-to-asset ratio of more than 0.9 are risky investments, while
companies with a ratio of less than 0.9 are non-risky investments. Based on this, a new variable
“risk” is generated. To assess the movement in the financial leverage of the
companies, the movement in their debt-to-capital ratio is examined. The line graph
for the same is presented in Figure 3.
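The ratio definitions and the 0.9 risk threshold described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the `FirmYear` fields, the function name, and the sample figures are all hypothetical, and a ratio of exactly 0.9 is treated as non-risky here because the text does not cover that case.

```python
from dataclasses import dataclass

RISK_THRESHOLD = 0.9  # debt-to-asset cut-off stated in the text


@dataclass
class FirmYear:
    """One firm-year of balance-sheet inputs (field names are illustrative)."""
    total_liabilities: float
    total_capital: float
    shareholder_funds: float
    equity_capital: float
    assets: float


def financial_ratios(row: FirmYear) -> dict:
    """Compute the four ratios plus the binary risk label."""
    debt_to_asset = row.total_liabilities / row.assets
    return {
        "debt_to_capital": row.total_liabilities / row.total_capital,
        "debt_to_equity": row.total_liabilities / row.shareholder_funds,
        "debt_to_asset": debt_to_asset,
        "equity_ratio": row.equity_capital / row.assets,
        # 1 = risky (ratio above 0.9), 0 = non-risky
        "risk": int(debt_to_asset > RISK_THRESHOLD),
    }


# Made-up figures, not taken from the study's dataset.
example = FirmYear(total_liabilities=920.0, total_capital=1000.0,
                   shareholder_funds=80.0, equity_capital=20.0, assets=1000.0)
ratios = financial_ratios(example)
print(ratios)
```

With these made-up inputs the debt-to-asset ratio is 0.92, so the firm-year is labelled risky.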

Figure 3: Trend analysis for the movement in debt-to-capital ratio for all companies
(Source: Own computed)
Figure 3 shows that for all the companies except Axis Bank, the debt-to-capital ratio has
decreased over the years, representing less dependency of companies on debt for funding. However,
among the selected firms in 2022, HDFC Life Insurance Company had a debt-to-capital ratio of
0.88, SBI Life Insurance Company 0.85, and HDFC (Housing Development Finance
Corporation Ltd) 0.81. Thus, these companies' ratios exceed 0.8, representing the high
contribution of debt to their total capital. Bajaj Finserv is the only firm among the
selected firms with a low debt-to-capital ratio, i.e. close to 0.01, representing almost complete
dependency of the firm on capital rather than debt. Hence, many financial firms, due to their higher
debt holdings, fall in the risky investment category.
The descriptive statistics revealed that the selected financial firms have risk associated with them,
but predicting the risk of a company requires further analysis. However, the data in its
raw form is unorganized, warranting cleaning and processing in order to make it compatible with
model training. For this, the entire dataset is divided into two subsets, i.e. training and test
data, with a test size of 0.9. Once the split is done, the data must be
transformed entirely into numeric form for model training (Toleva, 2021). Therefore, using category
encoders, the company name present in the dataset is encoded. To rescale each of the features
in the dataset individually, the MinMaxScaler is used, which maps the data into the
range [0, 1]. This normalization allows the models to be fitted
more appropriately to the dataset. The processed dataset was then used to train the support
vector machine model (gamma = 'auto', random_state = 123, kernel = 'linear'), decision tree
model (criterion = 'gini', random_state = 123), random forest model (n_estimators = 250,
random_state = 123), logistic regression model (solver = 'liblinear', random_state = 0), naïve
Bayes model, and KNN model. Each of the models is evaluated using the evaluation metrics, i.e.
accuracy, F1-score, recall, and precision.
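The split, scaling, and training steps described above could look roughly like the following scikit-learn sketch. It runs on synthetic data, not the study's dataset, and makes several assumptions that the text does not state: the feature layout, `GaussianNB` as the naïve Bayes variant, default KNN settings, a stratified split with `random_state=123`, fitting the scaler on the training split only, and macro-averaged metrics. The hyperparameters of the SVM, decision tree, random forest, and logistic regression models are those listed in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the study's dataset): 144 made-up firm-years.
rng = np.random.default_rng(123)
n_rows = 144
debt_to_asset = np.linspace(0.3, 1.0, n_rows)
X = np.column_stack([
    rng.integers(0, 12, size=n_rows),     # encoded company id (hypothetical)
    rng.uniform(0.0, 1.0, size=n_rows),   # debt-to-capital ratio
    rng.uniform(0.5, 15.0, size=n_rows),  # debt-to-equity ratio
    debt_to_asset,                        # debt-to-asset ratio
    rng.uniform(0.0, 0.2, size=n_rows),   # equity ratio
])
y = (debt_to_asset > 0.9).astype(int)     # 1 = risky, as defined in the text

# test_size=0.9 as stated; stratify and random_state are added assumptions so
# that the small training split reliably contains both classes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.9, random_state=123, stratify=y
)

# Scale each feature to [0, 1]; fitting on the training split only is a choice
# made here, not something the text specifies.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

models = {
    "Support vector machine": SVC(gamma="auto", random_state=123, kernel="linear"),
    "Decision tree": DecisionTreeClassifier(criterion="gini", random_state=123),
    "Random forest": RandomForestClassifier(n_estimators=250, random_state=123),
    "Logistic regression": LogisticRegression(solver="liblinear", random_state=0),
    "Naive Bayes": GaussianNB(),      # variant assumed
    "KNN": KNeighborsClassifier(),    # default settings assumed
}

results = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro", zero_division=0
    )
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }

for name, scores in results.items():
    print(name, {k: round(float(v), 4) for k, v in scores.items()})
```

Scores from this sketch reflect only the synthetic data and are not comparable to the study's results.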
Model                    Accuracy   F1-Score   Precision   Recall
Support vector machine   0.9068     0.7125     0.7039      0.7222
Logistic Regression      0.9322     0.7318     0.8065      0.6907
Random Forest            0.9576     0.8220     0.9779      0.7500
Naïve Bayes              0.9153     0.4779     0.4576      0.5000
Decision Tree            1.0000     1.0000     1.0000      1.0000
KNN                      0.9153     0.4779     0.4576      0.5000
Table 1: Evaluation metrics for the ML-based classification models (Source: Own computed)
Table 1 shows that the accuracy of the support vector machine model is 90.68%, its F1-score
is 0.7125, and its precision is 70.39%. As each of these values is high, the model provides
accurate and precise results in predicting risk. The recall of the support
vector machine model is 0.7222, which is more than 0.5, so the model
identifies a reasonable share of the actual cases. For all other models, accuracy is also high,
but for the naïve Bayes and KNN models, the F1-score and recall are low, showing that these
models are less effective. The decision tree model is fitted perfectly, with values of
1.0000 for accuracy, precision, recall, and F1-score, but as a perfect
fit casts doubt on the reliability of the model, the decision tree model is not considered for
prediction. Hence, the random forest model, with the highest remaining values, i.e. 95.76% accuracy,
97.79% precision, a recall of 0.7500, and an F1-score of 0.8220, is the most
optimal model for predicting the credit risk of financial firms.
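To make the reported figures easier to interpret, the sketch below computes accuracy and macro-averaged precision, recall, and F1 from a 2x2 confusion matrix. The confusion-matrix counts are hypothetical: the study's matrices (Figure 4) are not reproduced in the text, and these counts (a test set of 118 with 10 risky cases) were chosen only because they reproduce the random forest and naïve Bayes/KNN rows of Table 1 under the assumption of macro averaging.

```python
def macro_metrics(tn, fp, fn, tp):
    """Accuracy plus macro-averaged precision, recall and F1 for two classes."""
    def prf(true_pos, false_pos, false_neg):
        predicted = true_pos + false_pos
        actual = true_pos + false_neg
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        return precision, recall, f1

    risky = prf(tp, fp, fn)      # "risky" as the positive class
    non_risky = prf(tn, fn, fp)  # error roles swap for the other class
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision, recall, f1 = ((a + b) / 2 for a, b in zip(risky, non_risky))
    return {
        "accuracy": round(accuracy, 4),
        "f1": round(f1, 4),
        "precision": round(precision, 4),
        "recall": round(recall, 4),
    }


# Hypothetical counts: 108 non-risky and 10 risky test cases.
random_forest_like = macro_metrics(tn=108, fp=0, fn=5, tp=5)
majority_only = macro_metrics(tn=108, fp=0, fn=10, tp=0)  # never predicts "risky"

print(random_forest_like)
print(majority_only)
```

Under these assumed counts, the second case shows that a classifier which never predicts the risky class still reaches 91.53% accuracy, which is consistent with the low F1-score and recall reported for the naïve Bayes and KNN models.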

Figure 4: Confusion matrix for the ML-based classification models (Source: Own
computed)
The results of the evaluation metrics are verified with the confusion matrices (Figure 4), which
show that each of the models has its highest count for the detection of non-risky investments as
non-risky. Thus, evaluation metrics help in deciding which model outperforms the others, and
on this basis, for the prediction of credit risk for financial firms listed in the Nifty 50, the random
forest model with n_estimators = 250 and random_state = 123 is the most optimal predictive
model.
5. Conclusion
As the concept of AI has evolved, many new technologies and models have been developed which
help firms simplify their credit risk management procedures. In this light, this study
aimed to formulate and evaluate an AI-based prediction model. By evaluating
different ML classification models, the study compared the performance of each model and
identified that the random forest model, with accuracy and precision both above 95%, is
the most optimal model for predicting credit risk. The evaluation metrics revealed that the random
forest-based model provides the most accurate prediction among the models retained for comparison.

In India, banks and financial firms have been facing the issue of credit default, and a large
amount of NPAs has severely affected their financial
performance. With the advancement in AI and ML, prediction of credit risk has become easier,
making it essential for these firms to adopt it. However, the type of ML model most suitable for
prediction varies depending upon the type of data. This study found the random forest model to
be the most suitable; it would help in assessing previous credit default cases and making the
relevant predictions. The study helps financial firms by suggesting a model
which could automate the procedure of
risk prediction. This will help motivate them to implement AI-based practices and make the
procedure of risk assessment quicker and less error-prone, leading to a reduction in NPAs through
sound lending decisions. Thus, the suggested prediction model would enable the creation of a
more financially stable environment.
This study has certain limitations. Firstly, since we took into account only those financial
institutions which are listed in the Nifty 50 index, a generalized predictive model could not be
developed. Further, the study focuses only on ML-based models for the prediction of credit risk
due to limited resource availability. Thus, future studies could expand the scope of this study by
including all financial firms listed in the Nifty 500 index. Moreover, future studies can include
deep learning models to widen the comparison of AI-based prediction models and
recommend more effective prediction models for credit risk.

References
Addo, P. M., Guegan, D., & Hassani, B. (2018). Credit Risk Analysis using Machine and Deep
Learning models. HAL.
Aziz, S., & Dowling, M. M. (2018). AI and Machine Learning for Risk Management. SSRN
Electronic Journal, January 2019. https://doi.org/10.2139/ssrn.3201337
Berrada, I. R., Barramou, F. Z., & Alami, O. B. (2022). A review of Artificial Intelligence
approach for credit risk assessment. IEEE, 20–23.
https://doi.org/10.1109/AISP53593.2022.9760655
Chen, N., Ribeiro, B., & Chen, A. (2016). Financial credit risk assessment: a recent review.
Artificial Intelligence Review, 45(1), 1–23.
https://doi.org/10.1007/s10462-015-9434-x
Chow, J. C. K. (2017). Analysis of financial credit risk using machine learning. In Aston
University (Issue April).
Damrongsakmethee, T., & Neagoe, V.-E. (2017). Data Mining and Machine Learning for
Financial Analysis. Indian Journal of Science and Technology, 10(39), 1–7.
https://doi.org/10.17485/ijst/2017/v10i39/119861
El-qadi, A., Trocan, M., Frossard, T., & Díaz-rodríguez, N. (2022). Credit Risk Scoring
Forecasting Using a Time Series Approach. Phys. Sci. Forum, 5(16).
https://doi.org/10.3390/psf2022005016

Figini, S., Bonelli, F., & Giovannini, E. (2017). Solvency prediction for small and medium
enterprises in banking. Decision Support Systems, 102, 91–97.
https://doi.org/10.1016/J.DSS.2017.08.001
Haghighat, M., Rastegari, H., & Nourafza, N. (2013). A Review of Data Mining Techniques for
Result Prediction in Sports. 2(5), 7–12.
Hu, Z. (2022). Development of a Machine Learning-Based Financial Risk Control System. In All
Graduate Theses and Dissertations (Issue 8479).
IBEF. (2022). Financial Services in India.
Kavun, S., & Mihail Vorotintcev. (2016). Credit Risk Assessment for Financial Institutions
Activity. Journal of Finance and Economics, 4(5), 142–150.
https://doi.org/10.12691/jfe-4-5-3
Kawa, D., Punyani, S., Nayak, P., Karkera, A., & Jyotinagar, V. (2008). Credit Risk
Assessment from Combined Bank Records using Federated Learning. International
Research Journal of Engineering and Technology, 1355.
Khalid, S., Khan, M. A., Mazliham, M. S., Alam, M. M., Aman, N., Taj, M. T., Zaka, R., &
Jehangir, M. (2022). Predicting Risk through Artificial Intelligence Based on Machine
Learning Algorithms : A Case of Pakistani Nonfinancial Firms. Complexity.
Khandani, A. E., Kim, A. J., & Lo, A. W. (2010). Consumer credit-risk models via machine-
learning algorithms. Journal of Banking & Finance, 34(11), 2767–2787.
https://doi.org/10.1016/J.JBANKFIN.2010.06.001
Macdonald, M., & Xu, T. (2022). Financial Sector and Economic Growth in India. In IMF
Working Papers.
Madhav, V. V. (2021). Role of credit rating agencies in analyzing banks credit risk management:
Advantages and Disadvantages. High Technology Letter, May.
Moradi, S., & Mokhatab Rafiei, F. (2019). A dynamic credit risk assessment model with data
mining techniques: evidence from Iranian banks. Financial Innovation, 5(1), 1–27.
https://doi.org/10.1186/S40854-019-0121-9
OECD. (2021). Artificial Intelligence, Machine Learning and Big Data in Finance:
Opportunities, Challenges, and Implications for Policy Makers. OECD Business and
Finance Outlook 2020 : Sustainable and Resilient Finance., 1–72.
Petropoulos, A., Siakoulis, V., Stavroulakis, E., & Klamargias, A. (2018). A robust machine
learning approach for credit risk analysis of large loan level datasets using deep learning
and extreme gradient boosting. Are Post-Crisis Statistical Initiatives Completed?, August,
30–31.
Smith, J. A., Grill, M., & Lang, J. H. (2017). The leverage ratio, risk-taking and bank stability. In
European Central Bank.
Sohn, S. Y., Kim, D. H., & Yoon, J. H. (2016). Technology credit scoring model with fuzzy
logistic regression. Applied Soft Computing, 43, 150–158.
https://doi.org/10.1016/J.ASOC.2016.02.025
Theodorou, T. I., Zamichos, A., Skoumperdis, M., Kougioumtzidou, A., Tsolaki, K.,
Papadopoulos, D., Patsios, T., Papanikolaou, G., Konstantinidis, A., Drosou, A., &
Tzovaras, D. (2021). An AI-enabled stock prediction platform combining news and social
sensing with financial statements. Future Internet, 13(6), 1–22.
https://doi.org/10.3390/fi13060138
Toleva, B. (2021). The Proportion for Splitting Data into Training and Test Set for the Bootstrap
in Classification Problems. Business Systems Research Journal, 12(1).
https://doi.org/10.2478/bsrj-2021-0015
Tran, K. L., Le, H. A., Nguyen, T. H., & Nguyen, D. T. (2022). Explainable Machine Learning
for Financial Distress Prediction: Evidence from Vietnam. Data, 7(160).
https://doi.org/10.3390/data7110160
