Model for the Prediction of Default Risk of Funding Requests Using Data Mining
Sameh Ali¹*, Atef Raslan² and Lamiaa Fattouh³
¹Faculty of Graduate Studies for Statistical Research, Cairo University, Cairo, Egypt.
²Department of Information Systems, Higher Institute of Advanced Studies.
³Faculty of Graduate Studies for Statistical Research, Cairo University, Cairo, Egypt.
(Corresponding author: Sameh Ali*)
(Received 02 May 2024, Revised 10 June 2024, Accepted 15 July 2024)
(Published by Research Trend, Website: www.researchtrend.net)
BUSINESS UNDERSTANDING
In the initial phase of the CRISP-DM process, the primary focus is on understanding the business objectives and constraints of the organization, Tamweely, while balancing its various priorities. This stage is dedicated to pinpointing the pivotal factors that impact the data analysis objectives.
Fig. 2. Data Preparation.
Business Success Criteria: Here, clear and quantifiable standards are set to evaluate the success of the model
Ali et al., International Journal on Emerging Technologies 15(2): 05-12(2024) 8
DATA SELECTION
The main aim of data selection is to identify the suitable data type, source, and instrument(s). Initially, data reduction involves eliminating unnecessary or less important attributes from the original dataset, guided by the objective of the study at hand. The microfinance data were obtained from Tamweely Microfinance and cover the period from 2018 to 2023. The dataset contains 534,639 instances and 19 attributes; Table 1 gives information about the data set.
Table 3: Information about the data set (Conditional Attribute).
Attribute            Description
BRCODE               Branch code
DEBT_TYPE            Product type
PRIMIUM_VALUE        Installment value
DEBT_PRD             Funding duration in months
APPROV_VALUE         Funding value
RATE                 Annual interest
APP_FEE              Application submission fees
TOTAL_REQ_AMTOUNT    Total funding
REQ_NO_MONTHS        Payment period in months
TOTAL_AMOUNT         Total funding with interest
INDUSTRY_CODE        Industry code
MAIN_ACTIVITY        Main activity code
ACTIVITY_TYPE        Sub-activity code
GOV_ID               Governorate code
JOBCODE              Job code
EDUCATION_CODE       Education code
SEX                  Gender
SCORE                Credit inquiry
OPEN_CREDITS         Number of open fundings
DATA CLEANING
After attribute selection, the next step is data cleaning, which is applied to the dataset restricted to the selected attributes. Data collected for the mining process often contain missing values, noise, or inconsistencies, which can lead to unreliable information during the mining process. A high-quality data mining process therefore requires preprocessing the collected data to improve its quality and, consequently, the mining outcomes (Aljawarneh et al., 2019).
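As a minimal sketch of the first cleaning check, the snippet below counts missing values per attribute in a list of records. It assumes, purely for illustration, that each record is a Python dict keyed by column names from Table 3 and that a missing value appears as None or a blank string; the sample rows are hypothetical and are not Tamweely data.

```python
from collections import Counter

# Hypothetical sample rows using column names from Table 3; values are made up.
records = [
    {"TOTAL_AMOUNT": 12500.0, "EDUCATION_CODE": "3", "SEX": "M"},
    {"TOTAL_AMOUNT": None, "EDUCATION_CODE": "", "SEX": "F"},
    {"TOTAL_AMOUNT": 8000.0, "EDUCATION_CODE": None, "SEX": "M"},
]


def is_missing(value):
    """Treat None and blank strings as missing (an assumption about the raw export)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def count_missing(rows):
    """Return a Counter mapping each attribute to its number of missing values."""
    counts = Counter()
    for row in rows:
        for attribute, value in row.items():
            if is_missing(value):
                counts[attribute] += 1
    return counts


if __name__ == "__main__":
    for attribute, n in sorted(count_missing(records).items()):
        print(f"{attribute}: {n} missing")
```

Counting per attribute first shows which fields need imputation and how large each gap is before any values are filled in.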
In this study, several standard data preprocessing tasks are performed on the dataset, including data integration, data cleaning, data reduction, and data transformation. The first preprocessing step is Data Filtering, in which the attributes relevant to prediction are selected from the company dataset. Because the dataset is unorganized, with features nested within each other, related fields are rearranged so that they sit together, which helps ensure accuracy. For example, all features related to monetary details are grouped together, as are the premium-related features.
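The filtering and regrouping step can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: the group names and the assignment of Table 3 columns to a "monetary", "premium" and "customer" group are hypothetical choices based on the attribute descriptions, and EXTRA_FIELD stands in for any column not needed for prediction.

```python
# Hypothetical grouping of Table 3 columns; the study's actual grouping may differ.
ATTRIBUTE_GROUPS = {
    "monetary": ["APPROV_VALUE", "TOTAL_REQ_AMTOUNT", "TOTAL_AMOUNT", "APP_FEE"],
    "premium": ["PRIMIUM_VALUE", "REQ_NO_MONTHS", "RATE"],
    "customer": ["GOV_ID", "JOBCODE", "EDUCATION_CODE", "SEX"],
}


def filter_and_group(row, groups=ATTRIBUTE_GROUPS):
    """Keep only attributes listed in `groups`, ordered so related fields are adjacent."""
    ordered = {}
    for columns in groups.values():
        for column in columns:
            if column in row:  # silently skip columns absent from this record
                ordered[column] = row[column]
    return ordered


if __name__ == "__main__":
    raw_row = {"EXTRA_FIELD": "x", "RATE": 0.2, "GOV_ID": 1,
               "APPROV_VALUE": 10000, "PRIMIUM_VALUE": 950}
    print(filter_and_group(raw_row))
```

Because Python dicts preserve insertion order, the output record lists monetary fields first, then premium fields, then customer fields, and drops everything else.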
The subsequent task is handling missing data. The dataset contains missing and imputed data, which are addressed in this step. For instance, missing values in attributes such as "Total Amount," "Main Activity Code," "Activity Type Code," and "Education Code" are handled by replacing each missing value with the mean of all samples belonging to the same class as the given tuple, as shown in Figs. 3, 4, 5 and 6.
Fig. 3. Missing value [Total Amount].
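The class-conditional mean imputation described above can be sketched in plain Python. The sketch assumes records are dicts carrying the class label in a Flag field (Table 2) and that a missing value is stored as None; the sample rows are invented for illustration. If a class has no observed value for the attribute, the sketch leaves the gap unfilled rather than guessing.

```python
from statistics import mean


def impute_class_mean(rows, attribute, class_key="Flag"):
    """Replace missing values of `attribute` with the mean of the same class."""
    # Collect observed (non-missing) values per class label.
    observed = {}
    for row in rows:
        value = row.get(attribute)
        if value is not None:
            observed.setdefault(row[class_key], []).append(value)
    class_means = {label: mean(values) for label, values in observed.items()}

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        # Fill only when the value is missing and its class has a computed mean.
        if new_row.get(attribute) is None and row[class_key] in class_means:
            new_row[attribute] = class_means[row[class_key]]
        filled.append(new_row)
    return filled


if __name__ == "__main__":
    rows = [
        {"Flag": "Good", "TOTAL_AMOUNT": 10000.0},
        {"Flag": "Good", "TOTAL_AMOUNT": 14000.0},
        {"Flag": "Good", "TOTAL_AMOUNT": None},
        {"Flag": "Bad", "TOTAL_AMOUNT": 6000.0},
        {"Flag": "Bad", "TOTAL_AMOUNT": None},
    ]
    for row in impute_class_mean(rows, "TOTAL_AMOUNT"):
        print(row)
```

Using the per-class mean instead of a single global mean keeps the filled values consistent with the other samples of the same class, which matters when the attribute differs systematically between classes.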
Filling in missing values and removing inconsistencies and noise were the major data-cleaning activities undertaken at this stage of data preparation. Several fields had missing values, including 12,000 in the "Total Amount" field and 7,600 in the "Education Code" field. The filled-in values were considered the most probable ones because they were the most frequent values (the mode) in the original dataset.
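Filling a categorical field such as "Education Code" with its most frequent value (the mode) can be sketched as below. Treating None as missing and computing the mode over all non-missing values of the column are assumptions made for this illustration; the codes shown are hypothetical.

```python
from collections import Counter


def impute_mode(values):
    """Fill None entries with the most frequent non-missing value."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to estimate the mode from
    # most_common(1) returns [(value, count)]; ties go to the first value encountered.
    mode_value, _ = Counter(observed).most_common(1)[0]
    return [mode_value if v is None else v for v in values]


if __name__ == "__main__":
    education_codes = ["2", "3", None, "2", None, "1"]
    print(impute_mode(education_codes))
    # ['2', '3', '2', '2', '2', '1']
```

The mode suits a categorical code better than the mean, since averaging category codes can produce a value that corresponds to no real category.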
Table 2: Information about the data set (Class Attribute).
Flag     Description
Good     Disbursement of funding
Bad      Customer rejection
V-Bad    Final rejection
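Before modeling, the three class labels in Table 2 usually need a numeric encoding as part of data transformation. The mapping below is a hypothetical ordinal choice (Good, then Bad, then V-Bad) made only for illustration, since the encoding of Flag is not specified here; an unexpected label raises an error rather than being silently mapped.

```python
# Hypothetical ordinal encoding of the Flag class attribute from Table 2.
FLAG_CODES = {"Good": 0, "Bad": 1, "V-Bad": 2}


def encode_flags(flags):
    """Map Flag labels to integer codes; raise ValueError on an unexpected label."""
    encoded = []
    for flag in flags:
        if flag not in FLAG_CODES:
            raise ValueError(f"unknown Flag value: {flag!r}")
        encoded.append(FLAG_CODES[flag])
    return encoded


if __name__ == "__main__":
    print(encode_flags(["Good", "V-Bad", "Bad", "Good"]))
    # [0, 2, 1, 0]
```

Failing loudly on unknown labels catches inconsistent spellings of the class value during cleaning instead of letting them pass into the model.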
How to cite this article: Sameh Ali, Atef Raslan and Lamiaa Fattouh (2024). Model for the Prediction of Default Risk of Funding
Requests Using Data Mining. International Journal on Emerging Technologies, 15(2): 05–12.