Project Loan AutoML
Submitted by
Mr. KUMARAKRISHNAN.S
of
BACHELOR OF TECHNOLOGY
APRIL 2023
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
BONAFIDE CERTIFICATE
This is to certify that the project work entitled "LOAN PREDICTION SYSTEM USING AUTOML" is the bonafide work of [19TD0176], SHRUTI.R [19TD0233], submitted in partial fulfillment of the requirements for the award of the B.Tech Degree in Computer Science and Engineering by Pondicherry University during the
We also sincerely thank our Head of the Department, Dr. K. PREMKUMAR, whose continuous encouragement and helpful comments enabled us to complete our project report. We thank all our staff members, who have always been by our side and helped us with our project. We also sincerely thank all the lab technicians for their help in the course of our project development.
We would also like to extend our sincere gratitude and grateful thanks to our Director cum
Principal Dr. V. S. K. VENKATACHALAPATHY for having extended the Research and
Development facilities of the department.
We are grateful to our Founder Chairman Shri. N. KESAVAN. He has been a constant
source of inspiration right from the beginning.
We would like to express our faithful and grateful thanks to our Chairman and Managing
Director Shri. M. DHANASEKARAN for his support.
We would also like to thank our Vice Chairman Shri. S. V. SUGUMARAN for providing us with a pleasant learning environment.
We would like to thank our Secretary Dr. K. NARAYANASAMY for his support.
We wish to thank our family members and friends for their constant encouragement, constructive criticism and suggestions that have helped us in the timely completion of this project.
Last but not least, we would like to thank the ALMIGHTY for His grace and blessings
over us throughout the project.
ABSTRACT
Globally, it is disconcerting how frequently banks lose money to loan borrowers as a result of loan default. This project is a modest attempt to put machine learning to use in a realistic way and to determine what else it might be used for. In order to predict fraud in bank loan administration and thereby prevent loan default, this work leverages historical loan records using machine learning. Such fraud would not have been revealed by manual examination by a credit officer. The mission of the banking loan approval system project is to create a tool that can automate the core tasks of loan management. Prior to issuing a loan, banks are utilizing increasingly sophisticated techniques to verify user information and establish the actual facts about the user.
LIST OF TABLES
LIST OF FIGURES
1.1 LEARNING OF ML
LIST OF ABBREVIATIONS
ML - Machine Learning
TABLE OF CONTENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1 BASIC INTRODUCTION
2. LITERATURE SURVEY
3. EXISTING SYSTEM
4. PROPOSED SYSTEM
4.1 INTRODUCTION
5. SYSTEM REQUIREMENTS
6. IMPLEMENTATION
7. CONCLUSION
APPENDIX-I
APPENDIX-II
REFERENCES
CHAPTER 1
INTRODUCTION
The main business of practically all banks is the distribution of loans. The majority of a bank's assets are derived directly from the revenue generated by the loans that the bank disburses. In a banking environment, the primary objective is to put one's assets in trustworthy hands. Many banks and financial institutions today lend money through a comprehensive verification and validation process, but there is still no guarantee that the chosen applicant is the most credible candidate out of all applicants.
Existing technologies that rely on human intelligence for the loan approval process have a number of drawbacks.
Machine learning is the brain where all the learning takes place. The way the machine learns is similar to the way a human being learns. Humans learn from experience: the more we know, the more easily we can predict. By analogy, when we face an unknown situation, the likelihood of success is lower than in a known situation. Machines are trained the same way. To make an accurate prediction, the machine sees examples. When we give the machine a similar example, it can figure out the outcome. However, like a human, if it is fed a previously unseen example, the machine has difficulty predicting.
The core objectives of machine learning are learning and inference. First of all, the machine learns through the discovery of patterns. This discovery is made thanks to the data. One crucial task of the data scientist is to choose carefully which data to provide to the machine. The list of attributes used to solve a problem is called a feature vector. You can think of a feature vector as a subset of the data that is used to tackle a problem. The machine uses some fancy algorithms to simplify reality and transform this discovery into a model. Therefore, the learning stage is used to describe the data and summarize it into a model.
For instance, the machine is trying to understand the relationship between the wage of an individual and the likelihood of going to a fancy restaurant. It turns out the machine finds a positive relationship between wage and going to a high-end restaurant.
Inferring
The breakthrough comes with the idea that a machine can learn from data (i.e., examples) on its own to produce accurate results. Machine learning is closely related to data mining and Bayesian predictive modeling. The machine receives data as input and uses an algorithm to formulate answers.
A typical machine learning task is to provide a recommendation. For those who have a Netflix account, all recommendations of movies or series are based on the user's historical data. Tech companies are using unsupervised learning to improve the user experience with personalized recommendations.
1.1.2 MACHINE LEARNING vs TRADITIONAL PROGRAMMING
In traditional programming, a programmer hard-codes the rules that turn input data into output, and every new case requires new rules. Machine learning is supposed to overcome this issue. The machine learns how the input and output data are correlated and writes the rule itself. The programmers do not need to write new rules each time there is new data. The algorithms adapt in response to new data and experience to improve efficacy over time.
Figure 1.4-Machine Learning
The lack of data, or the lack of diversity in the dataset, is the main problem with machine learning. If there is no data, a machine cannot learn. A dataset with little heterogeneity also makes it harder for the machine to learn. For a machine to gain insightful knowledge, heterogeneity is required. When there are no or few variations, it is uncommon for an algorithm to be able to extract information. For the machine to learn, it is advised that each group contain at least 20 observations. When the data falls short of this, evaluation and prediction are substandard.
The following key points sum up the basic steps of a machine learning workflow:
1. Define a question
2. Collect data
3. Visualize data
4. Train the algorithm
5. Test the Algorithm
6. Collect feedback from Customer
7. Refine the trained algorithm
8. Loop 4-7 until the results are satisfying
9. Use the model to make a predictive analysis
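As an illustration of steps 2 to 9, a minimal sketch of such a workflow in Python with scikit-learn is shown below; the dataset, features and model choice are placeholders for illustration, not the ones used in this project.
```python
# Minimal sketch of the train / test / refine loop described above,
# using scikit-learn on a small example dataset (not the project data).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Steps 2-3: collect (here, simply load) and inspect the data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 4: train the algorithm
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# Step 5: test the algorithm
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))

# Steps 6-8: based on feedback, refine features or hyperparameters and retrain
# Step 9: use the fitted model for predictive analysis on new data
```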
Once the algorithm has mastered arriving at the correct conclusions, it applies that skill to fresh sets of data.
Machine learning is the process by which computers figure out how to carry out
tasks without being specifically taught to do so. Computers use available data to learn in
order to do specific jobs. For straightforward jobs given to computers, it is possible to
build algorithms that instruct the device how to carry out all the steps necessary to address
the issue at hand; no learning is required on the part of the computer. It can be difficult
for a human to manually develop the required algorithms for more complex tasks. In fact,
it may prove more beneficial to aid the computer in creating its own algorithm than to
have human programmers identify each key step.
Machine learning can be grouped into two broad learning tasks: supervised and unsupervised, each with many algorithms.
Supervised learning
An algorithm uses training data and feedback from humans to learn the relationship of given inputs to a given output. For instance, a practitioner can use marketing expense and weather forecasts as input data to predict the sales of cans. You can use supervised learning when the output data is known; the algorithm will then predict outcomes for new data. Supervised learning covers two kinds of tasks:
• Classification task
• Regression task
Classification
Imagine you want to predict the gender of a customer for a commercial. You would start gathering data on the height, weight, job, salary, purchasing basket, etc., from your customer database. You know the gender of each of your customers; it can only be male or female. The objective of the classifier is to assign a probability of being a male or a female (i.e., the label) based on the information (i.e., the features you have collected). Once the model has learned how to recognize male or female, you can use new data to make a prediction. For instance, you just got new information about an unknown customer, and you want to know if it is a male or a female. If the classifier predicts male = 70%, it means the algorithm is 70% sure that this customer is a male and 30% sure that it is a female.
The label can have two or more classes. The above machine learning example has only two classes, but if a classifier needs to predict objects, it can have dozens of classes (e.g., glass, table, shoes; each object represents a class).
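A minimal sketch of this idea, assuming a toy feature matrix of height, weight and salary with binary gender labels (purely illustrative values, not real customer data):
```python
# Toy classification sketch: the classifier returns a probability for each class.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative features: [height_cm, weight_kg, salary_k]; labels: 0 = female, 1 = male
X = np.array([[160, 55, 40], [175, 80, 60], [180, 85, 55],
              [158, 50, 45], [170, 75, 70], [165, 60, 50]])
y = np.array([0, 1, 1, 0, 1, 0])

clf = RandomForestClassifier(random_state=0).fit(X, y)

new_customer = [[172, 78, 65]]                  # unknown customer
proba = clf.predict_proba(new_customer)[0]      # e.g. something like 0.30 / 0.70
print(f"P(female) = {proba[0]:.2f}, P(male) = {proba[1]:.2f}")
```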
Regression
When the output is a continuous value, the task is a regression. For instance, a financial analyst may need to forecast the value of a stock based on a range of features like equity, previous stock performance and macroeconomic indicators. The system will be trained to estimate the price of the stock with the lowest possible error.
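A minimal regression sketch along these lines, using made-up feature values rather than real market data:
```python
# Toy regression sketch: predict a continuous value (a stock price).
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative features: [equity, previous_price, macro_index]
X = np.array([[10.0, 95.0, 1.2], [12.0, 101.0, 1.1], [9.5, 90.0, 1.3],
              [14.0, 110.0, 1.0], [11.0, 98.0, 1.2]])
y = np.array([96.0, 103.0, 91.0, 113.0, 100.0])  # next-day prices (invented)

reg = LinearRegression().fit(X, y)
print("Predicted price:", reg.predict([[13.0, 105.0, 1.1]])[0])
```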
Support vector machine
Support Vector Machine, or SVM, is typically used for the classification task, and can also be applied to regression. The SVM algorithm finds a hyperplane that best separates the classes in the feature space.
Unsupervised learning
In unsupervised learning, an algorithm explores the input data without being given labeled outputs and discovers structure, such as groupings, in the data on its own.
CHOOSING A MACHINE LEARNING ALGORITHM:
There are plenty of machine learning algorithms. The choice of the algorithm is based on the objective. In the machine learning example below, the task is to predict the type of flower among three varieties. The predictions are based on the length and the width of the petal. The picture depicts the results of ten different algorithms. The picture on the top left is the dataset. The data is classified into three categories: red, light blue and dark blue. There are some groupings. For instance, in the second image, everything in the upper left belongs to the red category, the middle part shows a mixture of uncertainty and light blue, while the bottom corresponds to the dark blue category. The other images show how different algorithms try to classify the data.
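A minimal sketch of such a comparison, assuming the classic Iris dataset (three flower varieties, petal length and width) and a handful of scikit-learn classifiers chosen purely for illustration:
```python
# Compare a few classifiers on petal length/width for three flower varieties.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X = X[:, 2:4]  # keep only petal length and petal width

models = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "k-nearest neighbours": KNeighborsClassifier(),
    "Logistic regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validated accuracy
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```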
Some research areas that have been explored in machine learning are as follows:
• Text Mining and Text Classification
• Image-Based Applications
• Automated Machine Learning
• Machine Vision
• Clustering
• Optimization
• Voice Classification
• Sentiment Analysis
• Recommendation Framework Project.
• Prediction and Detection
(iii) For instance, IBM's Watson platform can determine shipping container
damage. Watson combines visual and systems-based data to track, report and make
recommendations in real-time.
(iv) In past years, stock managers relied extensively on primary methods to evaluate and forecast inventory. By combining big data and machine learning, better forecasting techniques have been implemented (an improvement of 20 to 30% over traditional forecasting tools). In terms of sales, this means an increase of 2 to 3% due to the potential reduction in inventory costs.
Traditional Machine Learning | Automated Machine Learning
A human gives input manually and the problem is solved. | It is an automated process of using machine learning to solve problems and make predictions.
It is used to perform two tasks, regression and classification. | Everything from data preparation to training, model selection and algorithm selection is done in an automated way.
Once the model is designed, changes cannot be made until they are made manually. | The model is developed and can evolve automatically, with the quality of the data being assessed.
It uses an already designed algorithm to perform the tasks. | It can create its own algorithm rather than having human programmers design it.
1.2.2 WORKING OF AUTOML
AutoML can cover everything from data preparation to training to the selection of models and algorithms, all of which is done in a completely automated way. There are many types of machine learning, but with supervised learning, labeled input and output data is repeatedly fed into human-trained systems, which offer predictions with increasing accuracy as each new data set is fed in.
For example, if a company wants to be able to predict whether or not somebody
is going to buy its product, they first have to have a data set of past customers, organized
by who bought and didn’t buy. Then it has to be able to use that data set to predict what
a whole new set of customers will decide to do. Or, if you want a computer to be able to
identify a cat in a video, you have to first train it by showing it other videos with cats so
it is able to accurately identify one in a video it hasn’t seen before.
At first blush, automated machine learning may seem a bit redundant. After all,
machine learning is already about automating the process of identifying patterns in data
to make predictions. The process, which relies on algorithms and statistical models,
doesn’t require consistent, or explicit programming. Once a machine learning model is
built, it can then be further optimized through trial and error and feedback, meaning the
machine can learn by experience and increased exposure to data, much like humans do. The goal of AutoML is both to speed up the AI development process and to make the technology more accessible.
In practice, much of the work required to make a machine learning model is rather
laborious, and requires data scientists to make a lot of different decisions. They have to
decide how many layers to include in neural networks, what weights to give inputs at
each node, which algorithms to use, and more. It’s a big job, and it requires a lot of
specialized skill and intuition to do it properly. The more complex the model, the more
complex the work. And some experts say automating some of that work will be necessary
as AI systems become more complex. So, autoML aims to eliminate the guesswork for
humans by taking over the decisions data scientists and researchers currently have to
make while designing their machine learning models.
1.2.4 AUTOML AND SUPERVISED LEARNING
The following are a few popular tools being used among business professionals to automate machine learning processes:
Aible
AutoKeras
Auto-PyTorch
Auto-Sklearn
Google Cloud AutoML
AIBLE
Aible’s suite of AI solutions works to automate data science and data engineering
tasks across multiple industries. Its products can detect key data relationships, assess data readiness for model input, and augment data analytics and recommendations. Aible
connects directly to the cloud for data security, and can be integrated with other tools like
Salesforce and Tableau.
AUTOKERAS
AutoKeras is an open-source library and autoML tool based on Keras, a Python
machine learning API. The tool can automate classification and regression tasks in deep
learning models for images, text and structured data. AutoKeras largely applies neural
architecture search to optimize code writing, machine learning algorithm selection and
pipeline design.
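As a rough illustration (not code from this project), a structured-data classification run with AutoKeras might look like the sketch below; the CSV file name and label column are placeholder assumptions.
```python
# Hypothetical AutoKeras sketch for tabular (structured) data classification.
import pandas as pd
import autokeras as ak
from sklearn.model_selection import train_test_split

data = pd.read_csv("loan_data.csv")           # placeholder file name
X = data.drop(columns=["Loan_Status"])        # placeholder label column
y = data["Loan_Status"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# max_trials limits how many candidate architectures the neural architecture search tries
clf = ak.StructuredDataClassifier(max_trials=5, overwrite=True)
clf.fit(X_train, y_train, epochs=10)
print("Test evaluation:", clf.evaluate(X_test, y_test))
```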
AUTO-PYTORCH
Auto-PyTorch, based on the PyTorch machine learning library in Python, allows
for fully automated deep learning (autoDL) tasks. It automates algorithm selection and
hyperparameter tuning for deep neural network architectures, and can support tabular and
time series datasets. Auto-PyTorch applies Bayesian optimization, meta-learning and
ensemble construction for automation.
AUTO-SKLEARN
Auto-Sklearn is an open-source autoML tool built on the scikit-learn machine
learning library in Python. The tool automates supervised machine learning pipeline
creation and can be used as a drop-in replacement for scikit-learn classifiers in Python.
Like Auto-PyTorch, Auto-Sklearn utilizes meta-learning, ensemble learning and Bayesian
optimization to automatically search for learning algorithms when given a new dataset.
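A minimal sketch of this drop-in usage, assuming the auto-sklearn package is installed and the data is already split into training and test sets (the time limits below are arbitrary):
```python
# Hypothetical auto-sklearn sketch: AutoSklearnClassifier is used like a regular
# scikit-learn estimator, but it searches over pipelines automatically.
import autosklearn.classification
from sklearn.metrics import accuracy_score

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,   # total search budget in seconds (arbitrary)
    per_run_time_limit=30,         # budget per candidate model (arbitrary)
)
automl.fit(X_train, y_train)       # X_train / y_train assumed to exist already
predictions = automl.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
```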
1.2.7 CHALLENGES FACED IN AUTOML
• Can’t understand the business context of the problem it is trying to solve.
• No standard for what a “good” model looks like.
• Doesn’t offer the “why” of its decision-making process.
• Too complex for non-data scientists to pull off successfully.
• Can’t automate ethics or fairness.
AutoSKLearn:
Auto-Sklearn is an extension of Auto-WEKA that uses the Python library scikit-learn and is a drop-in replacement for regular scikit-learn classifiers and regressors. Auto-PyTorch is based on the deep learning framework PyTorch and jointly optimizes hyperparameters and the neural architecture.
SALESFORCE
CHAPTER 2
LITERATURE SURVEY
AUTHORS: U. Thongsatapornwatana
In recent years, data mining, a type of data analysis, has been utilised to examine loan data previously saved from a variety of sources in order to identify patterns and trends. Furthermore, it can be used to improve efficiency in resolving loans more quickly and to automatically flag loans. There are numerous data mining methods, though, and the right data mining techniques must be chosen in order to improve loan detection efficiency. In particular, applications that were used to solve loan problems are reviewed in this paper's study of the literature on various data mining applications. The survey sheds light on the difficulties of loan data mining as well as research gaps.
Two main goals drive the research that is presented here. To predict shootings, risk terrain
modelling (RTM) is used as the initial step. The risk terrain maps that were created using
RTM assess the risks of upcoming shootings as they are distributed over a geography
using a variety of contextual data pertinent to the opportunity structure of shootings. The
second goal was to evaluate the risk terrain maps' capacity for forecasting over two six-month periods and contrast it with that of retroactive hot spot maps. The findings indicate
that risk terrains are significantly more accurate than retroactive hot spot mapping at
forecasting future shootings across a variety of cut points.
2.3 Using geographically weighted regression to explore local Loan
patterns
The current study explores the spatial patterns of both Loan and its variables in a
structural model of violent Loan in Portland, Oregon. The paper presents findings from
a global ordinary least squares model, which is assumed to fit all sites within the study area, using typical structural measures taken from an opportunity framework. Then,
as an alternative to such conventional methods of modelling Loan, geographically
weighted regression (GWR) is presented. The GWR approach estimates a local model
and generates a set of mappable parameter estimates and spatially variable t-values of
significance. It is discovered that a number of structural metrics have correlations with
Loan that differ dramatically by region. According to the results, a mixed model that
includes both fixed and spatially variable factors may produce the best realistic model of
Loan. The present investigation highlights the relevance of GWR for addressing
misspecification of a global model of urban Loan and evaluating local dynamics that
increase Loan levels.
The spatio-temporal dynamics of Loan activity are described by a set of models that we
introduce in this research. Here it is claimed that one can see the development of hot spots
using a basic set of procedures that relate to components essential to the study of Loan.
By examining the most basic iterations of our model, we demonstrate a self-organised critical condition of illicit activity that, depending on the situation, we propose to refer to as a warm spot or a tepid milieu. In contrast to true hot spots, where localized high levels or peaks are generated, it is characterized by a positive level of unlawful or uncivil activity that sustains itself without exploding. Additionally, we offer modifications to our
model that account for local and long-range interactions, the impacts of repeated
victimization, and briefly address some of the outcomes, such as hysteresis events.
2.5 Language usage on Twitter predicts Loan rates
Social networks generate a vast amount of data. Over 230 million active users
contribute more than 500 million tweets daily to the microblogging network Twitter. We
suggest using Twitter's open data for analysis to forecast loan rates. In recent years, loan
rates have gone up. Even while loan stoppers use a variety of techniques to lower loan
rates, none of the earlier strategies focused on using the language used in forms as a
source of data to forecast loan rates. In this study, we propose that a reliable method for
forecasting loan rates in cities is language analysis of tweets. Three months' worth of
tweets in Houston and New York City were gathered by locking the collection by
geographic longitude and latitude.
Data mining is the process of reviewing large amounts of previously stored data in order
to discover patterns and trends in it. Furthermore, it can be used to increase efficiency in
clearing loans more rapidly and can be used to automatically notify loans.
Two main goals drive the research that is presented here. To predict shootings, risk terrain
mapping (RTM) is used as the initial step. The risk terrain maps that were created using
RTM assess the risks of upcoming shootings as they are distributed over a geography using a variety of contextual data pertaining to the opportunity structure of shootings. The second goal was to evaluate the risk terrain maps' capacity for forecasting over two six-month periods and contrast it with that of retrospective hot spot maps.
The relevant papers referred to for this project are also presented in Table 2.1.
Table 2.1 - Papers referred to for this project

S.No | Authors | Title | Year | Proposed System | Additional Data
1 | U. Thongsatapornwatana | A survey of data mining techniques for analyzing Loan patterns | 2021 | The literature on various data mining applications, especially applications applied to solve Loans, is reviewed. The survey also throws light on research gaps and challenges of Loan data mining. | Data mining for finding patterns and trends in Loan is to be used appropriately and to be a help for beginners in the research of Loan data mining.
CHAPTER 3
EXISTING SYSTEM
The System Development Life Cycle was chosen as the research methodology for this
investigation. The SDLC method that was utilised for software development is called the
waterfall model. In the waterfall model method, the development process is broken down
into discrete steps. One phase's output serves as the sequential input for the following
phase.
The current loan process at Staff Multipurpose Cooperative Society Limited JOSTUM
involves manual decision-making and requires the applicant to personally meet with
cooperative society administration. To determine if the loan will be approved or denied,
the applicant may need to visit the cooperative society office.
Figure 3.1-Existing System of Prediction Model
CHAPTER 4
PROPOSED SYSTEM
4.1 INTRODUCTION
The model that is proposed gives us more accuracy than the traditional model. The proposed model can compare different models and algorithms simultaneously and provides higher accuracy than the execution of a single model or a single algorithm. The dataset is collected from Kaggle, and the collected data is then trained and tested to obtain the prediction accuracy. Comparatively, the accuracy of the AutoML prediction is higher than that of the other methods or models. The traditional method uses the processes of preprocessing, cleansing, training, etc., whereas AutoML reduces these preceding processes.
Machine learning is the process by which computers figure out how to carry out tasks without being specifically taught to do so. Computers use available data to learn in order to do specific jobs. For the tasks that are provided, algorithms are designed and built to perform the work, and no learning is required on the part of the computer. In fact, it may prove more beneficial to aid the computer in creating its own algorithm than to have human programmers identify each key step. One of the most effective machine learning approaches for such issues is AutoML. AutoML is a collection of tools that can automate the process of using machine learning to solve problems and make predictions.
ALGORITHM
Begin
Step 1: Import the CSV File
Step 2: Apply preprocessing to remove duplication and Null Data
Step 3: Perform Random feature selection and fix the hyperparameters
Step 4: The model is trained using AutoML
Step 5: Performing hyperparameter optimization to achieve maximum performance in
minimum time
Step 6: Evaluate the predicted output with other models
Step 7: End
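A minimal end-to-end sketch of these steps is shown below. It uses pandas together with the open-source FLAML AutoML library as one possible stand-in for the AutoML step; the file name, label column and time budget are placeholder assumptions, not the exact tooling used in this project.
```python
# Hypothetical sketch of the algorithm above: load CSV, clean, split, run AutoML, evaluate.
# FLAML is used here only as an example AutoML library; the project's exact tool may differ.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from flaml import AutoML

# Step 1: import the CSV file (placeholder file and column names)
data = pd.read_csv("loan_dataset.csv")

# Step 2: preprocessing - remove duplicates and rows with null data
data = data.drop_duplicates().dropna()

# Step 3: feature selection (here: everything except the label column)
X = data.drop(columns=["Loan_Status"])
y = data["Loan_Status"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Steps 4-5: the AutoML tool trains candidate models and tunes hyperparameters
# within a fixed time budget, keeping the best estimator it finds.
automl = AutoML()
automl.fit(X_train=X_train, y_train=y_train, task="classification", time_budget=60)

# Step 6: evaluate the predicted output
predictions = automl.predict(X_test)
print("Best model:", automl.best_estimator)
print("Test accuracy:", accuracy_score(y_test, predictions))
```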
4.2 MODELLING OF PROPOSED SYSTEM
System architecture is the conceptual model that defines the structure, behaviour
and representation of a system. The architecture of the system is shown in the figure
below:
The architecture shows that the dataset is first preprocessed, which involves transforming raw data into an understandable format and checking whether there are missing values. Processing the dataset is done to remove rows or columns that have missing values due to mistakes that might have occurred when entering the data into the CSV file. This is important as it helps prevent some runtime errors, like the Not a Number (NaN) error, that could prevent the system from working effectively. The dataset is then normalized, which involves rescaling real-valued numeric attributes into the range 0 to 1. The dataset is then divided into training and testing datasets.
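A small sketch of the normalization step described above, assuming the numeric feature columns are already known (the file and column names are illustrative):
```python
# Rescale real-valued numeric attributes into the range [0, 1] with MinMaxScaler.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("loan_dataset.csv")                       # placeholder file name
numeric_cols = ["ApplicantIncome", "LoanAmount"]           # illustrative column names

scaler = MinMaxScaler()
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])  # each column now lies in [0, 1]
```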
Figure 4.1-Proposed Model using AutoML
FACT FINDING
Fact finding is the process of collecting data and information using techniques such as sampling of existing documents, research, observation, questionnaires, interviews, prototyping and joint requirements planning. A system analyst uses suitable fact finding techniques to develop and implement the current existing system. Collecting the required facts is very important when applying tools in the System Development Life Cycle, because tools cannot be used efficiently and effectively without properly extracted facts. Fact finding techniques are used in the early stages of the System Development Life Cycle, including the system analysis phase, design and post-implementation review. The following facts were gathered about the data:
1. The dataset has thirteen (13) attributes, twelve (12) of which are feature attributes and one (1) is the predicted value.
2. Samples of 614 loan applicant records were collected to train the model.
3. The data is available in a CSV (Comma Separated Values) file that can easily be loaded into the system for training the model.
Data Collection:
This is the first real step towards the actual development of a machine learning model: collecting data. This is a critical step that determines how good the model will be; the more and better data we get, the better our model will perform. There are several techniques to collect the data, like web scraping, manual intervention, etc. The data for comparing machine learning algorithms for predicting loans was taken from Kaggle and some other sources.
Dataset:
The dataset consists of 821 individual records. There are 27 columns in the dataset, some of which are described below.
STATE: State in India
DISTRICT: District in the state of India.
Year: 2001-2021
Total Loan: Total number of total Loan rate
Data Preparation:
We will transform the data by getting rid of missing data and removing some columns. First we will create a list of column names that we want to keep or retain. Next, we drop or remove all columns except for the columns that we want to retain.
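A minimal pandas sketch of this preparation step; the column names are placeholders for the ones actually retained.
```python
# Keep only the columns we want to retain, then drop rows with missing values.
import pandas as pd

df = pd.read_csv("loan_dataset.csv")                                 # placeholder file name
columns_to_retain = ["Gender", "Married", "ApplicantIncome",
                     "LoanAmount", "Credit_History", "Loan_Status"]  # illustrative list

df = df[columns_to_retain]   # drop every column that is not in the retain list
df = df.dropna()             # get rid of rows that still contain missing data
```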
Model Selection:
While creating a machine learning model, we need two datasets, one for training and the other for testing. But now we have only one, so let's split it in two with a ratio of 80:20. We will also divide the dataframe into feature columns and a label column.
We will use AutoML, which fits multiple models to the data. Finally, we train the model by passing train_x and train_y to the fit method.
Once the model is trained, we need to test it. For that, we will pass test_x to the predict method.
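A brief sketch of this 80:20 split and the fit/predict calls, reusing the variable names from the text; the label column and the estimator are illustrative stand-ins, not necessarily the model AutoML would select.
```python
# Split the prepared dataframe 80:20, train on the training part, predict on the test part.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

features = pd.get_dummies(df.drop(columns=["Loan_Status"]))  # one-hot encode categoricals
labels = df["Loan_Status"]                                   # label column

train_x, test_x, train_y, test_y = train_test_split(
    features, labels, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42)  # stand-in for the AutoML-selected model
model.fit(train_x, train_y)                      # train by passing train_x, train_y to fit
predictions = model.predict(test_x)              # test by passing test_x to predict
print("Accuracy:", accuracy_score(test_y, predictions))
```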
Analyze and Prediction:
Once you’re confident enough to take your trained and tested model into a production-ready environment, the first step is to save it into a .h5 or .pkl file using a library like pickle. Make sure pickle is available in your environment. Next, let’s import the module and dump the model into a .pkl file.
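A short sketch of saving and reloading the trained model with pickle; the file name is a placeholder, and pickle ships with Python, so no separate installation is normally needed.
```python
# Dump the trained model to a .pkl file, then load it back for serving predictions.
import pickle

with open("loan_model.pkl", "wb") as f:   # placeholder file name
    pickle.dump(model, f)

with open("loan_model.pkl", "rb") as f:
    loaded_model = pickle.load(f)

# The reloaded model can be used exactly like the original one
# (test_x here comes from the earlier train/test split).
result = loaded_model.predict(test_x[:1])
```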
The main goal of this project is to decide whether or not a loan applicant is eligible
for a loan based on the numerous qualities that the user provides as input. The
Machine Learning Model is given these features, and it generates a forecast based
on how they affect the label. This was accomplished by first looking for a dataset
that met both the developer's and the user's requirements. The interest rate on a loan
may rise year after year, necessitating the development of a system that can predict
the types of loans available and their rates. This technique can assist the cooperative
society in determining which client categories are eligible for a certain loan. By anticipating loan performance, the cooperative society can lower its non-performing assets.
CHAPTER 5
SYSTEM REQUIREMENTS
5.1 HARDWARE REQUIREMENTS
5.1.1 Pentium 4
The Pentium 4 is a series of single-core CPUs for desktops, laptops and entry-level servers manufactured by Intel. The processors were shipped from November 20, 2000 until August 8, 2008.[3][4] The production of NetBurst processors was active from 2000 until May 21, 2010.[3][4]
All Pentium 4 CPUs are based on the NetBurst microarchitecture. The Pentium 4
Willamette (180 nm) introduced SSE2, while the Prescott (90 nm) introduced SSE3.
Later versions introduced Hyper-Threading Technology (HTT).
The first Pentium 4-branded processor to implement 64-bit was the Prescott (90
nm) (February 2004), but this feature was not enabled. Intel subsequently began selling
64-bit Pentium 4s using the "E0" revision of the Prescotts, being sold on the OEM market
as the Pentium 4, model F. The E0 revision also adds eXecute Disable (XD) (Intel's name
for the NX bit) to Intel 64. Intel's official launch of Intel 64 (under the name EM64T at
that time) in mainstream desktop processors was the N0 stepping Prescott-2M.
Intel also marketed a version of their low-end Celeron processors based on the
NetBurst microarchitecture (often referred to as Celeron 4), and a high-end derivative,
Xeon, intended for multi-socket servers and workstations.
5.2 SOFTWARE REQUIREMENTS
5.2.1 Windows:
Microsoft Windows, commonly known as Windows, was introduced by Microsoft on 20 November 1985. Windows is a proprietary, closed-source operating system. Windows uses a Graphical User Interface (GUI) to interact with users. Bill Gates and Paul Allen founded Microsoft, which develops the Windows operating system. Windows is supported on almost every computer platform, including ARM, ARM64, x86-64 and IA-32; it is the most widely used desktop operating system and captures about 90% of the personal computer (PC) market.
Journey of Microsoft Windows from first version to latest version:
5.2.2 Python:
Python is a general-purpose, dynamic, high-level, interpreted programming language. It supports an object-oriented programming approach for developing applications. It is simple and easy to learn and provides lots of high-level data structures. It is an easy to learn yet powerful and versatile scripting language, which makes it attractive for application development.
Python's syntax and dynamic typing with its interpreted nature make it an ideal language
for scripting and rapid application development.
Python supports multiple programming paradigms, including object-oriented, imperative, and functional or procedural programming styles.
Python is not tied to one particular area, such as web programming. That is why it is known as a multipurpose programming language: it can be used for web, enterprise, 3D CAD and other applications.
We don't need to declare data types for variables because Python is dynamically typed, so we can write a=10 to assign an integer value to a variable.
Python makes development and debugging fast because there is no compilation step in Python development, and the edit-test-debug cycle is very fast.
5.2.3 MySQL:
Structured Query Language (SQL) is a computer language for storing, manipulating, and retrieving data stored in relational database management systems (RDBMS). SQL was developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s.
MySQL is an open-source Relational Database Management System that stores data in a structured format using rows and columns. MySQL is easy to use compared to programming languages like C, C++, Java, etc. By learning some basic commands, we can create and interact with a database.
CHAPTER 6
IMPLEMENTATION
Our base paper describes the process of predicting loans and preventing loan defaults using historical loan records. We have included processing modules that use AutoML, which selects the best algorithm and predicts, through a website, whether or not to grant the loan.
The dataset that was collected is used to train the model. The model we developed is used to predict loans and prevent loan defaults. Using AutoML, the best algorithm is chosen, and that algorithm is used to predict whether a person is eligible to get a loan or not. We have developed a website where we can enter the live data of a person and view the results. The test accuracy is over 90%.
CHAPTER 7
CONCLUSION
With the aid of machine learning technologies, the proposed model makes finding connections and patterns between disparate data simple. The major task of this project is to determine the sort of loan that might occur, given the place at which it has already happened. Using a training set of data that has undergone data cleansing and data transformation, we have developed a model using the machine learning idea. The model accurately predicts the type of loan. Analyzing a data set is made easier by data visualisation. The graphs include bar, pie, line and scatter diagrams, each with their unique features. We created a large number of graphs and discovered some unique statistics that aided in studying the Indian Loans datasets, which can assist us in determining the components that contribute to a safe society.
APPENDICES
APPENDIX – I
Web Application:
The web application contains several modules where the predicted values and the current values can be viewed, and the result is also given. The datasets are trained automatically using the included AutoML feature.
Module 1:
This module is the homepage created for the web application. One has to log in with their credentials to check whether or not they will be provided with a loan.
Module 2:
In this module, the user logs in with his/her username and password.
Only with the correct credentials will the user be able to log in to the system.
Module 3:
Once the user has logged in with the correct credentials, a .csv file containing the dataset should be uploaded so that the system can train itself. In this module, the AutoML feature is used instead of manually training the data.
Module 4:
After the file is uploaded, a preview of the data is shown on the screen. There are 614 data records, which can be extended to include further data. Any missing values or raw data are corrected by AutoML and do not require any manual pre-processing.
The user must click on the “Click to Train/Test” button to train on the datasets. This process might take a few seconds. Once the training is completed, a pop-up message appears saying “Training finished”.
Module 5:
This module takes the necessary information from the user such as;
Name of The Applicant
Age of the Applicant
Gender
Marital Status
Dependents
Education Status
Credit History
Employment Status
Income of the Applicant and Co-Applicant If dependents are > 0.
Loan Amount
Duration of the Loan
Region where the Applicant resides
Selection of best model to make the prediction
Once the applicant has filled in all the necessary details and clicked the SUBMIT button, the result is displayed.
Module 6:
In this analysis module, the accuracy plots of the dataset are displayed.
APPENDIX -II
PUBLICATION:
REFERENCES
[6] Y. Yang, J. Dong, X. Sun, E. Lima, Q. Mu, and X. Wang, "A CFCC-LSTM model for sea surface temperature prediction," IEEE Geosci. Remote Sens. Lett., vol. 15, no. 2, pp. 207-211, Feb. 2018.
[7] X. Hong, R. Lin, C. Yang, N. Zeng, C. Cai, J. Gou, and J. Yang, "Predicting Alzheimer's disease using LSTM," IEEE Access, vol. 7, pp. 80893-80901, 2019.
[11] W. H. Li, L. Wen, and Y. B. Chen, "Application of improved GA-BP neural network model in property Loan prediction," Geomatics Inf. Sci. Wuhan Univ., vol. 42, no. 8, pp. 1110-1116, 2017.