Project Loan AutoML
Submitted by
Mr. KUMARAKRISHNAN.S
of
BACHELOR OF TECHNOLOGY
APRIL 2023
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
BONAFIDE CERTIFICATE
This is to certify that the project work entitled "LOAN PREDICTION SYSTEM USING AUTOML" is the bonafide work of [19TD0176], SHRUTI.R [19TD0233], submitted in partial fulfillment of the requirements for the award of the B.Tech Degree in Computer Science and Engineering by Pondicherry University during the
We also sincerely thank our Head of the Department, Dr. K. PREMKUMAR, whose continuous encouragement and helpful comments enabled us to complete our project report. We thank all our staff members, who have always been by our side and helped us with our project. We also sincerely thank all the lab technicians for their help in the course of our project development.
We would also like to extend our sincere gratitude and grateful thanks to our Director cum
Principal Dr. V. S. K. VENKATACHALAPATHY for having extended the Research and
Development facilities of the department.
We are grateful to our Founder Chairman Shri. N. KESAVAN. He has been a constant
source of inspiration right from the beginning.
We would like to express our faithful and grateful thanks to our Chairman and Managing
Director Shri. M. DHANASEKARAN for his support.
We would also like to thank our Vice Chairman Shri. S. V. SUGUMARAN for providing us with a pleasant learning environment.
We would like to thank our Secretary Dr. K. NARAYANASAMY for his support.
We wish to thank our family members and friends for their constant encouragement, constructive criticism and suggestions that have helped us in the timely completion of this project.
Last but not least, we would like to thank the ALMIGHTY for His grace and blessings
over us throughout the project.
ABSTRACT
Globally, it is disconcerting how frequently banks lose money to loan borrowers as a result of loan default. This project is a modest attempt to put machine learning to use in a realistic way and to determine what else it might be used for. In order to predict fraud in bank loan administration and thereby prevent loan default, this work leverages historical loan records using machine learning. Such fraud would not have been revealed by manual examination by a credit officer. The mission of the banking loan approval system project is to create a tool that can automate the core tasks of loan management. Prior to issuing a loan, banks are utilizing increasingly sophisticated techniques to verify user information and establish the actual facts about the user.
LIST OF TABLES
LIST OF FIGURES
1.1 LEARNING OF ML
LIST OF ABBREVIATIONS
ML - Machine Learning
TABLE OF CONTENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1 BASIC INTRODUCTION
2. LITERATURE SURVEY
3. EXISTING SYSTEM
4. PROPOSED SYSTEM
4.1 INTRODUCTION
5. SYSTEM REQUIREMENTS
6. IMPLEMENTATION
7. CONCLUSION
APPENDIX-I
APPENDIX-II
REFERENCES
CHAPTER 1
INTRODUCTION
The main business of practically all banks is the distribution of loans. The majority of a bank's assets are derived directly from the revenue generated by the loans that the bank disburses. In a banking environment, the primary objective is to put one's assets in trustworthy hands. Many banks and financial institutions today lend money through a comprehensive verification and validation process, but there is still no guarantee that the chosen applicant is the most credible candidate out of all applicants.
Existing technologies that rely on human intelligence for the loan approval process have a number of drawbacks.
Machine learning is the brain where all the learning takes place. The way the machine learns is similar to the way a human being learns. Humans learn from experience: the more we know, the more easily we can predict. By analogy, when we face an unknown situation, the likelihood of success is lower than in a known situation. Machines are trained the same way. To make an accurate prediction, the machine sees examples. When we give the machine a similar example, it can figure out the outcome. However, like a human, if it is fed a previously unseen example, the machine has difficulty predicting.
The core objectives of machine learning are learning and inference. First of all, the machine learns through the discovery of patterns. This discovery is made thanks to the data. One crucial task of the data scientist is to choose carefully which data to provide to the machine. The list of attributes used to solve a problem is called a feature vector. You can think of a feature vector as a subset of the data that is used to tackle a problem. The machine uses some fancy algorithms to simplify reality and transform this discovery into a model. Therefore, the learning stage is used to describe the data and summarize it into a model.
For instance, the machine is trying to understand the relationship between the wage of an individual and the likelihood of going to a fancy restaurant. It turns out the machine finds a positive relationship between wage and going to a high-end restaurant.
Inferring
The breakthrough comes with the idea that a machine can learn from data (i.e., examples) on its own to produce accurate results. Machine learning is closely related to data mining and Bayesian predictive modeling. The machine receives data as input and uses an algorithm to formulate answers.
A typical machine learning task is to provide a recommendation. For those who have a Netflix account, all recommendations of movies or series are based on the user's historical data. Tech companies are using unsupervised learning to improve the user experience with personalized recommendations.
1.1.2 MACHINE LEARNING vs TRADITIONAL PROGRAMMING
In traditional programming, a programmer hard-codes the rules that turn input data into output, and every new case requires new rules. Machine learning is supposed to overcome this issue. The machine learns how the input and output data are correlated and writes the rule itself. The programmers do not need to write new rules each time there is new data. The algorithms adapt in response to new data and experience to improve efficacy over time.
Figure 1.4-Machine Learning
The lack of data, or the lack of diversity in the dataset, is the main problem with machine learning. If there is no data, a machine cannot learn. A dataset with little heterogeneity also makes it harder for the machine to learn. For a machine to gain insightful knowledge, heterogeneity is required. When there are no or few variations, it is uncommon for an algorithm to be able to extract information. For the machine to learn, it is advised that each group contain at least 20 observations. When the data falls short of this, evaluation and prediction are substandard.
The following key points sum up the basic steps of a machine learning workflow:
1. Define a question
2. Collect data
3. Visualize data
4. Train the algorithm
5. Test the Algorithm
6. Collect feedback from Customer
7. Refine the trained algorithm
8. Loop 4-7 until the results are satisfying
9. Use the model to make a predictive analysis
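As an illustration of steps 2 to 9, a minimal sketch of such a workflow in Python with scikit-learn is shown below; the dataset, features and model choice are placeholders for illustration, not the ones used in this project.
```python
# Minimal sketch of the train / test / refine loop described above,
# using scikit-learn on a small example dataset (not the project data).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Steps 2-3: collect (here, simply load) and inspect the data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 4: train the algorithm
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# Step 5: test the algorithm
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))

# Steps 6-8: based on feedback, refine features or hyperparameters and retrain
# Step 9: use the fitted model for predictive analysis on new data
```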
Once the algorithm has mastered arriving at the correct conclusions, it applies that skill to fresh sets of data.
Machine learning is the process by which computers figure out how to carry out
tasks without being specifically taught to do so. Computers use available data to learn in
order to do specific jobs. For straightforward jobs given to computers, it is possible to
build algorithms that instruct the device how to carry out all the steps necessary to address
the issue at hand; no learning is required on the part of the computer. It can be difficult
for a human to manually develop the required algorithms for more complex tasks. In fact,
it may prove more beneficial to aid the computer in creating its own algorithm than to
have human programmers identify each key step.
Machine learning can be grouped into two broad learning tasks: supervised and unsupervised, each with many algorithms.
Supervised learning
An algorithm uses training data and feedback from humans to learn the relationship of given inputs to a given output. For instance, a practitioner can use marketing expense and weather forecasts as input data to predict the sales of cans. You can use supervised learning when the output data is known; the algorithm will then predict outcomes for new data. Supervised learning covers two kinds of tasks:
• Classification task
• Regression task
Classification
Imagine you want to predict the gender of a customer for a commercial. You would start gathering data on the height, weight, job, salary, purchasing basket, etc., from your customer database. You know the gender of each of your customers; it can only be male or female. The objective of the classifier is to assign a probability of being a male or a female (i.e., the label) based on the information (i.e., the features you have collected). Once the model has learned how to recognize male or female, you can use new data to make a prediction. For instance, you just got new information about an unknown customer, and you want to know if it is a male or a female. If the classifier predicts male = 70%, it means the algorithm is 70% sure that this customer is a male and 30% sure that it is a female.
The label can have two or more classes. The above machine learning example has only two classes, but if a classifier needs to predict objects, it can have dozens of classes (e.g., glass, table, shoes; each object represents a class).
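A minimal sketch of this idea, assuming a toy feature matrix of height, weight and salary with binary gender labels (purely illustrative values, not real customer data):
```python
# Toy classification sketch: the classifier returns a probability for each class.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative features: [height_cm, weight_kg, salary_k]; labels: 0 = female, 1 = male
X = np.array([[160, 55, 40], [175, 80, 60], [180, 85, 55],
              [158, 50, 45], [170, 75, 70], [165, 60, 50]])
y = np.array([0, 1, 1, 0, 1, 0])

clf = RandomForestClassifier(random_state=0).fit(X, y)

new_customer = [[172, 78, 65]]                  # unknown customer
proba = clf.predict_proba(new_customer)[0]      # e.g. something like 0.30 / 0.70
print(f"P(female) = {proba[0]:.2f}, P(male) = {proba[1]:.2f}")
```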
Regression
When the output is a continuous value, the task is a regression. For instance, a financial analyst may need to forecast the value of a stock based on a range of features like equity, previous stock performance and macroeconomic indicators. The system will be trained to estimate the price of the stock with the lowest possible error.
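A minimal regression sketch along these lines, using made-up feature values rather than real market data:
```python
# Toy regression sketch: predict a continuous value (a stock price).
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative features: [equity, previous_price, macro_index]
X = np.array([[10.0, 95.0, 1.2], [12.0, 101.0, 1.1], [9.5, 90.0, 1.3],
              [14.0, 110.0, 1.0], [11.0, 98.0, 1.2]])
y = np.array([96.0, 103.0, 91.0, 113.0, 100.0])  # next-day prices (invented)

reg = LinearRegression().fit(X, y)
print("Predicted price:", reg.predict([[13.0, 105.0, 1.1]])[0])
```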
Support vector machine
Support Vector Machine, or SVM, is typically used for the classification task, and can also be applied to regression. The SVM algorithm finds a hyperplane that best separates the classes in the feature space.
Unsupervised learning
In unsupervised learning, an algorithm explores the input data without being given labeled outputs and discovers structure, such as groupings, in the data on its own.
CHOOSING A MACHINE LEARNING ALGORITHM:
There are plenty of machine learning algorithms. The choice of the algorithm is based on the objective. In the machine learning example below, the task is to predict the type of flower among three varieties. The predictions are based on the length and the width of the petal. The picture depicts the results of ten different algorithms. The picture on the top left is the dataset. The data is classified into three categories: red, light blue and dark blue. There are some groupings. For instance, in the second image, everything in the upper left belongs to the red category, the middle part shows a mixture of uncertainty and light blue, while the bottom corresponds to the dark blue category. The other images show how different algorithms try to classify the data.
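A minimal sketch of such a comparison, assuming the classic Iris dataset (three flower varieties, petal length and width) and a handful of scikit-learn classifiers chosen purely for illustration:
```python
# Compare a few classifiers on petal length/width for three flower varieties.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X = X[:, 2:4]  # keep only petal length and petal width

models = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "k-nearest neighbours": KNeighborsClassifier(),
    "Logistic regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validated accuracy
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```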
Some research areas that have been explored in machine learning are as follows:
• Text Mining and Text Classification
• Image-Based Applications
• Automated Machine Learning
• Machine Vision
• Clustering
• Optimization
• Voice Classification
• Sentiment Analysis
• Recommendation Framework Project.
• Prediction and Detection
(iii) For instance, IBM's Watson platform can determine shipping container
damage. Watson combines visual and systems-based data to track, report and make
recommendations in real-time.
(iv) In past years, stock managers relied extensively on primary methods to evaluate and forecast inventory. By combining big data and machine learning, better forecasting techniques have been implemented (an improvement of 20 to 30% over traditional forecasting tools). In terms of sales, this means an increase of 2 to 3% due to the potential reduction in inventory costs.
Traditional Machine Learning | Automated Machine Learning
A human gives input manually and the problem is solved. | It is an automated process of using machine learning to solve problems and make predictions.
It is used to perform two tasks, regression and classification. | Everything from data preparation to training, model selection and algorithm selection is done in an automated way.
Once the model is designed, changes cannot be made until they are made manually. | The model is developed and can evolve automatically, with the quality of the data being assessed.
It uses an already designed algorithm to perform the tasks. | It can create its own algorithm rather than having human programmers design it.
1.2.2 WORKING OF AUTOML
AutoML can cover everything from data preparation to training to the selection of models and algorithms, all of which is done in a completely automated way. There are many types of machine learning, but with supervised learning, labeled input and output data is repeatedly fed into human-trained systems, which offer predictions with increasing accuracy as each new data set is fed in.
For example, if a company wants to be able to predict whether or not somebody
is going to buy its product, they first have to have a data set of past customers, organized
by who bought and didn’t buy. Then it has to be able to use that data set to predict what
a whole new set of customers will decide to do. Or, if you want a computer to be able to
identify a cat in a video, you have to first train it by showing it other videos with cats so
it is able to accurately identify one in a video it hasn’t seen before.
At first blush, automated machine learning may seem a bit redundant. After all,
machine learning is already about automating the process of identifying patterns in data
to make predictions. The process, which relies on algorithms and statistical models,
doesn’t require consistent, or explicit programming. Once a machine learning model is
built, it can then be further optimized through trial and error and feedback, meaning the
machine can learn by experience and increased exposure to data, much like humans do. The goal of AutoML is both to speed up the AI development process and to make the technology more accessible.
In practice, much of the work required to make a machine learning model is rather
laborious, and requires data scientists to make a lot of different decisions. They have to
decide how many layers to include in neural networks, what weights to give inputs at
each node, which algorithms to use, and more. It’s a big job, and it requires a lot of
specialized skill and intuition to do it properly. The more complex the model, the more
complex the work. And some experts say automating some of that work will be necessary
as AI systems become more complex. So, autoML aims to eliminate the guesswork for
humans by taking over the decisions data scientists and researchers currently have to
make while designing their machine learning models.
1.2.4 AUTOML AND SUPERVISED LEARNING
The following are a few popular tools being used among business professionals to automate machine learning processes:
Aible
AutoKeras
Auto-PyTorch
Auto-Sklearn
Google Cloud AutoML
AIBLE
Aible’s suite of AI solutions works to automate data science and data engineering
tasks across multiple industries. Its products can detect key data relationships, assess data readiness for model input, and augment data analytics and recommendations. Aible
connects directly to the cloud for data security, and can be integrated with other tools like
Salesforce and Tableau.
AUTOKERAS
AutoKeras is an open-source library and autoML tool based on Keras, a Python
machine learning API. The tool can automate classification and regression tasks in deep
learning models for images, text and structured data. AutoKeras largely applies neural
architecture search to optimize code writing, machine learning algorithm selection and
pipeline design.
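As a rough illustration (not code from this project), a structured-data classification run with AutoKeras might look like the sketch below; the CSV file name and label column are placeholder assumptions.
```python
# Hypothetical AutoKeras sketch for tabular (structured) data classification.
import pandas as pd
import autokeras as ak
from sklearn.model_selection import train_test_split

data = pd.read_csv("loan_data.csv")           # placeholder file name
X = data.drop(columns=["Loan_Status"])        # placeholder label column
y = data["Loan_Status"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# max_trials limits how many candidate architectures the neural architecture search tries
clf = ak.StructuredDataClassifier(max_trials=5, overwrite=True)
clf.fit(X_train, y_train, epochs=10)
print("Test evaluation:", clf.evaluate(X_test, y_test))
```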
AUTO-PYTORCH
Auto-PyTorch, based on the PyTorch machine learning library in Python, allows
for fully automated deep learning (autoDL) tasks. It automates algorithm selection and
hyperparameter tuning for deep neural network architectures, and can support tabular and
time series datasets. Auto-PyTorch applies Bayesian optimization, meta-learning and
ensemble construction for automation.
AUTO-SKLEARN
Auto-Sklearn is an open-source autoML tool built on the scikit-learn machine
learning library in Python. The tool automates supervised machine learning pipeline
creation and can be used as a drop-in replacement for scikit-learn classifiers in Python.
Like Auto-PyTorch, Auto-Sklearn utilizes meta-learning, ensemble learning and Bayesian
optimization to automatically search for learning algorithms when given a new dataset.
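A minimal sketch of this drop-in usage, assuming the auto-sklearn package is installed and the data is already split into training and test sets (the time limits below are arbitrary):
```python
# Hypothetical auto-sklearn sketch: AutoSklearnClassifier is used like a regular
# scikit-learn estimator, but it searches over pipelines automatically.
import autosklearn.classification
from sklearn.metrics import accuracy_score

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,   # total search budget in seconds (arbitrary)
    per_run_time_limit=30,         # budget per candidate model (arbitrary)
)
automl.fit(X_train, y_train)       # X_train / y_train assumed to exist already
predictions = automl.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
```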
1.2.7 CHALLENGES FACED IN AUTOML
• Can’t understand the business context of the problem it is trying to solve.
• No standard for what a “good” model looks like.
• Doesn’t offer the “why” of its decision-making process.
• Too complex for non-data scientists to pull off successfully.
• Can’t automate ethics or fairness.
AutoSKLearn:
Auto-Sklearn is an extension of Auto-WEKA that uses the Python library scikit-learn and is a drop-in replacement for regular scikit-learn classifiers and regressors. Auto-PyTorch is based on the deep learning framework PyTorch and jointly optimizes hyperparameters and the neural architecture.
SALESFORCE
CHAPTER 2
LITERATURE SURVEY
AUTHORS: U. Thongsatapornwatana
In recent years, data mining, a type of data analysis, has been utilised to examine loan data previously saved from a variety of sources in order to identify patterns and trends. Furthermore, it can be used to improve efficiency in resolving loans more quickly and to automatically flag loans. There are numerous data mining methods, though, and the right data mining techniques must be chosen in order to improve loan detection efficiency. In particular, applications that were used to solve loan problems are reviewed in this paper's study of the literature on various data mining applications. The survey sheds light on the difficulties of loan data mining as well as research gaps.
Two main goals drive the research that is presented here. To predict shootings, risk terrain
modelling (RTM) is used as the initial step. The risk terrain maps that were created using
RTM assess the risks of upcoming shootings as they are distributed over a geography
using a variety of contextual data pertinent to the opportunity structure of shootings. The
second goal was to evaluate the risk terrain maps' capacity for forecasting over two six-month periods and contrast it with that of retroactive hot spot maps. The findings indicate
that risk terrains are significantly more accurate than retroactive hot spot mapping at
forecasting future shootings across a variety of cut points.
2.3 Using geographically weighted regression to explore local Loan
patterns
The current study explores the spatial patterns of both Loan and its variables in a
structural model of violent Loan in Portland, Oregon. The paper presents findings from
a global ordinary least squares model, which is assumed to fit all sites within the study area, using typical structural measures taken from an opportunity framework. Then,
as an alternative to such conventional methods of modelling Loan, geographically
weighted regression (GWR) is presented. The GWR approach estimates a local model
and generates a set of mappable parameter estimates and spatially variable t-values of
significance. It is discovered that a number of structural metrics have correlations with
Loan that differ dramatically by region. According to the results, a mixed model that
includes both fixed and spatially variable factors may produce the best realistic model of
Loan. The present investigation highlights the relevance of GWR for addressing
misspecification of a global model of urban Loan and evaluating local dynamics that
increase Loan levels.
The spatio-temporal dynamics of Loan activity are described by a set of models that we
introduce in this research. Here it is claimed that one can see the development of hot spots
using a basic set of procedures that relate to components essential to the study of Loan.
By examining the most basic iterations of our model, we demonstrate a self-organised critical condition of illicit activity that, depending on the situation, we propose to refer to as a warm spot or a tepid milieu. In contrast to true hot spots, where localized high levels or peaks are generated, it is characterized by a positive level of unlawful or uncivil activity that sustains itself without exploding. Additionally, we offer modifications to our
model that account for local and long-range interactions, the impacts of repeated
victimization, and briefly address some of the outcomes, such as hysteresis events.
2.5 Language usage on Twitter predicts Loan rates
Social networks generate a vast amount of data. Over 230 million active users
contribute more than 500 million tweets daily to the microblogging network Twitter. We
suggest using Twitter's open data for analysis to forecast loan rates. In recent years, loan
rates have gone up. Even while loan stoppers use a variety of techniques to lower loan
rates, none of the earlier strategies focused on using the language used in forms as a
source of data to forecast loan rates. In this study, we propose that a reliable method for
forecasting loan rates in cities is language analysis of tweets. Three months' worth of
tweets in Houston and New York City were gathered by locking the collection by
geographic longitude and latitude.
Data mining is the process of reviewing large amounts of previously stored data in order
to discover patterns and trends in it. Furthermore, it can be used to increase efficiency in
clearing loans more rapidly and can be used to automatically notify loans.
Two main goals drive the research that is presented here. To predict shootings, risk terrain
mapping (RTM) is used as the initial step. The risk terrain maps that were created using
RTM assess the risks of upcoming shootings as they are distributed over a geography using a variety of contextual data pertaining to the opportunity structure of shootings. The second goal was to evaluate the risk terrain maps' capacity for forecasting over two six-month periods and contrast it with that of retrospective hot spot maps.
The relevant papers referred to for this project are also presented in Table 2.1.
Table 2.1 - Papers referred to for this project

S.No | Authors | Title | Year | Proposed System | Additional Data
1 | U. Thongsatapornwatana | A survey of data mining techniques for analyzing Loan patterns | 2021 | The literature on various data mining applications, especially applications applied to solve Loans, is reviewed. The survey also throws light on research gaps and challenges of Loan data mining. | Data mining for finding patterns and trends in Loan is to be used appropriately and to be a help for beginners in the research of Loan data mining.
CHAPTER 3
EXISTING SYSTEM
The System Development Life Cycle was chosen as the research methodology for this
investigation. The SDLC method that was utilised for software development is called the
waterfall model. In the waterfall model method, the development process is broken down
into discrete steps. One phase's output serves as the sequential input for the following
phase.
The current loan process at Staff Multipurpose Cooperative Society Limited JOSTUM
involves manual decision-making and requires the applicant to personally meet with
cooperative society administration. To determine if the loan will be approved or denied,
the applicant may need to visit the cooperative society office.
Figure 3.1-Existing System of Prediction Model
CHAPTER 4
PROPOSED SYSTEM
4.1 INTRODUCTION
The model that is proposed gives us more accuracy than the traditional model. The proposed model can compare different models and algorithms simultaneously and provides higher accuracy than the execution of a single model or a single algorithm. The dataset is collected from Kaggle, and the collected data is then trained and tested to obtain the prediction accuracy. Comparatively, the accuracy of the AutoML prediction is higher than that of the other methods or models. The traditional method uses the processes of preprocessing, cleansing, training, etc., whereas AutoML reduces these preceding processes.
Machine learning is the process by which computers figure out how to carry out tasks without being specifically taught to do so. Computers use available data to learn in order to do specific jobs. For the tasks that are provided, algorithms are designed and built to perform the work, and no learning is required on the part of the computer. In fact, it may prove more beneficial to aid the computer in creating its own algorithm than to have human programmers identify each key step. One of the most effective machine learning approaches for such issues is AutoML. AutoML is a collection of tools that can automate the process of using machine learning to solve problems and make predictions.
ALGORITHM
Begin
Step 1: Import the CSV File
Step 2: Apply preprocessing to remove duplication and Null Data
Step 3: Perform Random feature selection and fix the hyperparameters
Step 4: The model is trained using AutoML
Step 5: Performing hyperparameter optimization to achieve maximum performance in
minimum time
Step 6: Evaluate the predicted output with other models
Step 7: End
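A minimal end-to-end sketch of these steps is shown below. It uses pandas together with the open-source FLAML AutoML library as one possible stand-in for the AutoML step; the file name, label column and time budget are placeholder assumptions, not the exact tooling used in this project.
```python
# Hypothetical sketch of the algorithm above: load CSV, clean, split, run AutoML, evaluate.
# FLAML is used here only as an example AutoML library; the project's exact tool may differ.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from flaml import AutoML

# Step 1: import the CSV file (placeholder file and column names)
data = pd.read_csv("loan_dataset.csv")

# Step 2: preprocessing - remove duplicates and rows with null data
data = data.drop_duplicates().dropna()

# Step 3: feature selection (here: everything except the label column)
X = data.drop(columns=["Loan_Status"])
y = data["Loan_Status"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Steps 4-5: the AutoML tool trains candidate models and tunes hyperparameters
# within a fixed time budget, keeping the best estimator it finds.
automl = AutoML()
automl.fit(X_train=X_train, y_train=y_train, task="classification", time_budget=60)

# Step 6: evaluate the predicted output
predictions = automl.predict(X_test)
print("Best model:", automl.best_estimator)
print("Test accuracy:", accuracy_score(y_test, predictions))
```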
4.2 MODELLING OF PROPOSED SYSTEM
System architecture is the conceptual model that defines the structure, behaviour
and representation of a system. The architecture of the system is shown in the figure
below:
The architecture shows that the dataset is first preprocessed, which involves transforming raw data into an understandable format and checking whether there are missing values. Processing the dataset is done to remove rows or columns that have missing values due to mistakes that might have occurred when entering the data into the CSV file. This is important as it helps prevent some runtime errors, like the Not a Number (NaN) error, that could prevent the system from working effectively. The dataset is then normalized, which involves rescaling real-valued numeric attributes into the range 0 to 1. The dataset is then divided into training and testing datasets.
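A small sketch of the normalization step described above, assuming the numeric feature columns are already known (the file and column names are illustrative):
```python
# Rescale real-valued numeric attributes into the range [0, 1] with MinMaxScaler.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("loan_dataset.csv")                       # placeholder file name
numeric_cols = ["ApplicantIncome", "LoanAmount"]           # illustrative column names

scaler = MinMaxScaler()
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])  # each column now lies in [0, 1]
```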
Figure 4.1-Proposed Model using AutoML
FACT FINDING
Fact finding is the process of collecting data and information using techniques such as sampling of existing documents, research, observation, questionnaires, interviews, prototyping and joint requirements planning. A system analyst uses suitable fact finding techniques to develop and implement the current existing system. Collecting the required facts is very important when applying tools in the System Development Life Cycle, because tools cannot be used efficiently and effectively without properly extracted facts. Fact finding techniques are used in the early stages of the System Development Life Cycle, including the system analysis phase, design and post-implementation review. The following facts were gathered about the data:
1. The dataset has thirteen (13) attributes, twelve (12) of which are feature attributes and one (1) is the predicted value.
2. Samples of 614 loan applicant records were collected to train the model.
3. The data is available in a CSV (Comma Separated Values) file that can easily be loaded into the system for training the model.
Data Collection:
This is the first real step towards the actual development of a machine learning model: collecting data. This is a critical step that determines how good the model will be; the more and better data we get, the better our model will perform. There are several techniques to collect the data, like web scraping, manual intervention, etc. The data for comparing machine learning algorithms for predicting loans was taken from Kaggle and some other sources.
Dataset:
The dataset consists of 821 individual records. There are 27 columns in the dataset, some of which are described below.
STATE: State in India
DISTRICT: District in the state of India.
Year: 2001-2021
Total Loan: Total number of total Loan rate
Data Preparation:
We will transform the data by getting rid of missing data and removing some columns. First we will create a list of column names that we want to keep or retain. Next, we drop or remove all columns except for the columns that we want to retain.
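A minimal pandas sketch of this preparation step; the column names are placeholders for the ones actually retained.
```python
# Keep only the columns we want to retain, then drop rows with missing values.
import pandas as pd

df = pd.read_csv("loan_dataset.csv")                                 # placeholder file name
columns_to_retain = ["Gender", "Married", "ApplicantIncome",
                     "LoanAmount", "Credit_History", "Loan_Status"]  # illustrative list

df = df[columns_to_retain]   # drop every column that is not in the retain list
df = df.dropna()             # get rid of rows that still contain missing data
```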
Model Selection:
While creating a machine learning model, we need two datasets, one for training and the other for testing. But now we have only one, so let's split it in two with a ratio of 80:20. We will also divide the dataframe into feature columns and a label column.
We will use AutoML, which fits multiple models to the data. Finally, we train the model by passing train_x and train_y to the fit method.
Once the model is trained, we need to test it. For that, we will pass test_x to the predict method.
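A brief sketch of this 80:20 split and the fit/predict calls, reusing the variable names from the text; the label column and the estimator are illustrative stand-ins, not necessarily the model AutoML would select.
```python
# Split the prepared dataframe 80:20, train on the training part, predict on the test part.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

features = pd.get_dummies(df.drop(columns=["Loan_Status"]))  # one-hot encode categoricals
labels = df["Loan_Status"]                                   # label column

train_x, test_x, train_y, test_y = train_test_split(
    features, labels, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42)  # stand-in for the AutoML-selected model
model.fit(train_x, train_y)                      # train by passing train_x, train_y to fit
predictions = model.predict(test_x)              # test by passing test_x to predict
print("Accuracy:", accuracy_score(test_y, predictions))
```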
Analyze and Prediction:
Once you’re confident enough to take your trained and tested model into a production-ready environment, the first step is to save it into a .h5 or .pkl file using a library like pickle. Make sure pickle is available in your environment. Next, let’s import the module and dump the model into a .pkl file.
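A short sketch of saving and reloading the trained model with pickle; the file name is a placeholder, and pickle ships with Python, so no separate installation is normally needed.
```python
# Dump the trained model to a .pkl file, then load it back for serving predictions.
import pickle

with open("loan_model.pkl", "wb") as f:   # placeholder file name
    pickle.dump(model, f)

with open("loan_model.pkl", "rb") as f:
    loaded_model = pickle.load(f)

# The reloaded model can be used exactly like the original one
# (test_x here comes from the earlier train/test split).
result = loaded_model.predict(test_x[:1])
```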
The main goal of this project is to decide whether or not a loan applicant is eligible
for a loan based on the numerous qualities that the user provides as input. The
Machine Learning Model is given these features, and it generates a forecast based
on how they affect the label. This was accomplished by first looking for a dataset
that met both the developer's and the user's requirements. The interest rate on a loan
may rise year after year, necessitating the development of a system that can predict
the types of loans available and their rates. This technique can assist the cooperative
society in determining which client categories are eligible for a certain loan. By anticipating loan performance, the cooperative society can lower its non-performing assets.
CHAPTER 5
SYSTEM REQUIREMENTS
5.1 HARDWARE REQUIREMENTS
5.1.1 Pentium 4
The Pentium 4 is a series of single-core CPUs for desktops, laptops and entry-level servers manufactured by Intel. The processors were shipped from November 20, 2000 until August 8, 2008.[3][4] The production of NetBurst processors was active from 2000 until May 21, 2010.[3][4]
All Pentium 4 CPUs are based on the NetBurst microarchitecture. The Pentium 4
Willamette (180 nm) introduced SSE2, while the Prescott (90 nm) introduced SSE3.
Later versions introduced Hyper-Threading Technology (HTT).
The first Pentium 4-branded processor to implement 64-bit was the Prescott (90
nm) (February 2004), but this feature was not enabled. Intel subsequently began selling
64-bit Pentium 4s using the "E0" revision of the Prescotts, being sold on the OEM market
as the Pentium 4, model F. The E0 revision also adds eXecute Disable (XD) (Intel's name
for the NX bit) to Intel 64. Intel's official launch of Intel 64 (under the name EM64T at
that time) in mainstream desktop processors was the N0 stepping Prescott-2M.
Intel also marketed a version of their low-end Celeron processors based on the
NetBurst microarchitecture (often referred to as Celeron 4), and a high-end derivative,
Xeon, intended for multi-socket servers and workstations.
5.2 SOFTWARE REQUIREMENTS
5.2.1 Windows:
Microsoft Windows, commonly known as Windows, was introduced by Microsoft on 20 November 1985. Windows is a proprietary, closed-source operating system. Windows uses a Graphical User Interface (GUI) to interact with users. Bill Gates and Paul Allen founded Microsoft, which develops the Windows operating system. Windows is supported on almost every computer platform, including ARM, ARM64, x86-64 and IA-32; it is the most widely used desktop operating system and captures about 90% of the personal computer (PC) market.
Journey of Microsoft Windows from first version to latest version:
5.2.2 Python:
Python is a general-purpose, dynamic, high-level, interpreted programming language. It supports an object-oriented programming approach for developing applications. It is simple and easy to learn and provides lots of high-level data structures. It is an easy to learn yet powerful and versatile scripting language, which makes it attractive for application development.
Python's syntax and dynamic typing with its interpreted nature make it an ideal language
for scripting and rapid application development.
Python supports multiple programming paradigms, including object-oriented, imperative, and functional or procedural programming styles.
Python is not tied to one particular area, such as web programming. That is why it is known as a multipurpose programming language: it can be used for web, enterprise, 3D CAD and other applications.
We don't need to declare data types for variables because Python is dynamically typed, so we can write a=10 to assign an integer value to a variable.
Python makes development and debugging fast because there is no compilation step in Python development, and the edit-test-debug cycle is very fast.
5.2.3 MySQL:
Structured Query Language (SQL) is a computer language for storing, manipulating, and retrieving data stored in relational database management systems (RDBMS). SQL was developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s.
MySQL is an open-source Relational Database Management System that stores data in a structured format using rows and columns. MySQL is easy to use compared to programming languages like C, C++, Java, etc. By learning some basic commands, we can create and interact with a database.
CHAPTER 6
IMPLEMENTATION
Our base paper describes the process of predicting loans and preventing loan defaults using historical loan records. We have included processing modules that use AutoML, which selects the best algorithm and predicts, through a website, whether or not to grant the loan.
The dataset that was collected is used to train the model. The model we developed is used to predict loans and prevent loan defaults. Using AutoML, the best algorithm is chosen, and that algorithm is used to predict whether a person is eligible to get a loan or not. We have developed a website where we can enter the live data of a person and view the results. The test accuracy is over 90%.
CHAPTER 7
CONCLUSION
With the aid of machine learning technologies, the proposed model makes finding connections and patterns between disparate data simple. The major task of this project is to determine the sort of loan that might occur, given the place at which it has already happened. Using a training set of data that has undergone data cleansing and data transformation, we have developed a model using the machine learning idea. The model accurately predicts the type of loan. Analyzing a data set is made easier by data visualisation. The graphs include bar, pie, line and scatter diagrams, each with their unique features. We created a large number of graphs and discovered some unique statistics that aided in studying the Indian Loans datasets, which can assist us in determining the components that contribute to a safe society.
APPENDICES
APPENDIX – I
Web Application:
The web application contains several modules where the predicted values and the current values can be viewed, and the result is also given. The datasets are trained automatically using the included AutoML feature.
Module 1:
This module is the homepage created for the web application. One has to log in with their credentials to check whether or not they will be provided with a loan.
Module 2:
In this module, the user logs in with his/her username and password.
Only with the correct credentials will the user be able to log in to the system.
Module 3:
Once the user has logged in with the correct credentials, a .csv file containing the dataset should be uploaded so that the system can train itself. In this module, the AutoML feature is used instead of manually training the data.
Module 4:
After the file is uploaded, a preview of the data is shown on the screen. There are 614 data records, which can be extended to include further data. Any missing values or raw data are corrected by AutoML and do not require any manual pre-processing.
The user must click on the “Click to Train/Test” button to train on the datasets. This process might take a few seconds. Once the training is completed, a pop-up message appears saying “Training finished”.
Module 5:
This module takes the necessary information from the user such as;
Name of The Applicant
Age of the Applicant
Gender
Marital Status
Dependents
Education Status
Credit History
Employment Status
Income of the Applicant and Co-Applicant If dependents are > 0.
Loan Amount
Duration of the Loan
Region where the Applicant resides
Selection of best model to make the prediction
Once the applicant has filled in all the necessary details and clicked the SUBMIT button, the result is displayed.
Module 6:
In this analysis module, the accuracy plots of the dataset are displayed.
APPENDIX -II
PUBLICATION:
REFERENCES
[6] Y. Yang, J. Dong, X. Sun, E. Lima, Q. Mu, and X. Wang, "A CFCC-LSTM model for sea surface temperature prediction," IEEE Geosci. Remote Sens. Lett., vol. 15, no. 2, pp. 207-211, Feb. 2018.
[7] X. Hong, R. Lin, C. Yang, N. Zeng, C. Cai, J. Gou, and J. Yang, "Predicting Alzheimer's disease using LSTM," IEEE Access, vol. 7, pp. 80893-80901, 2019.
[11] W. H. Li, L. Wen, and Y. B. Chen, "Application of improved GA-BP neural network model in property Loan prediction," Geomatics Inf. Sci. Wuhan Univ., vol. 42, no. 8, pp. 1110-1116, 2017.