Analysis of Vehicle Insurance Data To de
https://doi.org/10.22214/ijraset.2020.30734
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.429
Volume 8 Issue VII July 2020- Available at www.ijraset.com
Abstract: Thousands of insurance companies around the world handle large amounts of data. Insurance fraud has become very
common among organizations across industries. The insurance industry is one of the fastest-growing industries, so fraud
detection has become very important. Fraud detection can be applied in various fields such as banking, insurance, the financial
sector and information security systems, and there are many approaches by which fraud can be detected. In this paper, we use
machine learning techniques to detect vehicle insurance fraud. We use the Random Forest and KNN algorithms to detect
insurance fraud accurately. Performance is evaluated by computing the confusion matrix.
Index Terms: Fraud, Fraud detection, Machine Learning, Insurance, KNN, Random forest, Confusion Matrix.
I. INTRODUCTION
Fraud is a criminal misrepresentation that harms both individuals and organizations. [1] The insurance industry is now
implementing various techniques for the effective management of fraud. There are two major types of insurance fraud: hard and
soft. Hard insurance fraud occurs when people intentionally fake an accident. Soft insurance fraud occurs when a person has a
valid insurance claim but falsifies part of it. Organisations should therefore implement fraud detection techniques in order to
increase customer satisfaction. [4]
When the number of undetected fraud cases increases, premiums also increase to compensate for the losses, which in turn
affects the insured parties. [12] As the number of fraud cases grows, fraud can be detected by applying various techniques to
the data obtained from many similar cases. Information technology can be used to search this data for patterns and then
identify fraud based on vehicle insurance data collected over the years. [5]
Insurance fraud detection is a difficult problem, given the variety of fraud patterns and the relatively small proportion of
known frauds in typical samples. When building detection models, the savings from loss prevention must be balanced against the
cost of false alarms. Machine learning methods can improve predictive accuracy, enabling loss control units to achieve higher
coverage with low false positive rates. [8] In this paper, several machine learning techniques for fraud detection are presented
and their performance on different data sets is analysed. The impact of feature engineering, parameter tuning and feature
selection is investigated with the objective of obtaining superior predictive performance. [11]
In addition, fraud investigators face two main difficulties when detecting vehicle insurance fraud. Firstly, any missing or
incorrect claim data makes fraud detection challenging. Secondly, the number of fraudulent claims is much smaller than the total
number of claims submitted. This uneven distribution (data imbalance) makes fraud detection more difficult. Moreover, most
supervised classifiers produce inefficient classification models on imbalanced data, since they tend to classify all data points
as the genuine class (majority class samples) and overlook the fraudulent points (minority class samples) [3].
Insurance fraud covers the range of improper activities that an individual may commit in order to achieve a favourable outcome
from the insurance company. This can range from staging the incident, to misrepresenting the situation, including the relevant
actors and the cause of the incident, and finally the extent of the damage caused. In this paper we use various machine learning
techniques for the detection of fraudulent cases. A feature selection algorithm is used to select the variables most relevant to
the target. KNN and Random Forest are the algorithms used for fraud detection, as they provide accurate results for the given
dataset.
III. IMPLEMENTATION
The proposed system deals with vehicle insurance fraud detection using machine learning algorithms. Auto insurance fraud is the
most prominent type of insurance fraud. Fraudulent users claim insurance using fake accident reports, so fraud detection is an
important task for organizations that want to avoid losses. In this system, we focus on the detection of auto insurance fraud
using various machine learning techniques.
Figure 1 represents the architecture of the proposed system for auto insurance fraud detection and shows how the user and admin
interact with it. To build a deeper understanding, it is worth going through the general workflow of the machine learning
process, which consists of 5 stages:
1) Data Intake: First, the dataset is loaded from the file and saved in memory.
2) Data Transformation: At this point, the data loaded at stage 1 is transformed, cleaned and normalized to be suitable for the
algorithm. Data is converted so that it lies in the same range, has the same format, and so on. Feature extraction and
selection, which are discussed further, are also performed at this stage. In addition, the data is separated into two sets, the
'training set' and the 'test set'. Data from the training set is used to build the model, which is later evaluated using the
test set.
3) Model Training: At this stage, a model is built using the selected algorithm.
4) Model Testing: The model built or trained during stage 3 is tested using the test data set, and the result is used to build
another model that takes the previous models into account, i.e. "learns" from them.
5) Model Deployment: At this stage, the best model is selected (either after a defined number of iterations or when the required
result is achieved).
Figure 2 represents the overall workflow of the 5 stages: data intake, data transformation, model training, model testing and
model deployment.
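The first two stages can be illustrated with a minimal Python sketch. It is not the system's actual code: the two features (claim amount and vehicle age), the sample records, the 25% test ratio and the helper names `min_max_normalize` and `train_test_split` are all assumptions made for illustration.

```python
import random

def min_max_normalize(rows):
    """Scale every feature column so that its values lie in the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def train_test_split(features, labels, test_ratio=0.25, seed=0):
    """Shuffle the records and separate them into a training set and a test set."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([features[i] for i in train_idx], [labels[i] for i in train_idx],
            [features[i] for i in test_idx], [labels[i] for i in test_idx])

# Hypothetical claims: [claim_amount, vehicle_age_years]; label 1 = fraud, 0 = genuine.
raw_features = [[1200, 2], [8000, 10], [500, 1], [9500, 12],
                [3000, 5], [7000, 9], [400, 3], [8800, 11]]
raw_labels = [0, 1, 0, 1, 0, 1, 0, 1]

# Stage 2: bring all features into the same range, then split the data.
scaled = min_max_normalize(raw_features)
X_train, y_train, X_test, y_test = train_test_split(scaled, raw_labels)
```

The order here follows the description above (normalize, then split). In practice the scaling bounds are often computed from the training set only, so that no information from the test set leaks into training.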
B. K-Nearest Neighbours
K-Nearest Neighbours (KNN) is one of the simplest yet accurate machine learning algorithms. KNN is a non-parametric algorithm,
meaning that it makes no assumptions about the structure of the data. In real-world problems, data rarely obeys the general
theoretical assumptions, which makes non-parametric algorithms a good solution for such problems. The KNN model representation
is as simple as the dataset itself: no learning is required, and the entire training set is stored. KNN can be used for both
classification and regression problems. In both cases, the prediction is based on the k training instances that are closest to
the input instance. In KNN classification, the output is the class to which the input instance belongs, predicted by the
majority vote of the k closest neighbours. In regression, the output is the property value, which is usually the mean value of
the k nearest neighbours. The distance to the nearest neighbours can be computed using the Euclidean distance,
$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$, where p and q are two data points with n features each.
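A minimal Python sketch of KNN as described above, using the Euclidean distance, a majority vote for classification and the mean of the neighbours' values for regression. The function names and the sample points are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def euclidean_distance(p, q):
    """Square root of the sum of squared feature differences between p and q."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_nearest(train_X, train_targets, query, k):
    """Return the targets of the k training instances closest to the query."""
    ranked = sorted(zip(train_X, train_targets),
                    key=lambda pair: euclidean_distance(pair[0], query))
    return [target for _, target in ranked[:k]]

def knn_classify(train_X, train_y, query, k=3):
    """Classification: majority vote among the k nearest neighbours."""
    votes = Counter(k_nearest(train_X, train_y, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_values, query, k=3):
    """Regression: mean value of the k nearest neighbours."""
    nearest = k_nearest(train_X, train_values, query, k)
    return sum(nearest) / len(nearest)

# Hypothetical normalized claims; label 1 = fraud, 0 = genuine.
train_X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3],
           [0.9, 0.8], [0.8, 0.95], [0.85, 0.7]]
train_y = [0, 0, 0, 1, 1, 1]

genuine_like = knn_classify(train_X, train_y, [0.2, 0.2], k=3)
fraud_like = knn_classify(train_X, train_y, [0.9, 0.9], k=3)
```

Note that there is no training step: the "model" is the stored training set, and all the work happens at prediction time.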
C. Random Forest
Random Forest is one of the best-known machine learning algorithms. It requires almost no data preparation or modelling, yet
usually produces accurate results. Random Forests are based on the decision trees described in the previous section. More
specifically, a Random Forest is a collection of decision trees that yields better prediction accuracy. That is why it is called
a 'forest': it is essentially a set of decision trees. The basic idea is to grow multiple decision trees based on independent
subsets of the dataset. At every node, n variables out of the feature set are selected at random, and the best split on these
variables is found. In this project, whichever of the KNN and Random Forest algorithms gives the more accurate result is used to
predict fraud in the insurance data. The algorithm for Random Forest is:
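The paper's own listing is not reproduced here. As a hedged illustration of the idea described above, the following Python sketch grows each tree on a random subset of the data and considers only n randomly chosen features at every node. Several choices are assumptions rather than the authors' method: bootstrap sampling for the subsets, Gini impurity as the split criterion, a majority vote across trees, and all function names and sample data.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return (score, feature, threshold) with the lowest weighted Gini, or None."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best

def build_tree(rows, labels, n_features, rng, depth=0, max_depth=5):
    """Grow one tree; every node looks at n_features randomly chosen features."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority  # leaf node: a class label
    feature_ids = rng.sample(range(len(rows[0])), n_features)
    split = best_split(rows, labels, feature_ids)
    if split is None:
        return majority
    _, f, threshold = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > threshold]
    return (f, threshold,
            build_tree([r for r, _ in left], [y for _, y in left],
                       n_features, rng, depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right],
                       n_features, rng, depth + 1, max_depth))

def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node

def build_forest(rows, labels, n_trees=15, n_features=1, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_features, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical normalized claims; label 1 = fraud, 0 = genuine.
rows = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.30], [0.05, 0.25],
        [0.90, 0.80], [0.80, 0.95], [0.85, 0.70], [0.95, 0.90]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = build_forest(rows, labels)
genuine_like = predict_forest(forest, [0.05, 0.10])
fraud_like = predict_forest(forest, [0.90, 0.90])
```

Because every tree sees a different sample and a different random choice of features, individual trees make different mistakes, and the vote across trees tends to be more accurate than any single tree.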
V. RESULTS
Fraud poses a serious problem in today's society and has to be addressed. To address it, we can build systems that predict
fraud in the given data. These systems are built using various machine learning techniques such as naïve Bayes, KNN, random
forest and neural networks. In this paper we have discussed various ML techniques, how they are implemented in such systems and
how accurately they predict fraud. These techniques are then compared using five criteria from different perspectives.
In Random Forest, the training data for each tree is chosen randomly. Each trained tree gives its own classification result, and
these results are used to handle missing data and to estimate errors. The KNN algorithm stores the training data for later
classification instead of building a model from it. Whichever of the KNN and Random Forest algorithms gives the more accurate
result is used to predict fraud in the insurance data. The system identifies whether a claim is fraudulent by considering the
information given by the user. The information obtained from the user is evaluated against the dataset, and the system tells the
user whether the claim is accepted or rejected. For classification problems with two classes, such as fraudulent versus genuine
claims, it is advisable to choose an odd k for KNN, since this eliminates the possibility of a tie during the majority vote.
Figure 3 shows the comparison graph for the KNN and Random Forest algorithms:
Figure 3: Comparison results of the algorithms KNN and Random Forest on the dataset
A drawback of the KNN algorithm is its poor performance on unevenly distributed datasets. If one class vastly dominates the
others, an input instance is more likely to have more neighbours of that class simply because of their large number, and KNN
therefore tends to make incorrect predictions for the minority class.
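Performance in this paper is evaluated with a confusion matrix. The following Python sketch shows how its four counts, and the accuracy, precision and recall derived from them, can be computed. The class sizes (95 genuine and 5 fraudulent claims) and the function names are hypothetical and are not results from the paper; they were chosen to show why accuracy alone can be misleading on unevenly distributed data.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives, treating `positive` as fraud."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts

def summarize(cm):
    """Derive accuracy, precision and recall from the four counts."""
    total = sum(cm.values())
    predicted_pos = cm["tp"] + cm["fp"]
    actual_pos = cm["tp"] + cm["fn"]
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total,
        "precision": cm["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": cm["tp"] / actual_pos if actual_pos else 0.0,
    }

# Hypothetical imbalanced test set: 5 fraudulent claims among 100.
actual = [1] * 5 + [0] * 95
# A classifier dominated by the majority class labels every claim as genuine.
predicted = [0] * 100

cm = confusion_matrix(actual, predicted)
metrics = summarize(cm)
```

In this example the classifier reaches 95% accuracy while detecting none of the fraudulent claims (recall of 0), which is why the individual cells of the confusion matrix are worth inspecting alongside accuracy.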
REFERENCES
[1] A Predictive Modelling for Detecting Fraudulent Automobile Insurance Claims, Hojin Moon, Yuan Pu, Cesarina Ceglia, Journal of Theoretical Economics
Letters, 2019, 9, 1886-1900
[2] Abnormal Pattern Prediction: Detecting Fraudulent Insurance Property Claims with Semi-Supervised Machine-Learning, Sebastián M. Palacio, Data Science
Journal, 2019
[3] Detection of Automobile Insurance Fraud Using Feature Selection and Data Mining Techniques, Sharmila Subudhi, International Journal of Rough Sets and
Data Analysis, July 2018
[4] Fraud Detection in Automobile Insurance using a Data Mining Based Approach, Ali Ghorbani and Sara Farzai, International Journal of Mechatronics, Electrical
and Computer Technology, Vol. 8(27), Jan. 2018, pp. 3764-3771
[5] Nearest Neighbour and Statistics Method based for Detecting Fraud in Auto Insurance, Tessy Badriyah, Lailul Rahmaniah, Iwan Syarif, 2018 IEEE,
978-1-5386-8066-7/18
[6] CoDetect: Financial Fraud Detection with Anomaly Feature Detection, Dongxu, Dejun Mu, Libin Yang and Xiaoyan Cai, 2169-3536, 2018 IEEE.
[7] Cost-sensitive Parallel Learning Framework for Insurance Intelligence Operation, Xinxin Jiang, Shirui Pan, Guodong Long, Fei Xiong, Jing Jiang, and
Chengqi Zhang, IEEE Transactions on Industrial Electronics, 2018
[8] A State-of-the-Art Review of Machine Learning Techniques for Fraud Detection Research, Sinayobye Janvier Omar, Kiwanuka Fred, Kaawaase Kyanda, 2018
ACM/IEEE Symposium on Software Engineering in Africa
[9] Auto insurance fraud detection using unsupervised spectral ranking for anomaly, Ke Nian, Haofan Zhang, Aditya Tayal, Thomas Coleman, Yuying Li, The
Journal of Finance and Data Science 2 (2016) 58–75
[10] Fraud detection system, Aisha Abdallah, Mohd Aizaini Maarof, Anazida Zainal, Journal of Network and Computer Applications 68 (2016) 90–113
[11] Detecting Auto Insurance Fraud by Data Mining Techniques, Rekha Bhowmik, Journal of Emerging Trends in Computing and Information Sciences, Volume 2
No. 4, April 2011
[12] Fraud detection system: A Survey, Aisha Abdallah, Mohd Aizaini Maarof, Anazida Zainal, Journal of Network and Computer Applications 68 (2016) 90–113
[13] Strategies for detecting fraudulent claims in the automobile insurance industry, Stijn Viaene, Mercedes Ayuso, Montserrat Guillen, Dirk Van Gheel, Guido
Dedene, European Journal of Operational Research 176 (2007) 565–583
[14] Claims Auditing in automobile insurance: fraud detection and deterrence objectives, Sharon Tennyson, Pau Salsas-Forn, The Journal of Risk and Insurance,
2002, Vol. 69, No. 3, 289-3