Authors:
Quentin Telnoff 1,2; Emanuela Boros 1; Mickael Coustaty 1; Fabrice Crohas 2; Antoine Doucet 1 and Frédéric Bars 2
Affiliations:
1 University of La Rochelle, L3i, F-17000, La Rochelle, France
2 Itesoft, F-30470, Aimargues, France
Keyword(s):
Forgery Detection, Tabular Data, Language Models.
Abstract:
Detecting forgeries in car insurance claims is a complex task that requires identifying fraudulent or overstated claims related to property damage or personal injuries after a car accident. Building predictive models for this task raises several issues (e.g., class imbalance, concept drift), and such models cannot rely solely on the frequency or timing of the reported incidents. The difficulty is compounded by the static tabular data generally used in this domain, whereas submitted insurance claims largely consist of textual data. We therefore propose an exploratory guide for detecting forged car insurance claims with language models. Specifically, we investigate two transformer-based frameworks: supervised (where the model is trained to differentiate between forged and non-forged cases) and self-supervised (where the model captures the standard attributes of non-forged claims). To handle both static tabular data and unstructured text fields, we examine various forms of data row modelling (table serialization techniques), different losses, and two language models (one general and one domain-specific). Our work highlights the challenges and limitations of existing frameworks.
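As an illustration of what table serialization can look like, the following minimal sketch turns one tabular row into a text string that a language model could consume. It is not the authors' implementation: the two strategies shown (key-value pairs and short template sentences) are common choices assumed here for illustration, and the function names and claim fields are hypothetical.

```python
from typing import Any, Mapping


def serialize_key_value(row: Mapping[str, Any], sep: str = " | ") -> str:
    """Render a row as 'column: value' pairs joined by a separator.

    Missing values (None) are skipped rather than written out.
    """
    return sep.join(f"{col}: {val}" for col, val in row.items() if val is not None)


def serialize_sentence(row: Mapping[str, Any]) -> str:
    """Render a row as short 'The <column> is <value>.' sentences."""
    return " ".join(
        f"The {col.replace('_', ' ')} is {val}."
        for col, val in row.items()
        if val is not None
    )


# Hypothetical claim record; the field names and values are illustrative only.
claim = {
    "vehicle_age": 7,
    "incident_type": "rear collision",
    "claim_amount": 4200,
    "description": "hit from behind at a traffic light",
    "witness": None,
}

print(serialize_key_value(claim))
print(serialize_sentence(claim))
```

Either string could then be tokenized and passed to a transformer encoder. Which serialization works best is an empirical question of the kind the abstract describes, so this sketch makes no claim about relative performance.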