Impact of the Composition of Feature Extraction and Class Sampling in Medicare Fraud Detection

Kumari, Akrity; Punn, Narinder Singh; Sonbhadra, Sanjay Kumar; Agarwal, Sonali

doi:10.1007/978-3-031-30111-7_54

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13625)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Healthcare is a critical part of life, and health insurance has become an important means of minimizing medical expenses. Medicare is one such healthcare insurance initiative in the United States. As insurance coverage has grown, the healthcare industry has seen a significant increase in fraudulent activity, and fraud has become a significant contributor to rising medical care expenses; its impact can be mitigated with fraud detection techniques, for which machine learning is commonly used. This study uses the “Medicare Part D” insurance claims data released by the Centers for Medicare & Medicaid Services (CMS) of the United States federal government to develop a fraud detection system. Applying machine learning algorithms to a class-imbalanced, high-dimensional Medicare dataset is challenging. To combat these challenges, the present work performs feature extraction followed by data sampling and then applies various classification algorithms to improve performance. Feature extraction is a dimensionality reduction approach that converts the original attributes into linear or non-linear combinations of them, generating a smaller and more diversified set of attributes. Data sampling is commonly used to address class imbalance, either by increasing the number of minority-class instances or by reducing the number of majority-class instances, to obtain approximately equal numbers of instances for both classes. To detect fraud efficiently, this study applies an autoencoder for feature extraction, the synthetic minority oversampling technique (SMOTE) for data sampling, and various gradient boosted decision tree-based classifiers for classification. The proposed approach is evaluated with standard performance metrics such as F-measure and AUC score. The experimental results show that autoencoder-based feature extraction followed by SMOTE, with the LightGBM (Light Gradient Boosting Machine) classifier, achieved the best results.
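
To make the data sampling step concrete, the following is a minimal sketch of SMOTE-style oversampling in plain Python. It is not the authors' implementation: the function names `smote_oversample` and `balance`, the toy data, and the choice of `k` are assumptions made for illustration, and in practice an established library implementation would normally be used. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. In the pipeline described in the abstract, the input rows would be the features produced by the autoencoder rather than raw claims attributes.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples.

    minority: list of equal-length feature vectors (lists of floats).
    Hypothetical helper for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class of a binary problem until both classes match."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    extra = smote_oversample(minority, max(0, n_majority - len(minority)), k, seed)
    return X + extra, y + [minority_label] * len(extra)


# Toy data (assumed): 8 "non-fraud" rows and 3 "fraud" rows, two features each.
X = [[float(i), float(i % 3)] for i in range(8)] + [[10.0, 10.0], [11.0, 10.5], [10.5, 11.0]]
y = [0] * 8 + [1] * 3
X_bal, y_bal = balance(X, y)
print(sum(y_bal), len(y_bal) - sum(y_bal))  # fraud count, non-fraud count
```

Because every synthetic row is an interpolation between two existing minority rows, the new points stay inside the region already occupied by the minority class. Oversampling of this kind is normally applied to the training split only, so that evaluation metrics such as F-measure and AUC are computed on untouched data.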

All authors contributed equally.

Author information

Authors and Affiliations

Indian Institute of Information Technology Allahabad, Jhalwa, Prayagraj, Uttar Pradesh, India
Akrity Kumari, Narinder Singh Punn & Sonali Agarwal
Department of CSE, ITER, Siksha ‘O’ Anusandhan, Bhubaneswar, Odisha, India
Sanjay Kumar Sonbhadra

Corresponding author

Correspondence to Narinder Singh Punn.

Editor information

Editors and Affiliations

Indian Institute of Technology Indore, Indore, India
Mohammad Tanveer
Indian Institute of Information Technology - Allahabad, Prayagraj, India
Sonali Agarwal
Kobe University, Kobe, Japan
Seiichi Ozawa
Indian Institute of Technology Patna, Patna, India
Asif Ekbal
University of Innsbruck, Innsbruck, Austria
Adam Jatowt

Cite this paper

Kumari, A., Punn, N.S., Sonbhadra, S.K., Agarwal, S. (2023). Impact of the Composition of Feature Extraction and Class Sampling in Medicare Fraud Detection. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13625. Springer, Cham. https://doi.org/10.1007/978-3-031-30111-7_54

DOI: https://doi.org/10.1007/978-3-031-30111-7_54
Published: 13 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30110-0
Online ISBN: 978-3-031-30111-7
eBook Packages: Computer Science, Computer Science (R0)
