Abstract
Healthcare is a critical aspect of life, and health insurance has become an important means of minimizing medical expenses. Medicare is an example of such a healthcare insurance initiative in the United States. As insurance coverage has expanded, the healthcare industry has seen a significant increase in fraudulent activity, and fraud has become a significant contributor to rising medical care expenses, although its impact can be mitigated using fraud detection techniques. Machine learning techniques are commonly used to detect such fraud. This study uses the “Medicare Part D” insurance claims data released by the Centers for Medicare & Medicaid Services (CMS) of the United States federal government to develop a fraud detection system. Applying machine learning algorithms to a class-imbalanced, high-dimensional Medicare dataset is a challenging task. To combat these challenges, the present work performs feature extraction followed by data sampling and then applies various classification algorithms to obtain better performance. Feature extraction is a dimensionality reduction approach that converts attributes into linear or non-linear combinations of the original attributes, generating a smaller and more diversified set of attributes and thus reducing the dimensionality. Data sampling is commonly used to address class imbalance, either by increasing the number of minority-class instances or by reducing the number of majority-class instances, so that both classes have approximately equal numbers of instances. The proposed approach is evaluated using standard performance metrics such as F-measure and AUC score. To detect fraud efficiently, this study applies an autoencoder as the feature extraction technique, the synthetic minority oversampling technique (SMOTE) as the data sampling technique, and various gradient boosted decision tree-based classifiers as the classification algorithms.
The experimental results show that the combination of an autoencoder followed by SMOTE with the LightGBM (short for Light Gradient Boosting Machine) classifier achieved the best results.
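The oversampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `smote_sketch` and `balance`, the toy data, and the parameter defaults are all hypothetical, and the code is a simplified pure-Python rendering of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours). It skips the autoencoder and classifier stages entirely.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each lying on the segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (a simplified version of the SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(max(0, n_new)):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the remaining minority points by Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # fraction of the way from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal counts."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new = smote_sketch(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new, y + [minority_label] * len(new)


# Toy data (hypothetical): six majority points (label 0), two minority (label 1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.2, 0.3],
     [5.0, 5.0], [5.2, 5.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = balance(X, y)
print(y_bal.count(0), y_bal.count(1))  # 6 6
```

In a pipeline like the one the abstract describes, a step of this kind would be applied to the training split only, after feature extraction and before the classifier is fitted, so that synthetic points do not leak into evaluation. In practice, a maintained library implementation of SMOTE would be a more reasonable choice than this sketch.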
All authors contributed equally.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kumari, A., Punn, N.S., Sonbhadra, S.K., Agarwal, S. (2023). Impact of the Composition of Feature Extraction and Class Sampling in Medicare Fraud Detection. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13625. Springer, Cham. https://doi.org/10.1007/978-3-031-30111-7_54
DOI: https://doi.org/10.1007/978-3-031-30111-7_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30110-0
Online ISBN: 978-3-031-30111-7
eBook Packages: Computer Science (R0)