Impact of the Composition of Feature Extraction and Class Sampling in Medicare Fraud Detection

  • Conference paper
Neural Information Processing (ICONIP 2022)

Abstract

Because healthcare is a critical aspect of life, health insurance has become an important means of reducing medical expenses. Medicare is one such healthcare insurance initiative in the United States. As insurance coverage has expanded, the healthcare industry has seen a significant increase in fraudulent activity, and fraud has become a significant contributor to rising medical care expenses, although its impact can be mitigated with fraud detection techniques. Machine learning techniques are used to detect fraud. This study uses the “Medicare Part D” insurance claims data released by the Centers for Medicare & Medicaid Services (CMS) of the United States federal government to develop a fraud detection system. Applying machine learning algorithms to a class-imbalanced, high-dimensional Medicare dataset is a challenging task. To combat these challenges, the present work performs feature extraction followed by data sampling and then applies various classification algorithms to obtain better performance. Feature extraction is a dimensionality reduction approach that converts attributes into linear or non-linear combinations of the original attributes, generating a smaller and more diversified set of attributes and thus reducing the dimensions. Data sampling is commonly used to address class imbalance, either by increasing the frequency of the minority class or by reducing the frequency of the majority class, to obtain approximately equal numbers of occurrences for both classes. The proposed approach is evaluated with standard performance metrics such as F-measure and AUC score. To detect fraud efficiently, this study applies an autoencoder as the feature extraction technique, the synthetic minority oversampling technique (SMOTE) as the data sampling technique, and various gradient boosted decision tree-based classifiers as the classification algorithms. The experimental results show that the combination of an autoencoder followed by SMOTE with the LightGBM (Light Gradient Boosting Machine) classifier achieved the best results.
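
To make the data sampling and evaluation ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified SMOTE-style oversampler that interpolates between a minority sample and one of its nearest minority neighbours, plus a plain F-measure computation. The function names `smote_oversample` and `f_measure`, the parameters `k` and `seed`, and the toy data are all hypothetical and chosen only for illustration; the autoencoder and LightGBM stages are not shown.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up data: 8 majority rows and 3 minority rows with two features each.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.4]]

# Generate enough synthetic minority rows to equalise the class counts.
synthetic = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + synthetic
print(len(majority), len(balanced_minority))
```

In a pipeline like the one the abstract describes, the oversampling step would be applied to the extracted (encoded) features of the training split only, so that synthetic rows do not leak into the evaluation data; that ordering is an assumption of this sketch rather than a detail stated in the abstract.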

All authors contributed equally.

Author information

Corresponding author

Correspondence to Narinder Singh Punn.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kumari, A., Punn, N.S., Sonbhadra, S.K., Agarwal, S. (2023). Impact of the Composition of Feature Extraction and Class Sampling in Medicare Fraud Detection. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13625. Springer, Cham. https://doi.org/10.1007/978-3-031-30111-7_54

  • DOI: https://doi.org/10.1007/978-3-031-30111-7_54

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30110-0

  • Online ISBN: 978-3-031-30111-7

  • eBook Packages: Computer Science, Computer Science (R0)
